The Audio Team at BBC R&D North Lab has recently published an orchestrated media demo, the Vostok-K Incident, which is now available on BBC Taster (https://www.bbc.co.uk/taster/pilots/vostok). It shows how media playback can be orchestrated across multiple user devices to deliver a more immersive experience. The demo uses a cloud-based media synchronisation service that Dr. Rajiv Ramdhany (thank you for sharing the news, Rajiv!) built for the 2Immerse project.

The demo is very similar to the one we shared a couple of years ago in an IEEE J-STSP journal article (open access). I believe a version of the underlying system (I am not sure if it's the one used in the Taster demo) has borrowed our idea of a perceptual model to adjust playback for different “catch-up” scenarios (all the complex equations can be found in the paper, if you are interested). Technically, it is quite difficult to achieve this level of synchronicity without dedicated hardware and network protocols (read the paper to see how many things can go wrong).

What’s also fantastic about this demo is that it uses content specifically created for the technology. Instead of reusing an off-the-shelf 5.1/7.1 movie soundtrack, the demo splits the sound sources and merges them on the fly based on which user devices are available.
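To give a flavour of what “catch-up” means here: a device that drifts from the shared timeline can either nudge its playback rate or hard-seek. Below is a deliberately simplified sketch of that decision, not the perceptual model from the paper; the function name, the 2% rate cap, and the 2-second seek threshold are all my own illustrative assumptions.

```python
def catch_up_action(drift_s, rate_cap=0.02, seek_threshold_s=2.0):
    """Choose how a device should close a sync drift (toy sketch).

    drift_s > 0 means this device is behind the shared timeline.
    Small drifts are closed by gently varying the playback rate
    (small rate changes tend to be hard to notice); large drifts
    warrant a hard seek instead.
    """
    if abs(drift_s) >= seek_threshold_s:
        # Too far out: jump straight to the shared timeline position.
        return ("seek", drift_s)
    # Rate nudge proportional to the drift, clamped to the perceptual cap.
    adjustment = max(-rate_cap, min(rate_cap, drift_s / 10.0))
    return ("rate", 1.0 + adjustment)
```

For example, a device 0.1 s behind would play at roughly 1.01x until it catches up, while a device 3 s behind would simply seek.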
I think the biggest challenge is the level of human intervention still required for the demo to work, and work well, in the wild on mobile devices. Device discovery is an obvious topic, and we can easily throw some crazy ideas at it (e.g., ultrasonic piggy-backing). I am also interested in how devices’ capabilities play a key role in the experience (something I also observed in my previous experiments). While listening to the Vostok demo, I was subconsciously trying to work out my own location in the scene, and that is often dictated by which device has the higher volume. So my personal experience might not match what the directors intended. Is this a bad thing? Not necessarily. Like user-controlled 360° video, allowing the audience to choose where they want to be (e.g., deciding which character to stand next to in a scene) can be a good pathway towards content customisation and interactive media.

Perhaps I should get my 3rd-year undergrads to try out the demo in the classroom. Wouldn’t it be cool to have 40 mobile phones going crazy at the same time? Maybe they’ll also come up with some nice projects to work on.
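To make the ultrasonic piggy-backing idea slightly more concrete, here is a toy sketch of how a device could announce a small ID as a single near-ultrasonic tone and how a listener could recover it by correlation. This is purely my own illustration, not anything from the demo: the function names, the 18 kHz base frequency, the 500 Hz spacing, and the 8-ID space are all assumptions, and a real system would need framing, error handling, and robustness to room acoustics.

```python
import math

SAMPLE_RATE = 44100
F0, STEP = 18000, 500  # base frequency and spacing, above most adults' hearing
NUM_IDS = 8            # toy ID space: IDs 0..7 map to 18.0-21.5 kHz tones

def encode_device_id(device_id, duration=0.1):
    """Map a small device ID to a near-ultrasonic sine beacon."""
    freq = F0 + device_id * STEP
    n = int(SAMPLE_RATE * duration)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]

def decode_device_id(samples):
    """Recover the ID by correlating against each candidate frequency."""
    best_id, best_power = None, 0.0
    for device_id in range(NUM_IDS):
        freq = F0 + device_id * STEP
        re = sum(s * math.cos(2 * math.pi * freq * t / SAMPLE_RATE)
                 for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
                 for t, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_id, best_power = device_id, power
    return best_id
```

A phone could play its beacon through its speaker while a hub listens on its microphone, so discovery rides on hardware every device already has; that is the whole appeal of the crazy idea.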