By now, most of us are well aware of the market buzz around virtual and augmented reality. Many of us, at some point or another, have donned the bulky head-mounted gear and tepidly stepped into the experience to check it out for ourselves. And, depending on how sophisticated your setup is (and how much it costs), your mileage will vary. Ironically, some research suggests that it’s baby boomers who are more likely to be “blown away” by virtual reality, while millennials are more likely to respond with an ambivalent “meh.” And this brings us to the ultimate question simmering on the minds of a whole lot of people: is virtual reality here to stay?
It’s a great question.
Certainly, the various incarnations of 3D viewing over the last half-century suggest that we are not happy with something: our current viewing conditions aren’t good enough, or something isn’t quite right with the way we consume video today.
What do you want to see?
Let’s face it: the way we consume video today is not the way our eyes were built to record visual information, especially in the real world. When you look at the real world (which, by the way, is not what you are doing right now), your eyes capture much more information than the color and intensity of the light reflected off the objects in the scene. In fact, the Human Visual System (HVS) is designed to pick up on many visual cues, and these cues are extremely difficult to replicate in both current-generation display technology and content.
Displays and content? Yes. Alas, it is a two-part problem. But let’s first get back to the issue of visual cues.
What your brain expects you to see
Consider this: for those of us with the gift of sight, the HVS provides roughly 90% of the information we absorb every day, and as a result, our brains are well tuned to the laws of physics and the corresponding patterns of light. Put more simply, we recognize when something just doesn’t look the way it should, or when there is a mismatch between what we see and what we feel or do. These mismatches in sensory signals are where visual cues come into play.
Here are some of the most important cues:
- Vergence distance: the distance the brain perceives when the muscles of the eyes move to focus on a physical location, or focal plane. When that focal plane sits at a fixed distance from our eyes, as with the screen in your VR headset, the brain simply does not expect to detect large changes in distance; after all, your eyes are focused on something that is physically attached to your face, i.e. the screen. But when the visual content is produced to simulate the illusion of depth (especially large changes in depth), the brain recognizes a mismatch between the distance information in the content and the distance it is trained to infer from where your eyes are physically focused. The result? Motion sickness and/or a slew of other unpleasantries.
- Motion parallax: As you, the viewer, physically move, say while walking through a room in a museum, objects that are physically closer to you should move more quickly across your field of view (FOV), while objects that are farther away should move more slowly.
- Horizontal and vertical parallax: Objects in the FOV should appear different when viewed from different angles, whether the change in viewing angle comes from a shift in your horizontal or your vertical position.
- Motion-to-photon latency: It is really unpleasant when you are wearing a VR headset and the visual content doesn’t change right away to match the movements of your head. This lag is called “motion-to-photon” latency. To achieve a realistic experience, motion-to-photon latency must be less than 20 ms, which means that service providers, e.g. cable operators, will need to design networks that can deterministically support extremely low latency. After all, from the moment you move your head, a lot of things need to happen: signaling the head motion, identifying the content consistent with that motion, fetching the content if it isn’t already at the headset, and so on (see the sketch after this list).
- Support for occlusions, including the filling of “holes”: As you move through, or across, a visual scene, objects in front of other objects should block them, and previously hidden areas should reappear, consistent with your movements.
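To make that 20 ms budget concrete, here is a minimal Python sketch that simply adds up an illustrative motion-to-photon pipeline. The stage names and millisecond values are assumptions chosen for illustration, not measurements of any real headset or network.

```python
# Minimal sketch: checking an illustrative motion-to-photon budget.
# Stage names and latencies are assumed, illustrative values only --
# not measurements of any particular headset or network.

MOTION_TO_PHOTON_BUDGET_MS = 20.0  # comfort target cited above

# Hypothetical stages between a head movement and updated pixels on the screen.
pipeline_ms = {
    "head_tracking_sample": 2.0,   # sensing and signaling the head motion
    "content_selection": 2.0,      # identifying the view consistent with the motion
    "network_fetch": 7.0,          # fetching content not already at the headset
    "decode_and_render": 6.0,      # decoding and rendering the new view
    "display_scanout": 2.0,        # getting photons out of the panel
}

total_ms = sum(pipeline_ms.values())
print(f"Total motion-to-photon latency: {total_ms:.1f} ms")

if total_ms <= MOTION_TO_PHOTON_BUDGET_MS:
    print("Within the ~20 ms comfort budget.")
else:
    over = total_ms - MOTION_TO_PHOTON_BUDGET_MS
    print(f"Over budget by {over:.1f} ms -- expect noticeable lag (and queasy viewers).")
```

The arithmetic is the whole point: once a network fetch sits in the loop, a handful of milliseconds per stage consumes the entire budget, which is why deterministic, extremely low-latency network design matters.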
It’s no wonder…
Given all of these huge demands placed on the technology by our brains, it’s no wonder that current VR is not quite there yet. But, what will it take to get there? How far does the technology still have to go? Will there ever be a real holodeck? If “yes”, when? Will it be something that we experience in our lifetimes?
The holodeck made its first proper appearance in Star Trek: The Next Generation in 1987. It was a virtual reality environment that used holographic projections to make it possible to interact physically with the virtual world.
Fortunately, there are a lot of positive signs to indicate that we might just get to see a holodeck sometime soon. Of course, that is not a promise, but let’s say that there is evidence that content production, distribution, and display are all making significant strides. How, you say?
Capturing and displaying light fields
Light fields are 3D volumes of light, as opposed to the ordinary 2D planes of light that are commonly distributed from legacy cameras to legacy displays. When the HVS captures light in the natural world (i.e. not from a 2D display), it captures light from a 3D space: a volume of light reflected from the objects in our field of view. That volume of light contains the information needed to trigger the all-important visual cues described above, allowing us to experience the visual information in a way that is natural to our brains.
So, in a nutshell, not only does there need to be a way to capture that volume of light, but there also needs to be a way to distribute it over a network (e.g. a cable network), and there needs to be a display at the end of that network capable of reproducing the volume of light from the digital signal that was sent. A piece of cake, right?
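To get a rough sense of why each of those three steps is hard, it helps to look at one common way of representing a light field: the two-plane parameterization, where every ray is indexed by the points at which it crosses two parallel planes. The back-of-the-envelope Python sketch below uses assumed, illustrative sampling numbers (not any product’s specification) to estimate how much raw data even a modest light field represents.

```python
# Rough, back-of-the-envelope sketch of raw light field size using the
# classic two-plane parameterization L(u, v, s, t): each ray is indexed by
# its intersection with a "viewpoint" plane (u, v) and an "image" plane (s, t).
# All sampling numbers below are assumptions chosen for illustration.

u_samples, v_samples = 16, 16        # assumed angular (viewpoint) resolution
s_samples, t_samples = 1920, 1080    # assumed spatial resolution per view
bytes_per_sample = 3                 # 8-bit RGB per ray
frames_per_second = 30

rays_per_frame = u_samples * v_samples * s_samples * t_samples
bytes_per_frame = rays_per_frame * bytes_per_sample
gigabits_per_second = bytes_per_frame * 8 * frames_per_second / 1e9

print(f"Rays per frame:      {rays_per_frame:,}")
print(f"Raw data per frame:  {bytes_per_frame / 1e9:.2f} GB")
print(f"Uncompressed stream: {gigabits_per_second:.0f} Gbit/s")
```

Even with these modest assumptions, the raw stream runs to hundreds of gigabits per second, which is why capture, distribution, and display each need their own breakthroughs, and why compression is essential to making light field delivery over, say, a cable network practical.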
Believe it or not
There is evidence of significant progress on all fronts. For example, at the F8 conference earlier this year, Facebook unveiled its light field cameras and corresponding workflow. Lytro is also a key player in the light field ecosystem with its production-based light field cameras.
On the display side, there are Light Field Lab and Ostendo, both on a mission to make in-home viewing on light field displays (i.e. displays capable of projecting a volume of light) a reality.
On the distribution front, both MPEG and JPEG have projects underway to make the compression and distribution of light field content possible. And, by the way, what is the digital format for that content? Check out this news from MPEG’s 119th meeting in Torino:
At its 119th meeting, MPEG issued Draft Requirements to develop a standard to define a scene representation media container suitable for interchange of content for authoring and rendering rich immersive experiences. Called Hybrid Natural/Synthetic Scene (HNSS) data container, the objective of the standard will be to define a scene graph data representation and the associated container for media that can be rendered to deliver photorealistic hybrid scenes, including scenes that obey the natural flows of light, energy propagation and physical kinematic operations. The container will support various types of media that can be rendered together, including volumetric media that is computer generated or captured from the real world.
This latest work is motivated by contributions submitted to MPEG by CableLabs, OTOY, and Light Field Lab.
Hmmm … reading the proverbial tea leaves, maybe we are not so far away from that holodeck experience after all.
—
Subscribe to our blog to read more about virtual reality and other CableLabs innovations.