The Emperor’s New Headset:
What’s Wrong With VR

The first thing you need to know is that I am not a Virtual Reality (VR) enthusiast. “VR” is a catchall for three-dimensional imagery viewed in a (generally head-mounted) display. The imagery appears fixed in space and occupies the entire field of view. My dislike isn’t of VR per se; it stems rather from the messianic zeal with which big corporations, technologists, futurists, entrepreneurs, advertisers, venture capitalists, academics, journalists—pretty much everyone—have heralded its newest incarnation, despite VR’s repeated failure since the sixties to find a mainstream audience.

I’d like to temper this revival of VR adventism with a rational assessment of the technology’s potential. And because I want it to be useful to VR enthusiasts, much of what follows is devoted to proposing practical strategies, developed while I worked on the first two versions of Google Cardboard, for using what we know about interactions with technology to identify and realize VR’s much more modest but actual promise. My hope is that demonstrating a gradual path down to the beach will keep at least some of the lemmings from hurtling off the cliff.

SIM: Situated Immersive Media

While I applaud the optimism of VR proponents who in their sweaty, clunky headsets see the genesis of the Star Trek holodeck, there’s currently no chance of mistaking a VR experience for a real one, so let’s dispense with the sophistry of claiming that a “virtual” experience in a headset bears any meaningful relation to the “real” experience. I don’t want to call the subject of this discussion virtual anything, much less reality. I’m going to engage in the timeworn and annoying habit of precisely defining my terms so that we all know what’s under discussion.

Our subject is a new mode of experiencing media—primarily audiovisual but with the potential to include haptics—that has two primary interaction characteristics:

  1. It is immersive, i.e., it fills the user’s entire visual field, thereby engulfing him/her.
  2. It is situated, i.e., it is tied to the location, direction, and movement of the user relative to real space.

To describe the punch-to-the-gut, “wow this is neat!” effect that stereoscopic headset displays produce, I prefer the term Situated Immersive Media (SIM) to “Virtual Reality,” both because it’s less prescriptive and because it carries less baggage (Lawnmower Man, anyone?). To succeed, SIM has to make explicit the uses and desirability of immersion and situation (or “presence,” as it’s more commonly but problematically termed). The assumption that their utility is self-evident, rather than any technical impediment, is what has tripped up the technology in the past.

Immersion is hardly unique to SIM. To be immersed is to be submerged, to lose awareness of everything outside of the immersive medium. The medium itself rarely matters as much as its content; books, movies, conversations, daydreams, music, and carpentry can all be immersive. SIM’s immersion is uniquely independent of content because it’s physically impossible to look away. Inescapability presents novel interaction possibilities, but brute-force immersion is not the same as total absorption in an activity, nor is it good or desirable on its own. I’ve never heard anyone complain about the “lack of immersion” in a piece of media or talk about an experience not being “immersive enough.” It’s certainly possible, as I’ve heard SIM boosters argue, that people won’t realize they miss immersion until they’ve tried SIM, but good content is immersive regardless of medium, and that kind of immersion is achievable with far less hassle than SIM demands.

Unlike immersion, the spatially accurate three-dimensionality SIM proponents term “presence” is in fact unique to SIM, and does aptly describe feeling situated inside a simulated space—there is an undeniable there there. I prefer “situation” to “presence” because its connotations are spatial rather than experiential. Yoga instructors don’t tell you to be “situated” in the moment. By any name, feeling contained within a SIM is so astonishing that first SIM experiences often get described in religious terms as “conversions” or “epiphanies” and characterized as “transcendent” or “otherworldly.” As one tech executive recently told me, “I know we haven’t found VR’s killer app, but it’s so compelling I just can’t believe there simply isn’t one.”

And there almost certainly is, but not on the tectonic PC/Internet/Social Media/iPhone scale he’s imagining. That’s because immersion and situation necessarily circumscribe the type of messages SIM can convey better than other media. What follows is a series of guidelines that embrace these constraints to help cut through hype and focus on uses that will drive SIM’s adoption. They frame SIM in terms of the interactions situation and immersion uniquely enable, rather than using them as a mantra to foist headsets onto a public mystified by their purpose. They highlight assumptions and present alternatives which might not be correct or practical, but are intended to spark different ideas.

1.  Target the limbic system

Because it fools our visual system so convincingly, SIM feels like it’s tapping directly into the precognitive brain, circumventing symbolic processing entirely. Situation and immersion work mostly at the subconscious level, and produce a sensory immediacy equivalent to the hammer-to-the-knee reflex. For instance, SIM viscerally conveys the largeness or smallness of objects in relation to you, as well as the space between them. Scale feels unmediated in SIM in a way it never does on a screen. Imagine watching a video shot inside a cathedral or an airplane hangar. On a TV, you understand conceptually that these spaces are huge, but in a SIM, you feel their vastness as if you were standing inside them. The same is true of the body language of characters in SIM, whether filmed or computer generated: you perceive even the subtlest cues instantly and subconsciously. It’s as if SIM represents a new, separate channel communicating information directly to your lizard brain. The operative question is how best to use such a channel.

A horror movie can frighten you and a sad story can elicit a sob, but those are proxy emotions felt on behalf of their protagonists. SIM can evoke different kinds of fear—of heights, of being chased, of embarrassment—by putting you directly in situations that provoke them. The palette of emotions at SIM’s disposal is much broader and more nuanced because those emotions are directly yours. In this way, SIM resembles a dream more than a movie. You notice this acutely when you take off a SIM headset and your brain does a little jog as it resets its sensory context to the real world, much as it does when you wake up.

2.  Don’t be so literal

Immersion and situation are SIM’s distinguishing characteristics, but to be useful, they have to be marshalled in the service of meaning. It’s not enough to drop a user in a giant space and tell her to look around without attaching the experience of feeling small to a purpose. Spatial cues in SIM can make a user feel like he’s being followed, like there’s nowhere he can hide, like he’s being tenderly watched over, like he’s totally alone, or like everyone’s eyes are on him. These are complicated, irrational emotions that a SIM creator can use to enhance a story, set the tone for a meeting, or do something entirely new. Placing the user in the center of the Sainte-Chapelle for its own sake makes a nice tech demo, but it’s equivalent to an amusement park ride: you try it once to say you’ve tried it, then you move on. Saying “it’s more immersive” or “you feel like you’re there” doesn’t explain why SIM is worth the hassle any more than saying “it’s red and round” explains why tomatoes are good to eat.

“Aha,” you say, “but feeling like you’re there is useful!” This conflation of spatial understanding with actual experience is precisely why I prefer “situation” to “presence.” The argument usually runs thus: if you can’t afford/are too sick/it’s too dangerous to travel somewhere, then SIM is the next best thing. But next best to what and for what? A SIM of a Syrian refugee camp captures the audiovisual environment in higher fidelity than a video, but to what end? The user experiences the camp’s scale and spatial context, but he has no more “been there” or felt what it’s like to be a refugee than he has after watching a documentary on the Discovery Channel. Writing recently in The Atlantic, Yale psychologist Paul Bloom explains:

The problem is that these experiences aren’t fundamentally about the immediate physical environments. The awfulness of the refugee experience isn’t about the sights and sounds of a refugee camp; it has more to do with the fear and anxiety of having to escape your country and relocate yourself in a strange land.

To feel what a refugee feels, he argues, the next best thing to actually being a refugee in a camp is consulting a first-person written or spoken account. What I’m getting at here is that a metaphorical, non-literal use of SIM might enable the communication of comparable fear and anxiety. The experience would more closely resemble a poem or a piece of music than a documentary film.

3.  Edit everything down to something

A large part of not being literal is judicious editing. Simulation, like reality, is complicated and taxing, which makes it useful for training pilots and surgeons (tasks, incidentally, for which SIM has been used for decades) but quickly leads to sensory overload in other contexts. No one would argue that a Google search should mimic rifling through millions of boxes of papers in an infinite closet. You enter a query and, with no further looking on your part, the answer pops up. It takes advantage of the unique capabilities of computers to spare you the tedium of real-world searching.

Similarly, no one really wants to shuffle through virtual drawers for a virtual pencil with which to scribble a virtual note using a virtual hand. Our imaginations are more powerful and more immersive than even the most perfect holodeck; to assume the goal is a perfect simulacrum is to misunderstand SIM’s potential.

Most people also don’t want to pore through 360 degrees of streaming information to find a story. Looking around to find the interesting bits and patch them together is tiring and a lot of work; just ask a writer or a journalist. The problem with interactive fiction has always been the difficulty of dynamically constructing compelling stories out of smaller, reorderable units. The most accomplished form of immersive, interactive narrative I’ve experienced to date is Sleep No More. But it’s an experience more than a story, and when I retell it, I talk about how I felt and what I did much more than about what the various characters I was watching happened to be doing. The story ends up being about me, and it doesn’t scratch the same itch as seeing a traditional play, watching a movie, or reading a book. And that’s SIM’s competition—not other SIM experiences, but all other media.

4.  Design in conjunction with other media and devices

Microsoft made a terrible mistake when it positioned the Kinect’s skeleton tracking as a replacement for the game controller—“you are the controller”—instead of as an additional input to the controller (imagine playing a racing game where in addition to controlling the car with the controller, you can use the angle of your head and shoulders to help steer the car around turns, or bump the other players on the couch beside you to knock their cars around onscreen).

They replaced the high-dexterity manipulation capabilities of the hands with a much slower and clunkier form of body semaphore. It’s the difference between using your hands to solve a Rubik’s cube and using your hands to direct someone else to solve a Rubik’s cube from across the room.

Immersive media faces a similar challenge. If any SIM device is to succeed, it must do so in the context of all the other existing media. Much of what’s preventing SIM’s mainstream success is the lack of a story about how it interacts with or enhances other platforms and devices—TVs, phones, laptops, printed matter, architecture, performance, the web, gaming consoles—and what unique value it adds to users’ interactions with them.

Imagine watching a movie on a screen in an otherwise empty SIM. A character emerges from Penn Station and gets her first ever glimpse of New York. When she does, the screen disappears and New York is all around the viewer. The effect of sudden scale and situation would be breathtaking, much like that character’s first glimpse of midtown. Or imagine rats suddenly streaming toward the viewer from the feet of two politicians debating on a stage. Or a regular movie with a SIM portion in which the main character experiences tunnel vision. The expressive possibilities are mind-boggling and largely untapped.

5.  Design for the technology you have

Since its earliest days, SIM has been waiting for technology to catch up with its aspirations. Maybe because of that underlying narrative, there’s a sense that all current SIM tech is just a rough draft of the magical, invisible headsets of the future. Such headsets are simply not possible given how the technology works. By definition, SIM requires covering a user’s eyes to obscure the outside world. That’s true whether it’s done with the crane-hoisted two-ton helmets of yesteryear or the dainty little eye masks of tomorrow.

Telepresence, for instance, is one of the most commonly cited use cases for SIM. Even though communication has been steadily moving away from immersiveness—the majority of interpersonal communication now happens in asynchronous bursts of images, short videos, and text—there is much excited discussion of SIM’s social possibilities. That smacks of wishful thinking—the digital tools that have shaped our communication over the last twenty years definitely privilege immediacy over verisimilitude—and of a deeper kind of self-delusion. Covering your eyes to “see” someone virtually has a distinct “we had to destroy the village in order to save it” ring to it.

A spatialized 3D video conference in which remote participants are indistinguishable from physically present ones would be awesome, but it’s categorically impossible while all eyes are covered! But what if those eyes weren’t covered all the time?

6.  Take no interaction for granted

Different SIM experiences will require different types of input. By building appropriate inputs for each, SIM creators can eventually arrive at the uses and type(s) of abstractions that work best for the system as a whole. There’s no need to generalize at the outset, to recreate existing interface elements (lists, buttons, sliders, keyboards), or model SIM interactions on existing ones, especially before anyone has developed a clear idea of what SIM is for. Head rotations, hand motions, and manipulations of virtual objects don’t need to behave as they do outside the SIM, provided the interactions are consistent and intuitive enough to be understood quickly.

This holds doubly true for SIM hardware. Covering one’s eyes in public is a deeply antisocial (and potentially dangerous) act, and even in semi-private it requires significant trust in one’s companions. That limits the situations in which people will be willing to strap into a SIM headset for extended periods. But who says the experiences have to be extended? And does the hardware really require a headstrap? What if it were used more like a microscope or a pair of binoculars? Imagine, for instance, a SIM headset that behaved more like an e-cigarette. If you find yourself getting irritated in a meeting, why not sneak away to your happy place for a second?

7.  Finally, be wary of gaming

As the old standbys of simulation, virtual travel, and telepresence fail to sell headsets, the focus will most certainly land squarely on gaming. Gamers are willing to accept handicaps and suffer all sorts of indignities that they wouldn’t accept in a non-gaming context. They’ll wear helmets, wield controllers shaped like barbells, invest in special treadmills, and engage in behaviors that in a wider social context would be silly.

It’s dangerous to pin the future of a multibillion-dollar technology on gaming alone. Gamers are a fickle bunch, and gaming technology doesn’t translate well to other uses because the interaction design for a game is often one of the obstacles the player must overcome—performing impossible button combinations, reacting faster than is consciously possible, memorizing enemy movements and positions, decoding arcane clues and requirements. Easy games are boring games. The same cannot be said for most other systems people use frequently.

Old problems require new solutions

Despite a predominant narrative that says otherwise, SIM’s failure to find a mass audience over the last fifty years has very little to do with technology. More to blame than cost, latency, or tracking degrees of freedom is the uninspired, literal application of immersion and situation to tasks for which they’re ill suited. It’s like noticing that videos of surgery make people lose their appetites and concluding that watching appendectomies will be the next diet craze.

Finally, it’s important to point out that not every medium is a mass medium. Do I believe there are ideas that can only be fully expressed in SIM, creative impulses for which SIM is the appropriate métier? Sure, why not? Do I think that necessarily implies that SIM is a future mass medium deserving of the insane amounts of money and talent that have been thrown at it? Of course not. SIM is poetry to 2D media’s prose: it has a small, enthusiastic following, but its production and consumption require more patience, technical competence, and sustained engagement than most people are capable of. And it’s extremely unlikely to make anyone rich.

The real world is tedious; otherwise, people wouldn’t be so excited to escape it. To replicate that reality virtually squanders SIM’s potential. To misquote McLuhan: given SIM as a medium, what’s a suitable message? I don’t have a clear idea, but I have a hunch SIM might represent the beginning of a new kind of entertainment: entertainment in which the user is neither central protagonist (as in a game or first-person narrative) nor impassive observer (as in a movie or omniscient narration), but rather a kind of willing artistic subject whose actions and feelings are indirectly manipulated by an author. A second-person medium. Or something else entirely, a form that takes full advantage of SIM’s limbic and spatial expressiveness. The only way to find out, though, is to abandon long-held preconceptions about what SIM is and how it should be used, and to focus instead on the interactions it uniquely enables.