I attended the Metaverse Roadmap workshop last week at Stanford, and I was pleased to hear Mitch Kapor touting the benefits of gestural input for virtual worlds (see Dan's blog post). I was especially excited when he said you can (almost) buy "3D webcams" - a regular camera paired with an infrared depth sensor - for only $39! (Not sure if he was talking about the ZCam specifically.) A couple of years ago, when Richard Marks of the Sony EyeToy team visited PARC and demonstrated their "real-time motion capture" for games, he said 3D cameras still cost $20,000!
Now I don't know if gestural interfaces will revolutionize computing in general, but I'm very excited about the new possibilities they create for 3D avatar control. As I've written before, avatars will never be fully expressive until they support free gesticulation. With 3D cameras and real-time motion-capture techniques, "Players could use their own bodies and faces as joysticks in puppeteering their avatars." Currently in MMOs, gesture and facial expression are limited to a pre-specified library of commands (/bow, /wave, /point, /smile, /wink, etc.). Imagine if text chat were like this: what if you could only send chat messages by selecting them from a pre-specified library of phrases (like chat between strangers in ToonTown)? It would be severely limiting in terms of communication and expression. Yet that's the current state of avatar gesture in virtual worlds.
In practice, free chat or voice can help compensate for this lack of free gesticulation. Take cybersex, for example, with its elaborate text descriptions of what you wish your clunky avatar could do: &lt;Alex gently grips the nape of your neck with one hand and leans in for a kiss&gt;. Such workarounds highlight a big gap in current avatar functionality!
Now in modeling gesture, we must recognize that not all "gestures" are the same. The canned gestures in today's virtual worlds and MMOs work okay for certain types of gestures, but others will require free gesticulation.
1. Stand-alone gestures (or emblems) - bows, waves, shrugs, head-shakes, nods, etc. - work pretty well in current avatar systems. Stand-alone gestures are fairly standardized embodied symbols that convey meaning independently of the surrounding talk. By convention, a nod conveys "yes" and a shrug conveys "I don't know." The gesture alone can stand as an intelligible turn.
2. Pointing gestures (or deictics) are also possible in current systems, but are a little harder to perform because they rely partially on the talk for their meaning. They should be precisely timed with relevant key words (e.g., "the cave is over there" where the extent of the point is simultaneous with "there"). This is tricky when you must type both the gesture command and the chat message. It's much easier with voice.
3. Emphatic gestures (or beats) are not really possible with today's avatars. Beat gestures emphasize particular words by being produced simultaneously with them. Because gesture commands and chat messages must be typed separately, beats cannot currently be tied to particular words. However, the appearance of beat gestures can certainly be faked, as in World of Warcraft's "talking" animation.
4. Finally, depictive gestures (or iconics) are not possible in today's virtual worlds and MMOs. Iconic gestures depict objects by mirroring elements of their physical form and/or motion. Their production is creative, and their uniqueness reflects the diversity of all the objects in the world; therefore, it's impossible to create a comprehensive library of them (and even if you tried, it would be too massive for players to handle). But with real-time mo-cap, depictive gesturing becomes possible. So for example, in World of Warcraft, if I can't think of the term "night elf," I can nonetheless depict the race through gesture: I run my thumbs and index fingers, in a pinching shape, along the edges of my imaginary long night-elf ears as they taper to a point. Or you can imagine in Second Life that one dominatrix might say to another, "Where can I buy one of those tops that are like a corset but only go from here to here?" while she places one "karate-chop" hand just under her avatar's breasts and the other hand just below its belly button. Her fellow dominatrix can then say, "Oh, you mean a 'cincher.'"*
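The contrast running through the four gesture types above can be sketched in code. This is a minimal, hypothetical sketch - none of these names (GESTURE_LIBRARY, play_canned_gesture, drive_avatar) come from any real MMO's API. The point is architectural: emblems fit naturally into a finite lookup table of slash commands, while depictive gestures would require streaming continuous joint data from a 3D camera, with no library entry at all.

```python
# Hypothetical sketch: the canned-command model vs. free gesticulation.
# None of these names are from a real MMO API.

# Model 1: today's systems. A finite library of slash commands,
# each mapped to a pre-authored animation clip. Emblems (bows, waves,
# shrugs) work fine here: one command, one standardized meaning.
GESTURE_LIBRARY = {
    "/bow": "anim_bow",
    "/wave": "anim_wave",
    "/shrug": "anim_shrug",
    "/nod": "anim_nod",
}

def play_canned_gesture(command):
    """Look up a slash command; return its animation clip, or None."""
    clip = GESTURE_LIBRARY.get(command)
    # A novel, depictive gesture (tracing imaginary night-elf ears)
    # simply has no entry -- it cannot be produced in this model.
    return clip

# Model 2: free gesticulation. The avatar is driven frame by frame by
# joint positions captured from a 3D camera, so any gesture the player
# can perform is reproducible without a library entry.
def drive_avatar(frames):
    """Each frame maps joint name -> (x, y, z) from the depth sensor."""
    poses = []
    for frame in frames:
        poses.append(dict(frame))  # pass each captured pose through
    return poses
```

The asymmetry is the point: adding a new emblem means authoring a new clip and command, whereas Model 2 needs no per-gesture authoring, which is why iconics only become practical with real-time mo-cap.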
While free gesticulation will no doubt revolutionize avatar control, just how such gestural interfaces should be designed and what activities they will be good for is still largely an open question. One fact that must be accommodated is that players at the keyboard will not only use their bodies to animate their avatars, but may also use them in the physical world at the same time. If my son approaches me at the keyboard, I don't want my avatar's head to turn when I look at him, especially if I'm talking to other people in the virtual world. And what about running? Do I really want to run my character across Norrath by running in place in front of my PC? (Actually, that might be more fun than the gym.) At the very least, free gesticulation in virtual worlds should make cybersex a lot less textual!
* For more on the reparative uses of depictive gesture in real life, see Moore, Robert J. (2008). "When Names Fail: Referential Practice in Face-to-face Service Encounters." Language in Society 37(3). (Coming soon!)