I attended the Metaverse Roadmap workshop last week at Stanford, and I was pleased to hear Mitch Kapor touting the benefits of gestural input for virtual worlds (see Dan's blog post). I was especially excited when he said you can (almost) buy "3D webcams" - cameras plus an infrared depth sensor - for only $39! (Not sure if he was talking about the ZCam specifically). A couple of years ago, when Richard Marks of the Sony EyeToy team visited PARC and demonstrated their "real-time motion capture" for games, he said 3D cameras still cost $20,000!
Now I don't know if gestural interfaces will revolutionize computing in general, but I'm very excited about the new possibilities they create for 3D avatar control. As I've written about before, avatars will never be fully expressive until they are capable of free gesticulation. With 3D cameras and real-time motion-capture techniques, "Players could use their own bodies and faces as joysticks in puppeteering their avatars." Currently in MMOs, gesture and facial expression are limited to a pre-specified library of commands (/bow, /wave, /point, /smile, /wink, etc.). Imagine if text chat were like this: what if you could only send chat messages by selecting them from a pre-specified library of phrases (like chat between strangers in ToonTown)? This would be severely limiting in terms of communication and expression. However, that's the current state of avatar gesture in virtual worlds.
In practice, free chat or voice can help compensate for this lack of free gesticulation. Take cybersex, for example, with its elaborate text descriptions of what you wish your clunky avatar could do: <Alex gently grips the nape of your neck with one hand and leans in for a kiss>. This really highlights a big gap in current avatar functionality!
Now in modeling gesture, we must recognize that not all "gestures" are the same. The canned gestures in today's virtual worlds and MMOs work okay for certain types of gestures, but others will require free gesticulation.
1. Stand-alone gestures (or emblems) - bows, waves, shrugs, head-shakes, nods, etc. - work pretty well in current avatar systems. Stand-alone gestures are fairly standardized embodied symbols that convey meaning independently of the surrounding talk. By convention, a nod conveys "yes" and a shrug conveys "I don't know." The gesture alone can stand as an intelligible turn.
2. Pointing gestures (or deictics) are also possible in current systems, but are a little harder to perform because they rely partially on the talk for their meaning. They should be precisely timed with relevant key words (e.g., "the cave is over there" where the extent of the point is simultaneous with "there"). This is tricky when you must type both the gesture command and the chat message. It's much easier with voice.
3. Emphatic gestures (or beats) are not really possible with today's avatars. Beat gestures emphasize particular words by being produced simultaneously with them. Because gesture commands and chat messages must be typed separately, beats cannot currently be tied to particular words. However, the appearance of beat gestures can certainly be faked, as in World of Warcraft's "talking" animation.
4. Finally, depictive gestures (or iconics) are not possible in today's virtual worlds and MMOs. Iconic gestures depict objects by mirroring elements of their physical form and/or motion. Their production is creative, and their uniqueness reflects the diversity of all the objects in the world; therefore, it's impossible to create a comprehensive library of them (and even if you tried, it would be too massive for players to handle). But with real-time mo-cap, depictive gesturing becomes possible. So for example, in World of Warcraft, if I can't think of the term "night elf," I can nonetheless depict the race through gesture: I run my thumbs and index fingers, in a pinching shape, along the edges of my imaginary long night-elf ears as they taper to a point. Or you can imagine in Second Life that one dominatrix might say to another, "Where can I buy one of those tops that are like a corset but only go from here to here?" while she places one "karate-chop" hand just under her avatar's breasts and the other hand just below its belly button. Her fellow dominatrix can then say, "Oh, you mean a 'cincher.'"*
While free gesticulation will no doubt revolutionize avatar control, just how such gestural interfaces should be designed and what activities they will be good for is still largely an open question. One fact that must be accommodated is that players at the keyboard will not only use their bodies to animate their avatars, but may also use them in the physical world at the same time. If my son approaches me at the keyboard, I don't want my avatar's head to turn when I look at him, especially if I'm talking to other people in the virtual world. And what about running? Do I really want to run my character across Norrath by running in place in front of my PC? (Actually, that might be more fun than the gym.) At the very least, free gesticulation in virtual worlds should make cybersex a lot less textual!
* For more on the reparative uses of depictive gesture in real life, see Moore, Robert J. (2008): "When Names Fail: Referential Practice in Face-to-face Service Encounters." Language in Society, 37(3). (Coming soon!)
Comments on Is 'Free Gesticulation' For Avatars Here Yet?:
W.r.t "beat" gestures, you say:
"Because gesture commands and chat messages must be typed separately, beats cannot currently be tied to particular words."
I am wondering if, for this one problem, there could be a markup language for chat that would allow us to accomplish this, or is that a terribly clunky solution?
For instance, for a "beat" I could type "I'm *seriously* angry" and perhaps have the words surrounded with * tied to a particular gesture...? Or, "I'm [gesture=pumpfists]seriously[/gesture] angry."
It's not exactly what we want, but I'm wondering if it's a lot more doable.
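This markup idea could be prototyped quite simply. Below is a minimal sketch (the function name, gesture names, and tag syntax are all hypothetical, just following the examples in this thread): it walks a chat message word by word, fires a gesture trigger for *starred* words or [gesture=name]word[/gesture] spans, and returns the cleaned text plus a list of (word index, gesture) pairs a client could hand to its animation system.

```python
import re

# Hypothetical sketch of the markup-for-gestures idea from this thread.
# Matches a whole token like "[gesture=pumpfists]seriously[/gesture]"
TAGGED = re.compile(r"\[gesture=(\w+)\](\S+)\[/gesture\]$")
# Matches a whole token like "*seriously*"
STARRED = re.compile(r"\*(\S+)\*$")

def parse_gestures(message, default_gesture="beat"):
    """Return (clean_text, [(word_index, gesture_name), ...])."""
    words, triggers = [], []
    for i, token in enumerate(message.split()):
        m = TAGGED.match(token)
        if m:
            triggers.append((i, m.group(1)))   # explicit gesture tag
            token = m.group(2)
        elif STARRED.match(token):
            triggers.append((i, default_gesture))  # *emphasis* -> beat
            token = token.strip("*")
        words.append(token)
    return " ".join(words), triggers
```

For example, parse_gestures("I'm *seriously* angry") would flag the second word for a beat gesture while stripping the asterisks from the displayed text.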
The problem there is really in people's reading speed versus what my avatar is doing. You mention this with the pointing gestures. I think the problems are the same.
This is all very interesting.
Posted Feb 19, 2008 4:43:28 PM | link
I just saw this today on the BBC website:
Who needs a gesture interface when you can just "think" the commands in a neural interface?
Though Epoc is obviously still in its infancy, the possibilities are huge, and I think it's a highly positive sign that IBM is involved in this.
Posted Feb 20, 2008 3:40:12 AM | link
Gestural input would be VERY cool. The problem is not the cameras, but bandwidth. Passing all those quaternions back and forth all the time would use an enormous amount of bandwidth. Signaling to a group of clients that avatar X shall now execute animation code Y is a far cry from passing a bunch of keyframes.
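A back-of-the-envelope calculation makes the scale of this objection concrete (all figures below are illustrative assumptions, not measurements from any real engine):

```python
# Rough cost of streaming raw mo-cap poses vs. signaling animation IDs.
# All numbers are illustrative assumptions for the sake of the estimate.
JOINTS = 20            # skeleton bones tracked per avatar
QUAT_BYTES = 4 * 4     # one quaternion = 4 float32 components
RATE_HZ = 30           # pose updates per second
AVATARS = 50           # people at a public event

per_avatar = JOINTS * QUAT_BYTES * RATE_HZ            # bytes/s upstream
server_egress = per_avatar * AVATARS * (AVATARS - 1)  # relayed to everyone else

print(f"raw mo-cap, per avatar: {per_avatar / 1024:.1f} KB/s")
print(f"server egress, {AVATARS} avatars: {server_egress / 1e6:.1f} MB/s")
# By contrast, "avatar X plays animation Y" is a few bytes per *trigger*,
# not per frame, which is why canned animations scale so much better.
```

Even with these modest assumptions the server is relaying tens of megabytes per second for one crowded event, versus a handful of bytes per gesture for an animation-ID scheme.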
Posted Feb 20, 2008 3:41:16 AM | link
Wow, I need to watch my typing before my first coffee in the morning!
Posted Feb 20, 2008 3:42:31 AM | link
Why stop at gestures? If you're going to track users' hand movements and reproduce them in an avatar's gestures, why not give them more than cosmetic functionality? If there's something on the floor that I want to pick up, well leaning forward and picking it up in real life could do it on the screen. If I want to hit someone with my sword, why am I clicking buttons when I could just be flailing my sword around in a broad sweep?
If capturing gestures is good, then surely capturing non-gestures that still have meaning is also good?
Posted Feb 20, 2008 3:52:21 AM | link
I think the MPs will really run amok once they discover that you can play Manhunt 4 with gesture controls, Richard. *enjoys the thought* Now we'll just need a couple of "force feedback"-overalls simulating the swing of the sword and you'll no longer need to pay the club fees in order to perfect your head-chopping skills.
I've actually been waiting quite a while for something like this to appear, and I'm very glad that we're finally starting to get there. I think once they bring MMORPG-style long-distance running into real life, the nerds will definitely be the sub-culture with the highest life expectancy on this planet.
Posted Feb 20, 2008 7:41:19 AM | link
i thought of the same thing, megan. i like it. a simple markup that would trigger animations around particular words. i think your first example (using * ) is the best one. in forums you can get away with special markup commands, but in real time interaction it's too much distraction and overhead to type "[pumpfists]hey you![/pumpfists]" even for an html hand coder like myself.
single characters, like * and _, which are often used as emphasis in text-only situations anyway, would work best for their simplicity.
Posted Feb 20, 2008 11:59:00 AM | link
Richard>If you're going to track users' hand movements and reproduce them in an avatar's gestures, why not give them more than cosmetic functionality?
My point here is that free gesticulation will actually be INSTRUMENTAL for communicating, not merely "cosmetic." Currently some gesture commands already ARE instrumental. I regularly use a subset of them in place of chat - /nod for "yes," /shrug for "I don't know," sometimes /smile for "thank you." But I agree that many gesture commands, or "socials," seem to be designed largely for humorous effect. Also, all gesture commands are pretty clunky to use, and I think that's why most players don't use them much (except RPers). Free gesticulation should be easier to use, more effective for communicating and also more expressive.
But sure, it will be very exciting to use gestural interfaces for game play, travel and other activities in addition to communication. (Communication just happens to be my main research interest.)
megan>For instance, for a "beat" I could type "I'm *seriously* angry" and perhaps have the words surrounded with * tied to a particular gesture...?
Sure. Or I'm thinking words in CAPS could simply be tied to a single "beat gesture" animation, like thrusting your hands outward. Many players already use CAPS to mark emphasis (as well as asterisks). But the problem with this is that in almost all chat systems, the whole turn appears at once, so you can't see which particular word is being emphasized. However, in chat systems that post messages A WORD AT A TIME (like There or its derivatives, VMTV, IMVU or Forterra), the beat gesture could be triggered precisely when the key word appears publicly.
However, I think players will emphasize FEWER words with this kind of textual marking than they would if they were using free gesticulation. With the latter, I think they will use many beat gestures without even being aware of it.
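The CAPS-triggered beat idea fits word-at-a-time chat neatly. A minimal sketch (the trigger_beat hook is hypothetical; a real client would call into its animation system there):

```python
# Hypothetical sketch: in a chat system that posts messages a word at a
# time, fire a beat-gesture animation exactly when an ALL-CAPS word
# appears publicly. trigger_beat is a stand-in for the animation call.
def stream_with_beats(message, trigger_beat=print):
    for word in message.split():
        if word.isupper() and len(word) > 1:  # a lone "I" shouldn't fire
            trigger_beat(word)                # beat lands on this word
        yield word                            # word posted to the channel
```

Because each word is yielded (posted) at the same moment its beat fires, the emphasis and the animation stay synchronized, which is exactly what whole-turn chat can't do.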
Posted Feb 20, 2008 12:51:30 PM | link
Bob Moore>My point here is that free gesticulation will actually be INSTRUMENTAL for communicating, not merely "cosmetic."
Well they certainly were in textual worlds, so if you can get some of that back for graphical worlds, great!
>Currently some gesture commands already ARE instrumental. I regularly use a subset of them in place of chat - /nod for "yes," /shrug for "I don't know," sometimes /smile for "thank you."
These are fine if the gestures don't carry some predefined tag text. If I /nod, I don't want there to be text (or graphics) that suggests I'm nodding "enthusiastically" or "in agreement"; I just typed /nod. If I'd typed /nod enthusiastically, OK, fair enough, but I don't want an enthusiastic nod when I'm trying to show thoughtfulness, say.
Posted Feb 21, 2008 10:01:41 AM | link
Bob, as much as I like Second Life and avatars, I'm still waiting for someone to explain why, if I have a 3-D webcam that broadcasts my real-life self and surroundings, and other real-life people and scenes, in high fidelity, I'd want to use it to run synthetic representations of selves and scenes in virtual worlds.
That is, I'm quite happy to make the case for virtual worlds and everything they contain, in their own terms. But if video advances to such an extent that it becomes cheap and easy to use, and is really high-fidelity *and* able to be easily manipulated and edited by the average person, I do wonder if the need for avatars will fall away.
Posted Feb 21, 2008 10:54:29 AM | link
Very interesting article, thanks!
Re your statement: "Stand-alone gestures are fairly standardized embodied symbols that convey meaning independently of the surrounding talk. By convention, a nod conveys "yes" and a shrug conveys "I don't know.""
- I was wondering whether there is any evidence (studies or anecdotal) of how cultural differences in non-verbal communication in general, and emblems in particular, play out in virtual worlds? E.g., as far as I know, in some cultures shaking your head means "yes," in others "no"; there are probably lots of other examples that might become more relevant as virtual worlds include more non-verbal means of expression/communication.
Posted Feb 21, 2008 11:28:57 AM | link
Richard Bartle: I don't want an enthusiastic nod when I'm trying to show thoughtfulness, say.
Hmm.. I found those unintended aspects to be entertaining. Quite often though, the way a player's emotes are phrased (usually as macros) is essentially an aspect of the avatar (what you look like and what kind of player sits behind the screen) rather than an aspect of the character's communication...
Posted Feb 21, 2008 6:31:03 PM | link
I am of the belief that granular control over avatars is getting more attention than is required, at least in the early stages of this field. I've read articles that outline studies related to facial expressions, mouth movements and camera focusing technologies (as well as gesturing).
Rather than simply duplicating those cues we deem as important communication tools in real life (RL), let's instead fully leverage these 3D spaces first. By doing so we can enhance productivity, communication and socialization in ways never before possible. If effective virtual world communication still calls for RL components, then so be it. However, my bet is that more powerful methods of human interaction are out there waiting to be discovered.
Posted Feb 21, 2008 10:22:57 PM | link
"The problem is not the cameras, but bandwidth. Passing all those quaternions back and forth all the time would use an enormous amount of bandwidth."
It really is a problem. Second Life has the capability to allow you to upload animations, so people have done this, and incorporated these into "gestures" (so for example when I type "argh", my avatar grabs its head and pounds it into the ground) and into "Animation Overriders" (AOs), which replace/override the default walking, sitting, standing animations with ones that people prefer.
But these all take bandwidth. So when you go to a public event with 50 people there, and maybe 35 have AOs attached, you're lagging out even more, and the server is grinding trying to send out the howevermany kilobytes it is for that new arrival's walking animation to be distributed to 49 clients, and again, every time they trigger another animation change.
This is with just what is effectively a library of gestures, not a real time process. All that said, I know Linden Lab have mentioned "avatar puppetry" and I think demonstrated it as a potential new feature. Maybe the bandwidth problem is the hurdle though. It's still something that may be ok for small groups?
Posted Feb 22, 2008 4:26:17 AM | link
I'd personally just stick with facial gestures. If my avatar and others' could reflect my face (smile, eyes (blinking, looking around, etc.), eyebrows, forehead, etc.), it would communicate a lot of the nuances of communication.
Posted Feb 28, 2008 12:01:01 PM | link
I'd like to see what one could do with a multi-touch gestural system for controlling avatar action.
Posted May 8, 2008 1:12:01 PM | link
Wow. I've been able to find a lot of free avatars, but ones with actual gesticulation capabilities would be incredible....
Posted Aug 13, 2008 3:35:19 PM | link
Alex from VR-WEAR is working on a mod to SL allowing gesticulation using a standard webcam.
Example here: http://www.mobitrends.com/2008/09/05/vr-wear-sl-viewer-mod-public-launch/
Posted Sep 17, 2008 4:26:54 AM | link