Feb 19, 2008



W.r.t "beat" gestures, you say:

"Because gesture commands and chat messages must be typed separately, beats cannot currently be tied to particular words."

I am wondering if, for this one problem, there could be a markup language for chat that would allow us to accomplish this, or is that a terribly clunky solution?

For instance, for a "beat" I could type "I'm *seriously* angry" and perhaps have the words surrounded with * tied to a particular gesture...? Or, "I'm [gesture=pumpfists]seriously[/gesture] angry."

It's not exactly what we want, but I'm wondering if it's a lot more doable.

The problem there is really in people's reading speed versus what my avatar is doing. You mention this with the pointing gestures. I think the problems are the same.

This is all very interesting.


I just saw this today on the BBC website:


Who needs a gesture interface when you can just "think" the commands in a neural interface?

Though Epoc is obvoiusly still in it's infancy, teh possibilities are obvoiusly huge, and I think it's a highly positive sign that IBM are involved in this.



Gestural input would be VERY cool. The problem is not the cameras, but bandwidth. Passing all those quaternions back and forth all the time would use an enormous amount of bandwidth. Signaling to a group of clients that avatar X shall now execute animation code Y is a far cry from passing a bunch of keyframes.
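To put rough numbers on that gap, here is a back-of-envelope sketch. Every figure in it (joint count, update rate, audience size, message size) is an assumption picked for illustration, not a measurement from any real client or server.

```python
# Back-of-envelope comparison; every number below is an assumption for illustration.
JOINTS = 20            # assumed number of tracked joints per avatar
BYTES_PER_QUAT = 16    # one quaternion as four 32-bit floats (x, y, z, w)
UPDATES_PER_SEC = 30   # assumed pose update rate
CLIENTS = 50           # assumed number of clients receiving the data
TRIGGER_BYTES = 8      # assumed size of an "avatar X plays animation Y" message


def streamed_pose_bytes_per_sec(joints, updates_per_sec, clients):
    """Bytes per second the server sends when relaying raw joint rotations."""
    return joints * BYTES_PER_QUAT * updates_per_sec * clients


def trigger_bytes(clients):
    """Bytes the server sends once to tell every client to play a stored animation."""
    return TRIGGER_BYTES * clients


stream = streamed_pose_bytes_per_sec(JOINTS, UPDATES_PER_SEC, CLIENTS)
once = trigger_bytes(CLIENTS)
print(f"streaming: {stream} bytes/s, trigger: {once} bytes per gesture")
```

Under these assumptions, one freely gesturing avatar costs the server 480,000 bytes every second, while a canned animation costs 400 bytes once. Compression and lower update rates would shrink the first figure, but the difference in kind remains.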



Wow, I need to watch my typing before my first coffee in the morning!



Why stop at gestures? If you're going to track users' hand movements and reproduce them in an avatar's gestures, why not give them more than cosmetic functionality? If there's something on the floor that I want to pick up, well leaning forward and picking it up in real life could do it on the screen. If I want to hit someone with my sword, why am I clicking buttons when I could just be flailing my sword around in a broad sweep?

If capturing gestures is good, then surely capturing non-gestures that still have meaning is also good?



I think the MPs will really run amok once they discover that you can play Manhunt 4 with gesture controls, Richard. *enjoys the thought* Now we'll just need a couple of "force feedback"-overalls simulating the swing of the sword and you'll no longer need to pay the club fees in order to perfect your head-chopping skills.

I've actually been waiting quite a while for something like this to appear, and I'm very glad that we're finally getting there. I think once they bring the long-distance running of MMORPGs into real life, the nerds will definitely be the subculture with the highest life expectancy on this planet.


i thought of the same thing, megan. i like it. a simple markup that would trigger animations around particular words. i think your first example (using * ) is the best one. in forums you can get away with special markup commands, but in real time interaction it's too much distraction and overhead to type "[pumpfists]hey you![/pumpfists]" even for an html hand coder like myself.

single characters like * and _, which are often used as emphasis in text-only situations anyway, would work best for their simplicity.
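A minimal sketch of how such single-character markup might be parsed, assuming a chat client that can attach an animation name to a span of text. The gesture names and the `parse_chat` function are hypothetical, not taken from any existing chat system.

```python
import re

# Hypothetical mapping from marker character to an animation name.
MARKER_GESTURES = {"*": "beat", "_": "lean_in"}

# Matches *text* or _text_ where the same marker opens and closes the span.
_SPAN = re.compile(r"([*_])(.+?)\1")


def parse_chat(message):
    """Split a chat line into (text, gesture) pairs; gesture is None for plain text."""
    parts = []
    pos = 0
    for m in _SPAN.finditer(message):
        if m.start() > pos:
            # Plain text between the previous span and this one.
            parts.append((message[pos:m.start()], None))
        parts.append((m.group(2), MARKER_GESTURES[m.group(1)]))
        pos = m.end()
    if pos < len(message):
        parts.append((message[pos:], None))
    return parts


print(parse_chat("I'm *seriously* angry"))
```

The markers are stripped from the displayed text, so other players would just see the words plus the gesture. One caveat of this simple pattern: underscores inside ordinary words (a file name, say) would also be read as markup.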


Richard>If you're going to track users' hand movements and reproduce them in an avatar's gestures, why not give them more than cosmetic functionality?

My point here is that free gesticulation will actually be INSTRUMENTAL for communicating, not merely "cosmetic." Currently some gesture commands already ARE instrumental. I regularly use a subset of them in place of chat - /nod for "yes," /shrug for "I don't know," sometimes /smile for "thank you." But I agree that many gesture commands, or "socials," seem to be designed largely for humorous effect. Also, all gesture commands are pretty clunky to use, and I think that's why most players don't use them much (except RPers). Free gesticulation should be easier to use, more effective for communicating and also more expressive.

But sure, it will be very exciting to use gestural interfaces for game play, travel and other activities in addition to communication. (Communication just happens to be my main research interest.)

megan>For instance, for a "beat" I could type "I'm *seriously* angry" and perhaps have the words surrounded with * tied to a particular gesture...?

Sure. Or I'm thinking words in CAPS could simply be tied to a single "beat gesture" animation, like thrusting your hands outward. Many players already use CAPS to mark emphasis (as well as asterisks). But the problem with this is that in almost all chat systems, the whole turn appears at once, so you can't see which particular word is being emphasized. However, in chat systems that post messages A WORD AT A TIME (like There or its derivatives, VMTV, IMVU or Forterra), the beat gesture could be triggered precisely when the key word appears publicly.
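As a rough sketch of the word-at-a-time idea, assuming a chat system that posts each word separately and can fire an animation alongside it: the function names and the single "beat" animation label are hypothetical.

```python
def is_emphasized(word):
    """True if the word has at least two letters and all of them are upper case."""
    letters = [c for c in word if c.isalpha()]
    return len(letters) >= 2 and all(c.isupper() for c in letters)


def post_word_at_a_time(message):
    """Yield (word, gesture) in posting order; gesture is 'beat' for CAPS words, else None."""
    for word in message.split():
        yield word, ("beat" if is_emphasized(word) else None)


for word, gesture in post_word_at_a_time("I am SERIOUSLY angry"):
    print(word, gesture)
```

Requiring at least two letters keeps the pronoun "I" from triggering a beat every time it appears. Because the gesture is paired with the word as it is posted, the emphasis lands at the moment readers see the key word.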

However, I think players will emphasize FEWER words with this kind of textual marking than they would if they were using free gesticulation. With the latter, I think they will use many beat gestures without even being aware of it.


Bob Moore>My point here is that free gesticulation will actually be INSTRUMENTAL for communicating, not merely "cosmetic."

Well they certainly were in textual worlds, so if you can get some of that back for graphical worlds, great!

>Currently some gesture commands already ARE instrumental. I regularly use a subset of them in place of chat - /nod for "yes," /shrug for "I don't know," sometimes /smile for "thank you."

These are fine if the gestures don't carry some predefined tag text. If I /nod, I don't want there to be text (or graphics) that suggests I'm nodding "enthusiastically" or "in agreement"; I just typed /nod. If I'd typed /nod enthusiastically, OK, fair enough, but I don't want an enthusiastic nod when I'm trying to show thoughtfulness, say.



Bob, as much as I like Second Life and avatars, I'm still waiting for someone to explain to me why, if I have a 3-D webcam that broadcasts my real-life self and scenes and other real-life people and scenes in high fidelity, I'd want to use that to... run the synthetic representations of selves and scenes in virtual worlds.

That is, I'm quite happy to make the case for virtual worlds and everything they contain, in their own terms. But if video advances to such an extent that it becomes cheap and easy to use, and is really high-fidelity *and* able to be easily manipulated and edited by the average person, I do wonder if the need for avatars will fall away.


Very interesting article, thanks!

Re your statement: "Stand-alone gestures are fairly standardized embodied symbols that convey meaning independently of the surrounding talk. By convention, a nod conveys "yes" and a shrug conveys "I don't know.""
- I was wondering if there has been any evidence (studies or anecdotal) of how cultural differences in non-verbal communication in general, and emblems in particular, play out in virtual worlds? E.g., and afaik, in some cultures shaking your head means "yes", in others "no"; there are probably lots of other examples that might become more relevant the more virtual worlds include non-verbal means of expression/communication.


Richard Bartle: I don't want an enthusiastic nod when I'm trying to show thoughtfulness, say.

Hmm.. I found those unintended aspects to be entertaining. Quite often though, the way a player's emotes are phrased (usually as macros) is essentially an aspect of the avatar (what you look like and what kind of player sits behind the screen) rather than an aspect of the character's communication...


I am of the belief that granular control over avatars is getting more attention than is required, at least in the early stages of this field. I've read articles that outline studies related to facial expressions, mouth movements and camera focusing technologies (as well as gesturing).

Rather than simply duplicating those cues we deem as important communication tools in real life (RL), let's instead fully leverage these 3D spaces first. By doing so we can enhance productivity, communication and socialization in ways never before possible. If effective virtual world communication still calls for RL components, then so be it. However, my bet is that more powerful methods of human interaction are out there waiting to be discovered.


"The problem is not the cameras, but bandwidth. Passing all those quaternions back and forth all the time would use an enormous amount of bandwidth."

It really is a problem. Second Life has the capability to allow you to upload animations, so people have done this, and incorporated these into "gestures" (so for example when I type "argh", my avatar grabs its head and pounds it into the ground) and into "Animation Overriders" (AOs), which replace/override the default walking, sitting, standing animations with ones that people prefer.

But these all take bandwidth. So when you go to a public event with 50 people there, and maybe 35 have AOs attached, you're lagging out even more, and the server is grinding trying to send out the however many kilobytes it is for that new arrival's walking animation to be distributed to 49 clients, and again, every time they trigger another animation change.

This is with just what is effectively a library of gestures, not a real time process. All that said, I know Linden Lab have mentioned "avatar puppetry" and I think demonstrated it as a potential new feature. Maybe the bandwidth problem is the hurdle though. It's still something that may be ok for small groups?


I'd personally just stick with facial gestures. If my avatar and others could reflect my face (smile, eyes (blinking, looking around, etc.), eyebrows, forehead, etc.), it would convey a lot of the nuances of communication.


I'd like to see what one could do with a multi-touch gestural system for controlling avatar action.














Alex from VR-WEAR is working on a mod to SL allowing gesticulation using a standard webcam.
Example here: http://www.mobitrends.com/2008/09/05/vr-wear-sl-viewer-mod-public-launch/

