Konami's "LifeLine" (a new PlayStation game) prominently features a voice-based interface. Unlike the speech packages many of us use (e.g. TeamSpeak), LifeLine's speech interface is its only player interface. Another difference between LifeLine and MMOG voice is that LifeLine uses A.I. to interpret the speech; MMOGs have people to do it.
I've never played LifeLine (to be released this month); however, GameSpy leads me to believe that in this case the speech-person-game interface does not work well (1 star out of 5).
The odd thing about MMOGs is that the language culture is first about truncation, regularization, and simplification. There was a tongue-in-cheek article in GameSpy last year (The Automated Online Role-Player) which was all about a (hypothetical?!) robot that could play MMOGs and "fit right in" using simple behavioral and language rules, such as:
- If someone says something ending in a question mark, respond by saying "Dude?"
- If someone says something ending in an exclamation point, respond by saying "Dude!"
- If someone says something ending with a period, respond by randomly saying one of three things: "Okie," "Sure," or "Right on."
- EXCEPTION: If someone says something directly to you by mentioning your name, respond by saying "Lag."
The many succinct expressions of the Autocamp 2000. (And remember to accept all trade requests from other players by giving them a melon.)
In LifeLine's case, I wonder if its alleged failure is a case of "an expectation too far". On the other hand, for MMOGs, is the opposite true: have players already been "caricaturized" to a point where we have become an A.I., so that all we do is embellish the messages with a personal fiction?
Or is it less sinister: text is just so much more expressive... and the meaning in the "voids between the words" is just so much richer. Or maybe, with text, we just type so much (monkeys & typewriters...) that additional meaning is encoded in the spaces above the words... ahem.
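The Autocamp 2000 rules quoted from the GameSpy article are simple enough to write down directly. What follows is only a minimal sketch of those four rules; the function name `autocamp_reply`, its parameters, and the choice to return `None` when no rule matches are assumptions made for illustration, not anything from the article.

```python
import random

# Canned replies for plain statements; wording taken from the quoted rules.
STATEMENT_REPLIES = ["Okie", "Sure", "Right on"]


def autocamp_reply(message, my_name, rng=random):
    """Return the bot's reply to one chat line, or None if no rule applies."""
    text = message.strip()
    # EXCEPTION rule: being mentioned by name wins over the punctuation rules.
    # (Plain substring match; a real bot would want word boundaries.)
    if my_name.lower() in text.lower():
        return "Lag."
    if text.endswith("?"):
        return "Dude?"
    if text.endswith("!"):
        return "Dude!"
    if text.endswith("."):
        return rng.choice(STATEMENT_REPLIES)
    # Hypothetical fallback: stay silent on anything the rules do not cover.
    return None
```

Calling `autocamp_reply("Anyone selling armor?", "Bob")` gives "Dude?", while any line that mentions "Bob" gets "Lag." regardless of its punctuation.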
I thought that Richard's paper on the topic does a superb job of breaking down the gameplay objections to VoIP in MMOs, as well as some of the haves and have-nots issues. Text allows you to role-play and to be immersed in a way that true voice transmission breaks -- ignoring, for the moment, the algorithms for determining gender from written works -- and this immersion is important enough to accept the massive loss in P2P bandwidth caused by text chat. Sure, l337-speak, emoticons, &c allow a somewhat higher communication rate, but it is still a crawl compared to speech.
In SL, we don't yet offer VoIP, but many of our best creators use TeamSpeak or other 3rd party chat apps when they are working on builds with trusted teammates. Large, complex, multi-person creations benefit from the increased communication speed, and since the builders aren't role-playing and are generally working with people with whom they've spent the time to build a relationship, many of the social advantages of text (ability to carefully construct statements, &c) don't matter.
Posted by: Cory Ondrejka | Mar 10, 2004 at 23:56
I did a little experiment some years back. I believe around 1998 or 1999. I did a little macro programming and training and managed to rig my Ultima Online client to IBM's speech recognition software (can't remember the name, the one before ViaVoice). The results were more hilarious than practical.
I could give my avatar basic commands, such as what direction to walk in, going into combat mode and other things, as well as type onscreen for other players to read. What was really funny was getting speech misrecognized and sent out to the gameworld; my fellow players thought I was a little loopy for a few days since all of my sentences would come out typo-free, but one in three made no sense at all due to recognition problems being supplemented by the automated checker. What wasn't funny at all, and what can make or break this game, is the accuracy of getting *commands* interpreted correctly.
With the limited vocabulary this game seems to have (5000 words) and the platform's available computing power I believe recognition should be extremely high. By the GameSpy review one of two things must be failing: either the recognition engine is bad or what is happening is exactly what Nathan wrote: mismatch of expectations. Actually, more than a mismatch it is misinformation. I am sure if the entire game could be played effectively by using fifteen words, conspicuously provided on a little cheat sheet, then the expectations would be clear. As presented, the expectation is "It's a human". Perhaps if Rio wasn't a human at all, but a non-anthropomorphic droid... Then again, if sex and violence sell and we take half of that away, who is this game gonna sell to?
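To illustrate the "cheat sheet" idea, here is a rough sketch of matching whatever the recognizer heard against a small fixed command list and rejecting anything that is not close enough. It assumes the speech has already been turned into text; the command list, the `match_command` name, and the 0.75 cutoff are all made up for illustration and have nothing to do with how LifeLine actually works. It uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical cheat sheet: a tiny vocabulary the player is told about up front.
COMMANDS = ["run", "stop", "shoot", "reload", "hide", "open door"]


def match_command(heard, vocabulary=COMMANDS, cutoff=0.75):
    """Map recognized text to the closest known command, or None if too far off."""
    candidates = difflib.get_close_matches(
        heard.lower().strip(), vocabulary, n=1, cutoff=cutoff
    )
    # An empty list means nothing in the vocabulary was similar enough.
    return candidates[0] if candidates else None
```

With a vocabulary this small, a slightly garbled "relode" still resolves to "reload", while a free-form sentence such as "make me a sandwich" is refused instead of being guessed at, which keeps the player's expectations in line with what the game can do.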
Posted by: DivineShadow | Mar 11, 2004 at 02:20
I have used VoIP extensively; in smallish known groups it’s fantastic, but with a higher player turnover I found understanding people the greatest issue.
When I was living in England, it took me a while to take in and understand a lot of the various accents. I am now living in India, and have to understand Indian English. I’ve also been living away from home so long that sometimes I have difficulty understanding my own people’s New Zealand-accented English. Did I mention I have South African, Australian and a real mixed bag of European friends who all have their very own flavour of English? And English is just one language… aren’t there somewhere between five and six thousand on the planet :p
I guess AI speech is the middle ground. VoIP won’t let a male roleplay a female too well, unless he is either very young or has attached a peg somewhere I would consider painful :-) Interpreted speech could make you sound younger, older, a different sex, hell even throw in animal primal overtones and so on, finally bringing sound to the roleplay/avatar/character medium.
If the software was perfect (oxymoron I know), you could in theory also translate the different languages crossing a barrier to all those who only speak or write on language.
Looks like it’s still in its infancy though :/
Posted by: Scot | Mar 11, 2004 at 02:50
*Looks for the edit button
... to all those who only speak or write "one" language.
Posted by: Scot | Mar 11, 2004 at 02:51
Cory Ondrejka>I thought that Richard's paper on the topic does a superb job of breaking down the gameplay objections to VoIP in MMOs
Hey, thanks!
I ought to mention for clarity's sake that there are some virtual worlds for which VoIP isn't an issue, for example ones where there's no central role-playing paradigm. It seems to me that voice in SL as a whole would be fine, although people who have built game-like areas within SL might want the facility to be switch-offable.
Richard
Posted by: Richard Bartle | Mar 11, 2004 at 04:34
Nathan> "have players already been "caricaturized" to a point where they have become an A.I. and all we do is embellish the messages with a personal fiction?"
Ahahaha! Isn't that exactly the same as 'real life' though? We react in each moment according to our conditioning/genetics/programming and then tell ourselves stories about what's *really* going on!
Posted by: Tessa Lowe | Mar 11, 2004 at 04:40
Nathan Combs>Or is it less sinister: text is just so much more expressive...
Let's say that voice had all the major features of text - storability, searchability, scanability etc. - except that (as now) it didn't have editability. When I type something, I can backspace and retype it before anyone gets to hear it.
So, let's further suppose that Terra Nova switched from text to voice, so that from now on all the posts had to be spoken rather than typed. Would that be a great boon to communication, or would it be a hindrance? Would you be more encouraged to post here or less encouraged to do so?
Does the fact that people can edit what they say in a virtual world before they say it add to or detract from play?
It depends on the virtual world.
Richard
Posted by: Richard Bartle | Mar 11, 2004 at 04:48
Super post, Nathan.
I don't think the "d00d" problem is a feature of MMORPGs. I think if you transcribed most human conversations, they seem to be created by bad AI chatterbots. That's why Julia was so effective in MUDs.
DS>I did a little experiment some years back. I believe around 1998 or 1999. I did a little macro programming and training and managed to rig my Ultima Online client to IBM's speech recognition software (can't remember the name, the one before ViaVoice). The results were more hilarious than practical.
Count me impressed -- I would have never thought of this, much less figured out how to implement it...
But more importantly, I think DS hits the nail on the head with his comments. Voice could *really* be effective in VWs as an interface. The keyboard/mouse interface is clunky and counter-intuitive as a means of avatar control.
Language, however, is an interface pretty much as subtle as the human body (compare narrative and dance) -- so the real potential for fun with VoIP in VWs would be making speech acts visual and effective on the represented world.
Kind of reminds one of magical incantation, no?
Richard>So, let's further suppose that Terra Nova switched from text to voice, so that from now on all the posts had to be spoken rather than typed.
That makes me wonder: VoIP is synchronous, e-mail is asynchronous -- what is chat? Asynchronous or modularly synchronous? Is chat just rapid-fire email?
Posted by: greglas | Mar 11, 2004 at 09:47
me> Language, however, is an interface pretty much as subtle as the human body (compare narrative and dance) -- so the real potential for fun with VoIP in VWs would be making speech acts visual and effective on the represented world.
To clarify, I haven't played Lifeline, but it seems from the article that the speech is directed to directing the standard avatar in a 1P environment -- I'm envisioning making the speech performative in a MM environment.
Posted by: greglas | Mar 11, 2004 at 10:50
Sorry to triple-comment, but I just had one of those "doh" moments... the "emote" command. That's all I'm talking about -- make the AI provide visual feedback after parsing the equivalent of the emote commands in MUDs. And, as in MOOs, I bet this would build out much more quickly if it were open-sourced and allowed the users to do the work.
Something for Cory and crew to think about.
Posted by: greglas | Mar 11, 2004 at 10:53
Nathan> have players already been "caricaturized" to a point where they have become an A.I. and all we do is embellish the messages with a personal fiction?
I think the Turing test will first be passed in a VW. Not because AI has risen to human level, but because human language will have adapted to AI scripts. This is based on a hunch (that I think Cory, Dave R. and Raph all disagree with) that in future MMOGs, the ratio of AI to people will be very large. The humans have innate sensitivity to social norms and will try to conform to the language patterns of the autonomous agents.
The context where I first noticed this: in a PvP environment, where visibility is low, it makes sense to walk, like the mobs do, rather than run like a PC. The nature of the gameworld encourages humans to adopt whatever AI patterns are beneficial. In a role-playing world, language processors will have to interpret what we say and then speak it; that's bound to make humans sound like AI, and if you want the machine to do a very good job of it, you'd better babble like the bots do.
Posted by: Edward Castronova | Mar 11, 2004 at 12:57
greglas> Something for Cory and crew to think about.
Second Life and There.com both parse emotes and cause the avatar to generate animations, fx, audio, &c. They approach some of the issues from a different perspective -- There.com attempts to have your avatar's overall "emotion" match the current conversation and environment automatically, while Second Life tends to take a marionette approach by letting the user easily control the emotes in real time. Second Life's system is built to allow users to connect any emote to any combination of animation and audio cue, plus scripts can be triggered as well.
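As a rough illustration of the "marionette" idea (an emote name bound to some combination of animation, audio cue, and scripts), here is a small sketch. It is emphatically not Second Life's or There.com's actual API: `EmoteBinding`, `handle_chat`, the "/name" chat syntax, and the asset names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EmoteBinding:
    """What one emote triggers; every field here is a made-up placeholder."""
    animation: str
    sound: Optional[str] = None
    scripts: List[Callable[[], None]] = field(default_factory=list)


def handle_chat(line, bindings):
    """Turn one chat line into a list of (action, argument) pairs."""
    if not line.startswith("/"):
        return [("say", line)]
    name = line[1:].strip().lower()
    binding = bindings.get(name)
    if binding is None:
        # Unknown emote: fall back to treating the line as ordinary chat.
        return [("say", line)]
    actions = [("animate", binding.animation)]
    if binding.sound is not None:
        actions.append(("play_sound", binding.sound))
    for script in binding.scripts:
        script()  # user-supplied side effects run alongside the animation
    return actions
```

A user-editable table such as `{"laugh": EmoteBinding("anim_laugh", sound="laugh.wav")}` is the part that lets users do the work: adding a new emote means adding an entry, not changing the parser.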
Posted by: Cory Ondrejka | Mar 11, 2004 at 13:36
Ted> I think the Turing test will first be passed in a VW. Not because AI has risen to human level, but because human language will have adapted to AI scripts.
ROFL . . . I love this! People always forget that there are two ways to pass the Turing test, make machines smarter or make people dumber!
Posted by: Cory Ondrejka | Mar 11, 2004 at 13:41
"and this immersion is important enough to accept the massive loss in P2P bandwidth caused by text chat"
Maybe it is a sign of my sociopathic tendencies, but I'm not going to cry over the loss of that "bandwidth". Speech is SLOW! If terranova were all speech, it would take me considerably longer to listen to the most recent updates! I always shudder when I see some interview that I wish to read posted as an "audio interview" - this usually means you must spend 60 minutes listening rather than 10 minutes skimming.
A lot of VW communication is 1 to many. 1 to many communication has very different constraints than 1 to 1 communication. There are two halves to any communication:
1) The act of sending the message.
2) The act of receiving the message.
There is no doubt that voice makes sending a lot easier. Just babble away your thoughts for 15 minutes, and you're done.
Text, on the other hand, makes receiving a lot easier. You can quickly parse the message and pick up the 3 words that are actually important. You can disregard it as pointless in a quick glimpse. Precisely *because* the sender has lost the bandwidth, the sender is more likely to ensure their thoughts are written efficiently.
So, which is better? In most many-person discussions, one spends much more time receiving information than sending it. Thus, it seems the logical course of action is to optimize the reception of information.
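The send-versus-receive argument can be put as simple arithmetic: total time spent is the sender's time plus each receiver's time multiplied by the number of receivers. The sketch below is only a toy model. The 15 minutes of talking comes from the text above; the 45 minutes to write, the 3 minutes to skim, and the audience of 20 are assumed numbers picked purely for illustration.

```python
def total_minutes(send_minutes, receive_minutes, receivers):
    """Total person-minutes for one message sent to `receivers` people."""
    return send_minutes + receive_minutes * receivers


def break_even_receivers(send_a, recv_a, send_b, recv_b):
    """Audience size at which medium A and medium B cost the same in total."""
    if recv_a == recv_b:
        raise ValueError("receive costs are equal; no break-even point")
    return (send_b - send_a) / (recv_a - recv_b)


# Voice: 15 min to say, 15 min to hear. Text (assumed): 45 min to write, 3 min to skim.
voice_total = total_minutes(15, 15, receivers=20)  # 315 person-minutes
text_total = total_minutes(45, 3, receivers=20)    # 105 person-minutes
crossover = break_even_receivers(15, 15, 45, 3)    # 2.5 receivers
```

Under these assumed numbers text wins as soon as there are three or more receivers; with different assumptions the crossover moves, but the per-receiver term always dominates as the audience grows.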
We should also keep in mind the whole duplex issue of voice communication. It is hard to have two people both talking and both listening at the same time. It is trivial to have two people both typing & reading at the same time.
What disappoints me is that everyone is so eager to reduce the cost of sending messages that they never think of the fact that the only point of sending a message is to have someone receive it. It should thus be axiomatic that the sender has the responsibility to make their messages clear, concise, and easily and efficiently parseable.
- Brask Mumei
Posted by: Brask Mumei | Mar 11, 2004 at 14:05
We're blending two arguably discrete things here, right? VoIP as general interface and VoIP as interpersonal communication.
Richard's article and most of the comments have been about the latter. What do people think about the potentials of the former (which appears to be what Lifeline tried, perhaps unsuccessfully, to accomplish).
Posted by: greglas | Mar 11, 2004 at 14:16
As a general interface (and it wouldn't really be VoIP since the vocal commands only need to go as far as the client.. from there they can be transmitted like a command from any other input device) it takes a lot longer to say "move forward...stop... step back...stop" than it does to press "w" for a second then "s" for a second. Clicking on a particular screen element to target a creature and tapping a key to cast "lightning bolt" will invariably be faster than saying "Target Ogre... not that one... not that one... not that one... okay.. Lightning bolt! Lightning bolt! Lightning bolt!" (Sees if anyone catches that reference). Beyond that, many current users just feel silly talking to a box on their desk (or maybe that's just us older guys).
I'm actually more impressed with a game like Earth & Beyond where the entire game can be played quite easily with just the mouse (and only two buttons needed). FFXI tried to construct theirs to work with just a Playstation controller which would have been cool, but the interface was a bear to navigate as it was primarily menu driven - point and click is much more agreeable.
-----------------
As a means of communication I'll echo much of what's already been said. In 1 on 1 communication or in a small group of friends, voice comms work fine and can enhance gameplay and even roleplay to a degree.
On a mass scale it breaks down to chaos. People who are fine to talk to in text oftentimes turn out to not be the people you expect in voice. Text is one of the many anonymizers people covet in online games... it makes all people equal to a certain extent by blurring nationality, age, sex, and other factors (though the format of your writing can sometimes also show the real you... d00ds). In voice you discover your guild leader is a 12 year old kid who can't figure out how to hit the mute button while yelling at his sister not to touch his stuff. A bright kid he may be and sound mature enough in text chat... in voice his whole persona changes as you discover more of the "real" person.
Outside of that voice chat in more general (non-grouped) settings turns into a massive party-line sort of chat. Voice comms in an MMOG isn't like 20 people standing in a room chatting... it's like 200 people all trying to tell you something at once. The cacophony would be deafening.
I was in the beta for a squad based MMOG (Fireteam) that used voice as its primary chat interface. The voice function worked great - being the late 90's it was quite impressive. The function in gameplay was hideous. The primary issue being those who you'd end up needing to speak to... the aforementioned 12 year old yelling at his sister or parent.. the over-weight guy with the belabored breathing... the "muncher" who'd constantly eat without moving the mic... the smoker with an oral fixation constantly (and presumably unconsciously) chewing on the mic. All made for pretty annoying online conversations with people who across a text interface are otherwise great people.
Then, of course, there is the "telephone" factor... that people don't understand that it is acceptable to be in an online voice chat in a game and have silence. I found that players generally loathe a gap of silence and will begin saying inane things or making silly noises just to fill the void, where in text they would probably have just not typed anything. I have tried voice chat again since the beta in Counterstrike games and in Planetside (which features built in voice comms)... every time I've tried it (as I don't have a close knit group of people I know playing that I can voice chat with exclusively) has been just as horrible as the first time.
Text is, IMO, just better for both interface and communication.
Posted by: Sourtone | Mar 11, 2004 at 15:19
Richard,
"Kind of reminds one of magical incantation, no?"
It does. And that was part of the accomplishment here. I could blurt out "Flamestrike" and my little avatar would cast the spell "Flamestrike", or I could say "Bail out" and the guy would teleport to a safe spot (yes, I was using helper programs). Truth of the matter, keyboard is soooooo much faster. But at the same time a keyboard is so wide it is hard to multitask with a mouse (actually, I use a trackball to minimize movement). I could easily be using my mouse and keyboard interfaces to direct where I was running to, then yell "Heal potion" and quaff one down without taking my hand off the mouse or keyboard. It was a nice complement. The problem was when trying to do things under pressure, one misrecognition makes you more upset and the next word is not going to come out right. Trust me on this one, there is no program today (or the next 3-4 years) that will recognize when you yell "Use the fricking bandage you stupid piece of crap!" at the top of your lungs when you originally trained it using a calm voice in a quiet room to recognize the phrase "bandage myself".
On a positive note, I found I could actually "talk" while walking and doing other things ingame, which is normally extremely hard to do in Ultima Online since you either have to type with one hand, stop, or can't type at all since you've got to hit those macros with the same keyboard.
Another alternative I found back then was to use a specialized keyboard that I could program; I used the PC-Dash from Saitek. It worked wonderfully well but both units I bought eventually went berserk. I know there are other programmable keyboards out there, but their price is a bit prohibitive and their programming flexibility is generally lacking. Then of course I couldn't 'talk' while using it since I now needed one hand on the PC-Dash, two on the keyboard and one on the trackball. Juggling does work to a certain point.
On a related note, talking to the computer and issuing commands is fine if you're locked up in a room during the day hours. If you think you might use this interface at 2am while your wife is asleep, think again... Plus the notion of talking to the computer really seems to exacerbate the feeling of competing for your personal time. I would strongly caution anyone against doing solo 'entertainment' tasks using voice recognition. The perception seems to be not that 'you're playing a game', it seems to border on 'you're having a relationship' - either with the people on the other side, or with the machine! Big red flag here.
Posted by: DivineShadow | Mar 11, 2004 at 15:38
ST>(and it wouldn't really be VoIP since the vocal commands only need to go as far as the client.. from there they can be transmitted like a command from any other input device)
Hmm... I was thinking it would be a server-side interpreter, but I guess it would make much more sense in terms of efficient bandwidth and security to parse the player speech at the client level and capture whatever would be relevant to transmit.
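A back-of-the-envelope sketch of the bandwidth side of that trade-off: sending raw audio to a server versus sending only the command the client recognized. The 8 kHz, 16-bit mono figures are an assumed telephone-quality baseline (uncompressed; a real voice codec would shrink it considerably), and the JSON message format and the "cast_flamestrike" command name are hypothetical.

```python
import json


def pcm_bytes(seconds, sample_rate=8000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio of the given length, in bytes."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


def command_bytes(command, target=None):
    """Size of a compact, made-up JSON command message, in bytes."""
    payload = {"cmd": command}
    if target is not None:
        payload["target"] = target
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


raw = pcm_bytes(2.0)                        # two seconds of speech: 32000 bytes
parsed = command_bytes("cast_flamestrike")  # a few dozen bytes at most
```

Even allowing for heavy audio compression, the parsed command is orders of magnitude smaller, and the server only ever sees a message it already knows how to validate.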
Re what would be relevant -- wouldn't it be interesting if something like voice pitch or volume were relevant to the effect of the command? Other members of the player's household, though, would surely tire of the caterwauling! And instead of having blistered thumbs and fingers, you'd just be hoarse...
Posted by: greglas | Mar 11, 2004 at 15:48
DS>Plus the notion of talking to the computer really seems to exacerbate the feeling of competing for your personal time. I would strongly caution anyone against doing solo 'entertainment' tasks using voice recognition. The perception seems to be not that 'you're playing a game', it seems to border on 'you're having a relationship' - either with the people on the other side, or with the machine! Big red flag here.
That's fascinating...
Posted by: greglas | Mar 11, 2004 at 15:50
I think the key motivator for many people that visit virtual worlds is the desire to either be one's self, or the exact opposite, to not be one's self.
In real life there are not a lot of things that people can control with voice, so I think that for most main stream markets using voice as an interface would seem awkward at first. For example, in real life I don't tell doors to open, I reach out and open them or walk up to them and let them open automatically.
However, I could see voice commands as a way to help players feel that they are in a very futuristic world, but adding voice commands to middle-earth might seem out of place, unless it was only linked to magic and/or pet commands.
On the other hand, voice in games will be a no-brainer in the future. Is it just me or are most games like watching a silent movie? Or like reading a Shakespearian play while it's acted voiceless on the stage. The fact that I have to type and read in most MMORPGs is just another diversion from the real action.
As far as what people want to do in virtual worlds, I think they want to be able to do everything they can do in the real world, and almost everything they can't. Voice has been part of life since God gave us ears, and the fact that it's not part of more virtual worlds has a tendency to underline the fact that we still have many issues to resolve before we close the magic circle.
-bruce
Posted by: Bruce Boston | Mar 11, 2004 at 19:59
Brask> Maybe it is a sign of my sociopathic tendencies, but I'm not going to cry over the loss of that "bandwidth". Speech is SLOW! If terranova were all speech, it would take me considerably longer to listen to the most recent updates!
There's a big difference between P2P chat and 1-to-many publishing. MMO communication is generally either multiperson chat or real-time activity coordination (whether combat, trading, building, whatever). In both of those activities, the total information transfer by voice (and the speed with which information can be conveyed) is much higher than chat text.
If MMOs were BLOGs, then I agree that the initial opinion pieces might be best served by text -- although, IMO listening to Terra Nova folks speaking at SoP provided much deeper insights into their thoughts and ideas. Speech conveys levels of meaning that text can never hope to.
So, in the usage case that we're discussing here, namely MMO communication, it is pretty clear that text provides a far slower and lossier form of information transfer -- thus, lower bandwidth -- than voice.
Posted by: Cory Ondrejka | Mar 11, 2004 at 22:15
"There is no doubt that voice makes sending a lot easier. Just babble away your thoughts for 15 minutes, and you're done.
Text, on the other hand, makes receiving a lot easier. You can quickly parse the message and pick up the 3 words that are actually important."
Here you've hit the nail on the head as to what makes dictating text to a computer so difficult. I have ViaVoice 9 installed on my computer for dictation, but I rarely use it. The recognition rate is simply impressive; it uses continuous speech (just like you would normally talk), not 'discrete speech' like the technology I used many years back (where you had to pause between words unless they were a pre-set phrase). Again, the recognition rate is simply astounding... You can pick up the phone and the thing will pretty much write what you're saying. But there is a problem with doing speech-to-text for anything but quick sentences: it requires that you have *really* thought out what you're going to say and *precisely* *how* you're going to convey your message once it is in text form. It might not look like a biggie at first sight, but once you try to actually do it you realize what's going on... You take as long as typing because you stop and think over and over what you're going to say, then get it wrong and go back and edit. It'll make you go back to typing real fast. ... Perhaps someone can think in such a clear and straight line that they can actually use this without a hitch, not me.
On a similar note I've had the unpleasantness of working as a journalist and having to transcribe taped interviews. In this case *you* are the Speech-to-Text engine. That will also drive you insane. People will say something and change the whole sentence halfway through... With the end result being gibberish that you have to decipher and turn into coherent text.
Now if we only had a program that did *that* work, then speech-to-text would be a more palatable choice. Actually, I hope one day I could just tell my computer *what* to do and it will do it, instead of having a big chunk of my brain dedicated to translating between the *what* to do and the *how* to do it using the clunky computer.
Posted by: DivineShadow | Mar 12, 2004 at 00:10
Cory> "MMO communication is generally either multiperson chat or real-time activity coordination (whether combat, trading, building, whatever)"
I certainly concede that for real-time activity coordination that speech is superior. One usually is already busy with one's hands trying to control the activity, so using a different system for communication is very effective.
For multi person chat, I'd have to strongly disagree. Even when chatting with a single person, 90% of the time I'd much prefer text. Why? Because I'm likely doing something else at the same time.
In a sharp contrast to the "voice destroys immersion", I'm in the "Don't force me into your world!" camp. I only played Ultima Online full screen for a short while before I went to windowed mode. SWG I have Alt-Entered so I can have my web browsers/compiler windows/whatever overlaid on top.
This isn't, as some claim, a flaw in the VW. It's not like I fully immerse myself in RL either, after all! When commuting, I don't pay attention to the train trip, but read a book, or play a hand held video game.
Thus, if some one in the Real World asks me a question, I like to be able to turn away from the computer and respond. I can glance back to see if there has been any text ongoing from the other side, and perhaps do a "AFK" depending on the relative urgency of the two threads of communication. I can be distracted for a minute, and then go back and read all the text that accumulated. I don't know how that would work with voice.
It all comes down, I think, to a magic circle issue. The more I hear about the magic circle, the more I become convinced it doesn't apply to the VWs that I play. My VWs are other worlds. As such, they may have circles within themselves (much like RL has magic circles defining gamespaces within it), but don't form a magic circle in and of themselves.
- Brask Mumei
Posted by: Brask Mumei | Mar 12, 2004 at 10:54
Brask> For multi person chat, I'd have to strongly disagree. Even when chatting with a single person, 90% of the time I'd much prefer text. Why? Because I'm likely doing something else at the same time.
Sure, text works well for time-shifting since it isn't real-time. I think that this is an excellent additional point (I don't think that Richard talked about it in his article, but I might be mistaken) and one that Nick should do a survey on. How many MMO players are multitasking while playing versus how many are fully immersed? Certainly most developers here at Linden tend to run SL in a window and often don't even have head phones on.
Posted by: Cory Ondrejka | Mar 12, 2004 at 11:38
Divine wrote:
"But at the same time a keyboard is so wide it is hard to multitask with a mouse (actually, I use a trackball to minimize movement). I could easily be using my mouse and keyboard interfaces to direct where I was running to, then yell "Heal potion" and quaff one down without taking my hand off the mouse or keyboard."
I wouldn't qualify this as a good argument in favor of voice controls... I'd say this is really an argument for better interface design. If your interface is so clunky that players feel the need to find alternate ways to control their gameplay then you need to re-evaluate your GUI. You'll find most new games have gone much farther in providing friendly interfaces... hotbars, fewer menus, built in macros (EVIL!), etc...
Bruce wrote:
"Is it just me or are most games like watching a silent movie? Or like reading a Shakespearian play while it's acted voiceless on the stage."
It's not just you... though this isn't so much a problem of voice communications as it is a problem with voice actors being expensive. Actors cost a hefty chunk in linear single player games... imagine their cost in a MMOG. You could try and use something like a text-to-speech engine but I find most of those, while vastly improved from 15 years ago, still sound a bit stilted....
Cory wrote:
"Speech conveys levels of meaning that text can never hope to."
Which is actually why many people want to avoid it. People enjoy the anonymity of the text based internet.. voice comms take a lot of that away.
Though I would bet it would certainly cut down the number of male players with female avatars :-P
Posted by: Sourtone | Mar 12, 2004 at 11:59
"If your interface is so clunky that players feel the need to find alternate ways to control their gameplay then you need to re-evaluate your GUI."
Actually, I feel *all* our current computer interfaces are inadequate. But that's beside the point. ... I haven't played a single MMO game where I didn't feel a need for an alternate input method. Heck, even playing Mike Oldfield's offering, where you essentially just float around, I felt a need to have a gesture-based interface (not mouse-bound, but true open-air gestures). ... Some of the present game interfaces are notably difficult and awkward. SL being the most prominent and frustrating here.
"You could try and use something like a text-to-speech engine but I find most of those, while vastly improved from 15 years ago, still sound a bit stilted...."
I've seen (more like 'heard', actually) some developments in this area, and compared them to recordings of a human reading the same phrase, and they are undistinguishable from a human except for sometimes being able to hear the person's recorded breathing. That can be added by emulation, but it was the only difference I could spot in the samples I listened to. The rhythm, inflections, emotion... Everything was there. This is still unreleased technology that I believe I shouldn't comment further on, but it'll get to us, and I bet very soon.
Posted by: DivineShadow | Mar 12, 2004 at 12:45
DS>>" I haven't played a single MMO game where I didn't feel a need for an alternate input method."
Try Earth & Beyond as a good example. You can actually play the whole game with just a mouse. Their chat interface is a bit cumbersome, but the overall GUI is pretty solid.
DS>>"I felt a need to have a gesture-based interface (not mouse-bound, but true open-air gestures)."
I'm not sure how well open air gestures would work since it takes a lot more movement (and thus time) to wave your hands around than swirl your mouse. On a whole, however, I think the gesture system was a great innovation when I first saw it used in Black & White. It's something I wish more games would use. I also like that it adds a certain element of skill to the game, as you can tie the effectiveness of the action performed to the precision of the gesture.
DS>>"and they are undistinguishable from a human"
Is there a demo available of the one you heard?
I tried Microsoft's current online version and it was pretty good except for the speech cadence (some words were awkwardly short and clipped) which was still off at times.
Posted by: Sourtone | Mar 12, 2004 at 14:34
Cory> So, in the usage case that we're discussing here, namely MMO communication, it is pretty clear that text provides a far slower and lossier form of information transfer -- thus, lower bandwidth -- than voice.
Seems to me there is an assumption here that more information transfer equals better communication. Not always the case in my view. In particular, I think much of the “information” gleaned from voice is actually generated by the listener. See Sandy Stone’s ancient but still relevant discussion of phone sex.
http://duplox.wz-berlin.de/docs/panel/sandy.html
Yes, you get a “better” sense of the other person in voice. But that sense is quite possibly wrong. Their voice may remind you of a teacher you hated. Or their accent may convey a particular lifestyle to you. But that may not have much to do with the actual person you are talking to. When “chatting” with people who are very competent with emotes and such, I believe I’m getting at least as accurate a sense of their state of mind as I would using voice. Some people are more “really themselves” when playing an Ogre in a VW than playing an office drone in their “natural” body.
Posted by: Hellinar | Mar 12, 2004 at 15:28
"I'm not sure how well open air gestures would work since it takes a lot more movement (and thus time) to wave your hands around than swirl your mouse."
Try the Mike Oldfield "thing" and you'll see why open-air gestures are better than a mouse for it. Of course the gestures have to have an AI behind them to interpret what you're *trying* to do and assist, otherwise it would be really bad. Open-air gestures can be as obvious as moving your body around, or as subtle as keeping your wrist steady and moving one or two fingers around.
"Is there a demo available of the one you heard?"
I searched.... I can't be sure this is the same technology, but pretty damn close, and from the same team: http://www.research.ibm.com/tts/coredemo.html
Try the US Female 1, US Female 2, and US Male 2
The referring page is at: http://www.research.ibm.com/tts/
With a little bit of help with extra speech-specific tags it becomes undistinguishable... In the samples I tried I did not know (and was not told) which one was which. I listened to them over ten times, and frankly to this day I'm not even sure if they were all human, and they were testing some wacky stuff (unlikely), all generated, or a mixture. Then again maybe my hearing is not that great... :)
Posted by: DivineShadow | Mar 12, 2004 at 19:45
"On a whole, however, I think the gesture system was a great innovation when I first saw it used in Black & White."
Sure, it looks innovative when you "see" it used. But if you try to use Black & White's gesture-based interface, you'll go through 3 mice before completing the first section of the game. This is especially frustrating when you consider that they did break down and add hotkeys for the leash commands. This just made the contrast between gesture-based and hotkey-based input all the more stark: one could specify the leash in tenths of a second with a hotkey, while one would spend *minutes* repeatedly trying to get the game to recognize the gesture. (Half the problem, I admit, is in their brain-dead implementation. There was no way to signal the start and stop of a gesture, so while repeatedly trying to do gesture A, it would often misfire as gesture B.)
Black & White should stand forever as testament to how horrible UIs can be when people put ideology ahead of pragmatism.
- Brask Mumei
Posted by: Brask Mumei | Mar 15, 2004 at 10:02
sourtone: " it takes a lot longer to say "move forward...stop... step back...stop" than it does to press "w" for a second then "s" for a second. "
I think that language is best when used for large sets of high-level commands. As anyone who has played a large 4x game can tell you, their focus on micromanagement tends to scale poorly.
And it's not too hard to conceive of different scenarios where you'd like to be able to easily control large amounts of resources in very specific ways without having to micro-control every single movement: "Green Platoon, go ravage the enemy state of Sweden, but spare their chocolate factories."
Posted by: md | Mar 15, 2004 at 14:44
Regarding text vs. speech, we can look at it from the perspective of data compression. Say you are trying to send HDTV signals. To compress or not to compress? If so, how much? What's the balance?
If anyone remembers the movie Firefox with Clint Eastwood, the thought interface of the advanced Russian MiG sheds some light on the text-voice-thought interface/communication issue.
Speech interfaces do have their issues: I personally like to verbally call my magical incantations, but I'm the type that plays Live Action Role-Playing games (LARPs).
I go into the forest and dress the dress, walk the walk, and be all that I can be in a fantasy world.
Frank
Posted by: magicback | Mar 16, 2004 at 01:23
Wired has a review of LifeLine.
What I find provocative is their suggestion that LifeLine's communication ills may be a component of the immersive experience:
Indeed, at times it takes the patience of Job not to get frustrated with Lifeline, but the player's anger is typically not directed toward the program's failure to function but toward the character's "stupidity" -- and as soon as the player fires off a swear word or two at her, Rio's got a snappy comeback. And she'll sense if you're having trouble finding the right words: "You've got to have patience, and you've got to think," she gently admonishes during puzzle-solving sessions.
And that's another way that Lifeline creates a more immersive game-play experience --
...To err to seem human is then a virtue and a gamer's delight. Does this then imply the converse: to err in RL should be therapeutically handled via a gamer's fragfest fantasy?
Posted by: Nathan Combs | Mar 16, 2004 at 20:35
Opera just added voice commands -- just a matter of time before it shows up in more games, imho...
http://www.opera.com/pressreleases/en/2004/03/23/
Posted by: greglas | Mar 24, 2004 at 11:29
You guys are weird. Go get a personal life/job and go have sex.
Posted by: xxxxxx | Jun 21, 2005 at 17:46