
Jan 11, 2004



There already has voice chat, complete with very cool avatar lipsync, though you do have to pay an extra monthly fee for it. It is a wonderful feature and though not extensively used except by ex-beta players who effectively got it for free, I did feel very left out at first when encountering voice chatting groups. It calls for etiquette on the part of the chatters, not to voice chat when deaf and mute players are about.


Tessa> There already has voice chat
Thx, I don't play There so didn't know - and my research is done by posting here :)

Yes, I feel left out of groups that have it, so it is a bit of an us and them already.



Richard Bartle wrote in that piece, "In a virtual world, you can be someone else." I haven't played There and only trialed Second Life for a week (though boy, did I play the hell out of that week), but as I understand it, that's a difference between There and SL: There isn't for being someone else, it's for being yourself in a world of low, artificial scarcity. Voice should work well there (minus the paying extra issue). Even though SL is as much a "game" as There, I expect more people do take the opportunity to be someone else, which presents all the problems in the piece as much as SW:G.

I'm aware I seem to have a bias, so I'm eager to have my misconceptions about There corrected. :)


I hate voice communications.

Voice is not seekable, not greppable, and expensive to log. It is not conducive to multitasking, as one must deal with the speaker in real time. With text, you can let a few lines pile up, and then read & reply when you are done with another task.

I agree 100% about the immersion factor, but in the end, what convinces me is the horribly slow bandwidth of speech. You can read a LOT faster than you can speak. The bottleneck is on the consumption end, not the production end (i.e., you are likely to have more text going by than you can read - and this is *with* the supposedly slow keyboard input - how could you listen to that same information, especially if ease-of-speech allowed people to generate even more?)

I'm sure you have all been stuck in some boring meeting where someone took 10 minutes to drone through the same material that was on a handout. And you had re-read that handout 10 times over in that interval.

Another good example of speech-gone-horribly-wrong is the game Black and White. There the designers, in a fit of sadistic glee, subjected us to uninterruptible dialog. SSSLLLLOOOWWW dialog. You'd read the text in a few seconds, and WAAIIITTT for the game to finish saying the text.

So, I'm adamantly opposed to speech in VW for purely pragmatic reasons.

- Brask Mumei


I'll preface this by saying that until about a year ago I worked for a company named SpeechWorks in a technical marketing role and I have a lot of experience in telecommunications.

I am not sure about legal issues but I am sure there are technical barriers.

For example, in theory it would be possible to offer anyone who was not using voice communication a speech to text service of fair quality. But there are a couple of problems. One is prioritizing inputs: how do you choose which one the player gets as text? This problem exists to some degree when you go from text to speech and gets much worse if you go from speech to text. People have an ability to pick a conversation out of background noise, which often includes other conversations. Speech engines don't have any ability to do this at all. We had a recording made of a woman who apparently had a toddler in her arms trying to call a speech engine for her bank balance. The toddler was repeating everything mom said and this baffled the computer. It also upset mom.

This also has implications for speech to AI interaction. Non-speech background noise is not much of a problem in normal circumstances, but more than one person talking can wreck the interaction.

A year ago the text to speech engines were really good and different voices were already available. You could also tune the voices to give them accents. I am sure they are even better now.

Speech to text is a little harder but it is also coming along. It's a big business too, so the game industry gets the advantage of R&D money spent to go after other markets.

When I was with SpeechWorks I did give some thought to ways to add value to games with speech. The key factor to me is the type of game. For example, a military simulation would be more immersive if you (as an officer) could give your men (who might be AI or other players) verbal orders. Shouting CHARGE! is much more exciting than typing it. In other environments it could be much more problematic. Speech controls combined with an accent and a badly designed quest could easily cause problems in an RPG. My current feeling is you could certainly add value to some games with voice controls and voice interaction, but I think that you would only be able to get a separate revenue line in very limited cases.

I'm not a lawyer so I will keep my mouth shut to avoid looking foolish on that question.

On the question of social interaction I think there will be large impacts but they will be caused as much by decisions about how voice works in the game as they are by the decision to add it to the game.

As for the translation programs and bringing the world together, well, maybe some day, but not with the current state of the art. Maybe you could add current state of the art translation software to generate conflict, but it will be a while before it allows us to play together.


For me, on a purely commercial level: My main issue is how to integrate voice chat with my game and not have to have it as an M rated service. For example, I couldn't integrate it officially into Asheron's Call 1 without raising the rating from T to M, because there is no efficient way for me to monitor it for ToS and CoC violation complaints of harassment and abuse.

As a point of interest, those kinds of complaints are among the most numerous that customer service has to deal with.

On a technological level, I suggest that voice chat won't become truly ubiquitous until voice fonts/masking is a way of life, not a novelty that kinda-sorta works. We've been using voice for a long time; I think I first used VOX while playing Air Warrior in 1990 or so, and voice masking isn't the issue for that type of game that it is for RPGs.


To expand on Jessica's comments a bit, community service on voice chat is a real bitch unless you record all the conversations and can rapidly access/search that database. Easy with text, tough with actual voice data.

I agree that until we get to voice fonts, direct voice comms are a mistake, especially in an experience like Second Life. However, many different groups within SL have independently turned to Teamspeak and other 3rd party systems.


For a world like There, voice really makes sense as we are trying to replace the telephone as people's preferred method of keeping in touch. But, I can imagine the design issues for most MMORPGs.

Having lived and worked in Japan for 6+ years, I have a tendency to view voice chat and text chat as forms of foreign speech. I used to interact heavily with American businessmen that didn't understand Japanese, and Japanese businessmen that didn't understand English. To add to the fun we would often have some on both sides that could (mis-)understand both. In There, I think we are seeing people that are very good in either text or voice and a few that are good at both. We also have a good number that really prefer not to learn the other language. And it's also not uncommon for members to translate from voice to text. We regularly have Executive Chats that are done in voice; these are often attended by a number of very fast typers that translate in real-time for the text chatters.

Again, these are patterns that are very familiar for anyone that has dealt with groups of mixed linguistic backgrounds.

On the customer service angle, the practice within There is to estimate the customer service cost of supporting a new feature like voice during the early design concept stage. From what I can see another emerging practice is to give members increasing control of the environment through privatization of in-world resources.

Once members have a vested interest and the tools to resolve many of the more common issues, the costs of giving freedoms like voice chat decrease substantially.

A simple example might be a race track. If this is a 'public' property then all paying members have a 'right' to be there. By default, they also have an inferred 'right' to an environment free from griefing. But, as mentioned above, the cost of providing new freedoms and maintaining a griefing-free environment often outweighs the value of the proposed feature.

However, Privatization changes the cost-of-Freedom equation: as in any economy, privatization makes individuals more adept at the micro-management of in-world resources.

So going back to the example, if the company instead rents the Race Track to an individual and gives them the tools to control that private environment, you now have a totally new dynamic, one that is less likely to escalate into a customer service issue. In fact, when customers to a race track are being harassed through a feature like voice, because the track operator is paying rent for the opportunity to do business there, they have all the right in the world to remove troublemakers from their area, thus defusing the problem.

Going forward, I think we will see Privatization used in all sorts of areas. Personally, as the economist, I would like to see a good mix of both private and public areas (best guess 2:1 ratio). Today, we have private areas that hold 20-100 people, I would hope in the next few years we get that number up to 2,000 - 10,000.

Even so, I think voice may be similar to commercialized merchandise, as there are environments where that feature works well and other environments where it feels very odd.



(Insert IAALS here.)

Legally, I don't see verbal-speech as being terribly different from text-speech. As has been mentioned above, the game operator loses an element of control over the content of the speech and may suffer a rating increment (Teen to Mature) due to this reduced level of control. However, if the game were to offer verbal-speech as an option under an additional agreement in which the customer agrees that any content of this service carries an M-Rating, I don't see a problem. This reminds me of Second Life's zone-rating. Some zones are rated PG while others are Mature. You assume a certain element of risk or agree to potentially offensive content by entering a Mature zone.

On a personal level, I do not intend to utilize any form of voice-chat in an MMORPG for the foreseeable future. I usually play these games to escape from the RW and generally dislike immersion-breaking aspects such as voice-chat.

Lastly, to comment on SL in particular (yay for the SoP accounts!), I am constantly surprised in SL by how many people try to have their VW avatar (or AV in SL-lingo) look like their RW person. I see so many human-like and RW-looking avatars that it shocks me. Some individuals (or at least one) even offer services to custom-build an avatar that resembles a real-world picture. i.e. The person will create an in-game avatar that looks like you (or the photo you give them). Personally, my avatar looks *nothing* like I do IRL and probably never will. What's the fun in that? If I wanted my avatar to look like me, I might as well log off and go to the local pub. (That's only my opinion, though.)


One of the things that I was wondering was if there was much legal difference between text and speech, particularly in cases where we are talking about ‘hate speech’ of different types.

Unless someone was acting / role playing very well, I would imagine that any idea of an ‘it's just a game’ defence would be removed when the person to person mediation was simply a VoIP circuit, if that defence would ever stand up against anything, that is.

If an MMO becomes a VoIP hub, do they need a telecoms licence, I wonder.

Just to segue into look-alike avatars, for those that want that sort of thing I’m surprised that this lot: www.digimask.com have not gained more prominence.



I apologize, I should have been clearer. My above comments are solely limited to this article's context (VWs and online gaming). IRL, text-speech and voice-speech can have significantly different legal aspects and penalties. i.e. Compare an op-ed piece criticizing a local official for unsavory business dealings and a speech made to a crowd gathered outside the town hall. Similar purpose, substantially different mediums, locations, number of persons reached, possible reactions and potential legal liabilities. In addition, the latter of the two, the town hall gathering, also implicates other First Amendment rights, such as the right to peaceably assemble, which the op-ed piece would not.

Ren, I hadn't seen the Digimask technology before. However, before it can influence an environment such as SL, there would need to be an import option for the Digimask results (which, in SL, there currently is not). One of the reasons I mention SL in this regard is because of the extensive amount of customization available for one's avatar. It really is amazing. Conceivably, in SL you could make your avatar resemble almost anyone by playing with the sliders and options. (Which itself implicates legal rights in and to your personal image or likeness - Rights of Publicity.)


Ren asked "If a MMO becomes a VoIP hub do they need a telecoms licence I wonder."

Many companies have set up internal VOIP networks to provide less expensive communications for their employees. They did not have to register as a communications company (the process in the USA).

In the US you have to register for regulatory purposes including filing tariffs and also paying into something called the Universal Service Fund (which was created to subsidize lines to isolated locations like poor farms and now subsidizes lines to isolated locations like exclusive ski resorts in the Rockies - great country eh?)

I doubt that game companies would be considered providers of telecom services since they are not pricing them separately or building any kind of network to support delivery of the service.

I can't speak to the crazy rules governing telecom in other countries, only the crazy rules of my own.


I've personally used RogerWilco when playing UO several years back. Some members of our guild would opt out of using it, but we almost always had 8-10, minimum, in the room. In our guild no one really cared about ambiguity (sp?). We were more interested in cohesion when going into guild wars and such.

From what I understand, Second Life (sorry if this was mentioned already, I've been skimming) or There have been working on voice software that will basically make people's voices unrecognizable. Not necessarily in a robotic manner either. Players will have the ability to select how they want their voice to sound after going through a vocoder. If I'm a small, female character, I may be able to select some sort of 'petite female' voice option, that will take my raspy male voice and convert it on the fly.

I have mixed emotions on the whole voice chat via games/VWs. On one hand it makes communication much easier and quicker, but on the other hand you get the occasional tard that purposefully tries to get into your head by playing/acting the role of the opposite sex. I have no problem with people playing roles of the opposite sex, but to do that for the sole purpose of toying with people's emotions and leading people on...that gets annoying.


"Common carrier". :) The cheapest form of customer service is to deny there's a problem.

Seriously, though, I expect the trend will be to restrict players from being able to talk to each other, rather than broaden the means. I don't think it's a fluke that the only robust MMOG success since AC is a game where enemy players can't talk to each other in-game. (And don't have a developer-sponsored official message board, either.)

Unless Jessica wants to integrate TS on the guild level and create a trust system for the guild leaders to monitor and police themselves, it will definitely be more trouble than it's worth, and probably still would be. Those who want to be responsible -and- conscientious about the service will stay the hell away from officially supporting voice chat. Those up and coming who don't understand the basics will be condemned as sacrificial guinea pigs.

All for trying new things, but this isn't even worth the bandwidth.


(Copied from my blog entry, which should have also pinged this site.)

I have played several online games (mainly MMORPGs) where voice communications is absolutely essential. The latest, Shadowbane, is the truest test of this. At times, I've been without voice communications in Shadowbane. Once, it was due to a bad chat client build, and once was due to my microphone being on the fritz.

The bottom line was, though, that without voice comms I couldn't get a heal in the thick of battle. I couldn't call targets effectively, or get folks to get an irritating (and/or deadly) target off me.

Both times, I ended up logging off in disgust, to fix my communications problem.

What makes voice comms so essential to these games (MMORPGs and cooperative shooters)? You're busy doing other things with your hands in the game. Unless you can do that, and type 120 WPM, voice chat is faster. In addition, I find it much more immersive to hear a chorus of voices than to watch lines of text fly by.

I love both TeamSpeak and Ventrilo :) For me, they make MMORPGs massively more fun.



I wouldn't put "MMORPGs and cooperative shooters" together in the same room, maybe not even in the same building; especially not MMORPGs. Voice sounds like a good idea if you have combat or other systems that rely on multiple key combination twitch response. At that point, though, you probably want to consider whether it's a good thing for the player to have to make those kinds of very rapid decisions for their character -- there's no role playing involved then, only player skill and twitch response.

Planetside has built-in voice communication, though I can't say how often it gets used by players. Given that Planetside is more of an MMOFPS (with real-time combat highly dependent upon player skill as opposed to character skill), including it makes sense, IMO.


I think Bruce hit the nail on the head about what will likely occur in more open-ended VWs. More good RL public policies will start finding good homes in VWs.

Hmm, perhaps it will look like the Microsoft strategy: you let people build third-party software over your OS. If it is a good feature and a profitable addition to your OS, you incorporate it and squeeze out the third party :)

I side with Richard's "Not yet you fools..." for now.



From my own experiences, I'd have to agree with Vlad and Doccus.

All the early successful guilds in UO used voice communications, at least on the server I played on (Chessy). And in most battles, it was blatantly obvious. A crew of 7-10 peeps with voice chat would simply obliterate crews of 12-14 peeps w/o voice chat. The people w/o the chat would be trying to type in directions/commands/targets to their troops, while the opposing force swiftly moved from one player to another, knocking them off with ease because all their communication was taking a fraction of the time and they never had to type anything on the keyboard (taking valuable time away from spells and other attacks).

Most opinions seem to point out that these folks AREN'T roleplaying. In most cases they probably aren't, but if you want to play an evil character, and band together with other evil characters to wreak havoc...this is a form of roleplaying, no?

I haven't played Planetside yet, but I'd imagine voice communication is critical in large scale wars in that game b/c there is such an emphasis on twitch speed.


Let's not lose the forest for the trees. Voice creates technical and service challenges, it's costly relative to text, it doesn't work well in some worlds and for some purposes, etc.

But the big picture is that human beings much more readily communicate by voice than by text. Written language is perhaps 5000 years old, while humans have been communicating with spoken language probably for at least ten times as long. Infants learn to communicate by voice without any formal education, while learning to read and write takes significant formal education. The advantages of spoken language have played a key role in the evolution of the human body. Textual communication is not built into the human body in this way. For more details on such evidence, see pp. 8-19 of "Sense in Communication," at www.galbithink.org. Thus, for most human beings, most of the time, voice is much more valuable than text for making sense. That suggests that the development of voice will play a large role, for better or for worse, in expanding the number of participants in virtual worlds.

Perhaps it's correct, "not yet fools" for this expansion. Perhaps this expansion will make the quality of world experience worse. Yet voice is likely to play a key role in industry expansion, just as graphical worlds did relative to textual worlds.



We've been here before. :)


As an active voice user in There, I think voice chat is a wonderful addition. Since There is a social world, it actually makes the virtual MORE immersive rather than less. The lipsync ability (matching the avatar's lip movements with the speech) combined with the body language noise (shifting weight, ambiguous hand gestures) makes conversations feel much more natural than watching the text float up in balloons from smiling mannequins. I'd also like to say this: it's a nice feeling to make a witty comment or joke in front of a group of friends and then hear their actual laughter.


On a tech and business note:

NICE (www.nice.com) provides voice analysis systems for customer service and homeland security purposes.

Voice is coming soon, so might as well prep for it.



A small minority of us humans may find typing-and-reading communications to be more desirable in many ways than talking-and-listening communication. But the number that feel that way is tiny. Most humans prefer talking to typing, and prefer listening to reading.

Some percentage of the hardcore gamers that make up much of the current MMORPG market would prefer not to have voice chat in their games. But I'm confident that the vast majority of the world's population would prefer a game or environment with voice to one without.

Why do I believe this? Because "talking to each other" is the number one most popular form of entertainment in the world. The last time I looked at numbers was some years back, so this info is a bit dated, but I think still quite relevant. If I'm remembering all these numbers right, video & computer games were in the ballpark of $10 billion a year. Radio was around $20 billion. Television was $40 billion. Not bad. But the telephone business that year weighed in at around $90 billion for local service, and $72 billion for long distance - a whopping total of $162 billion, bigger than all of those other media put together, plus magazines and newspapers thrown in. And that was without even counting cell phones. Even if you factor out whatever percentage of that traffic might be business calls, I still think telephone service is the number one entertainment medium in the world in dollar volume.

Me, I'd rather be thinking about how to be one of the first to get it successfully integrated in online games, rather than thinking too much about whether it'd be unpleasant for some of the first million hardcore gamers to come in and start up the market as "early adopters". Of course, that said, it might well be a few years yet before it really takes off anywhere - the technical, cost, legal, customer service and other hurdles are very real. I think a mostly peer to peer approach is the way to go for now, and possibly for the long-term as well. It makes you less able to guarantee any kind of quality of connection, but it offloads most of the bandwidth cost from your servers, and might make it easier to offload some of the liability as well and claim "common carrier" status. Being able to argue in court that you aren't even *capable* of monitoring all the conversations, as most of the packets never even flow through your servers, might be helpful in that regard.

-- Dr. Cat
