
Nov 14, 2008

Comments

1.

I think we need to be very, very careful when mapping in this way. Small changes in environment, setting, context, relationships, etc. can lead to huge changes in behavior.

I'm reminded of one of the greatest marketing fumbles of all time: the creation of New Coke. The Coke people were up in arms about the "Pepsi Challenge," a blind taste test in which a large majority of people chose Pepsi as being better tasting. Coke replicated the test and found the same thing; people said Pepsi tasted better. So they invented New Coke, a sweeter version of Coke, closer in taste test results to Pepsi.

And the kingdom went berserk. People demanded that Coke not change, in numbers that surprised everybody. Coke backed up a bit and created Coke Classic, and stopped just short of pissing away one of the greatest brands of all time.

For years, they couldn't figure out what happened. People liked Pepsi better in taste tests, yet said they preferred Coke in almost opposite numbers. What was the issue? It was almost ten years before a new researcher came on board and pointed out, "People don't taste cola. They drink it."

Whathafuh? Oh. Yeah. Right. We don't take little baby sips of cola from dixie cups and swirl them in our mouths. We drink actual, whole bottles of the junk. And when you *drink* Coke, you get a much different experience; more burn, less aftertaste, not as heavy a feel on the tongue.

The only "mapping" difference to the "virtual beverage test" was smaller servings. Yet that tiny difference was enough of a glitch to nearly topple Coke.

I'm sure there are some things that can be tested in virtual worlds. But I'd also think that most of them are things that happen in virtual worlds.

2.

I've read through the paper now.

Yes, it's a good explanation of the problem, and signposts the direction that this kind of mapping research should look first. I was particularly pleased to see that you didn't assume that the virtual/real mapping was just one way - you allow for the possibility that people could do experiments in the real world to predict the effect of doing something "similar" (in some way) in a virtual world.

A couple of things you might want to consider (or maybe did already but didn't put in):
1) If behaviour between identically-coded MMO servers is different, this might suggest that such behaviour can't be mapped between the virtual and the real. In other words, if server X and server Y of the same game world are at variance in an aspect that you were hoping to test, you probably can't test it (at least not without knowing why they disagree with each other).
2) Sometimes, you don't need "face validity". There may be processes in an MMO that map very well to a real-world process, but they're skinned differently. For example, it could be (I'm not saying it is!) the case that the spread of use of an exploit follows the same kind of path as a real-life contagious disease. An experimenter could use this equivalence deliberately to insert an exploit into an MMO and then track its use, mapping it to the social networks of those using it, and so on.
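To sketch what 2) might look like in practice: if an experimenter really did seed an exploit and log who first used it and when, its spread could be compared against a bog-standard contagion model. Everything in the toy below (the friendship graph, the rates, the sixty-day window) is invented for illustration; none of it comes from the white paper.

    # Toy SIR-style spread of a hypothetical exploit over a player friendship
    # network. The graph, rates, and names are illustrative only.
    import random

    random.seed(42)

    N_PLAYERS = 2000        # players on one server
    AVG_FRIENDS = 8         # average degree of the friendship graph
    P_TRANSMIT = 0.15       # chance a user shows the exploit to a friend per day
    P_PATCHED = 0.05        # chance a user stops (gets bored / exploit is patched)

    # Build a random friendship graph (Erdos-Renyi style, adjacency lists).
    adj = {i: set() for i in range(N_PLAYERS)}
    p_edge = AVG_FRIENDS / (N_PLAYERS - 1)
    for i in range(N_PLAYERS):
        for j in range(i + 1, N_PLAYERS):
            if random.random() < p_edge:
                adj[i].add(j)
                adj[j].add(i)

    # States: S = hasn't seen the exploit, I = actively using it, R = stopped.
    state = {i: "S" for i in range(N_PLAYERS)}
    state[0] = "I"  # the experimenter "seeds" the exploit with one player

    for day in range(60):
        users = [i for i, s in state.items() if s == "I"]
        for u in users:
            for friend in adj[u]:
                if state[friend] == "S" and random.random() < P_TRANSMIT:
                    state[friend] = "I"
            if random.random() < P_PATCHED:
                state[u] = "R"
        n_active = sum(1 for s in state.values() if s == "I")
        n_ever = sum(1 for s in state.values() if s != "S")
        print(f"day {day:2d}: {n_active:4d} active users, {n_ever:4d} ever exposed")

Comparing a curve like this against real exploit-use logs, and against the social graph of the users, is the kind of mapping-without-face-validity I have in mind.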

Those are just some ideas to throw into the mix. You'll notice they don't contradict anything you suggest; this is because I find nothing to disagree with! It's a good statement of the problem and proposal of a possible solution.

Richard

3.

Hi Dmitri,

I confess I mostly only skimmed through your white paper so far, but my first reaction is that this sounds like a bad case of virtual world exceptionalism, don't you think? How can something that exists inside this world be parallel to it? I have criticised the parallel worlds way of thinking in an essay titled Virtual Worlds Don't Exist. Would you call a golf club a parallel world?

Observations and research results from one context can obviously often be applied in another context, such as when we apply the results of organizational psychology, studied in the context of companies, to educational settings or computer-mediated groups. As you know, the question of to what extent the results are generalizable is always present, even though it's not called "mapping".

Exploring in detail the applicability of results obtained in various computer-mediated settings to offline settings and vice versa is certainly a worthwhile pursuit, but I suspect that it may be difficult to generate a general theory out of it. Perhaps it would be more fruitful to consider these issues in the context of some specific research. This would help to eliminate many of the ambiguities that you introduce when you use the concepts "virtual world" and "real world" (see essay). I think a concrete case would also open up your thinking to more specific (and thus more constructive) criticism.

All the best,
Vili

(minor nitpick: why is GDP written in brackets after "a measure of inflation" on p. 6?)

4.

Validity's such a massive bugaboo when it comes to virtual worlds. It's all about how far down the rabbit hole you want to go.

If we need to establish parallels between virtual worlds (intravalidity, I guess you could say), OK, that's relatively easy to do.

Then you get into the question of recruitment. Do we post on the forums? It's speculated that forum users of VWs have very different characteristics than non-users, and we could skew our results even based on the -forum- that we pick.

If we can convince the operator of the VW to send out an in-game message to recruit our population, OK, great. That's a more universal way of reaching people.

But then we run into the issue of timing. Are we getting a true slice of the VW population by sending it out at prime time? I'd say not, because of time zone issues and population play-time tendencies (i.e., swing-shifters, weekend warlocks, etc.) [EverQuest has neatly solved this by issuing everyone an opt-in on log-in, but the below is still a concern.]

But THEN we run into the issue of translation and adaptation. The whole point of a virtual world is a 'place without space,' so we need to make it as little Western-centric as possible. So we need to recruit on non-European servers, etc... and then our measure (for example, an MMPI) breaks down, because that's just what many measures do when taken outside of the culture in which they're crafted, even with faithful translations.

On top of all THESE concerns there's the issue of incentivization - especially with longer surveys. There's no interpersonal pressure to take that survey ('I want to please my professor' or 'I don't want to seem like I'm not part of my group of friends'), so we're getting a certain slice there.

And IF we incentivize through, say, an in-game item, are we only getting the people to whom that item is useful? (for example, if it's a cosmetic item, only the people that care about the cosmetic appearance of their avatar)

And how rigorous are the responses we're getting?

Yes, we have some pre-existing research for some of these questions, but part of the whole conceit of VW research is that this is a new media experience with new dynamics, appealing to a new group of people whose characteristics, if any, we're still not totally sure of. If we carry that conceit, the applicability of previous social science validity research naturally gets called into question.

Now, all that said... we can do validity. There's a lot that has to feed into it, but it's doable... but it couldn't be just one researcher. I'd almost argue that it'd need to be a concerted and coordinated effort on the part of a large group of researchers, but I'm just not sure that VW validity's 'sexy' enough to get that done.

MAH

5.

I suspect that many people enjoy VWs for reasons that contrast with expected behavior in regular society... it's refreshing to break from the constraints of reality, and people play in part because they can break from norms.

A few ideas..
...less penalty for failure in a task... dying is just a death run or a reset button away... if you failed 1 out of 4 responsibilities at work or school you'd get fired or get a low grade... Risk with less responsibility for failure.

A similar responsibility ditch (for players who aren't hooked into playing the vast majority of their discretionary hours) is being able to show up when you want, quit when you want, and still make progress with your position suspended. Some games have your position degrade if you don't log in to maintain it, but many others don't. Some people feel a social pressure to play regularly to maintain a position in a raiding group, but leveling operations are generally freelance sorts of "work".

Freedom to bet "bad". Many players appreciate the ability to steal from others that a context or that role playing would allow. These types of vents really wouldn't be mapping other than to some un-doable part of the psyche. (the online porn industry probably also allows people to explore situations where they would not dare to even if given the opportunity in real life).

I'm sure that there is a good deal of mapping, just as there is mapping of behavior into behavior within any institution (the golf club was brought up).

I do suspect there are numerous areas of virtual game play that don't map, and which actually add to the attractiveness of playing the games precisely because they don't need to map.

6.

Hey Dmitri --

Caveat that I'm not an empiricist or a social scientist, but I think it is a good description of the problem.

Andy makes an interesting point about the Coke/Pepsi taste tests. The validity of data depends on the claims it is marshaled to support, so there are plenty of ways to misapprehend the relevance of any empirical data. But you know that, so I'll put it aside.

After reading this, two particular things come to my mind.

1) Virtual worlds might not be worth it for normal science.

It strikes me that when one does normal science work -- e.g. attempting to refine and test general rules that are seen as predictive of general human behavior (online or offline) -- the richness of virtual worlds is likely an impediment. They introduce many variables that might be better excluded. In other words, if you're testing a fairly simple hypothesis in a virtual world about human behavior online and offline, and your goal is to flip certain variables that can't be flipped easily in real life, then maybe you would be better off setting up a much less interesting and complex environment -- keep things simple.

Btw, you provide a good overview of the problems with mapping -- it makes me wonder why we don't spend more time worrying about the "undergraduates in a lab setting" problem, which seems to be the mode of much social science research...

2) Virtual worlds are interesting primarily as what they are in themselves, not what they enable us to test

As just one example of this, the semiotic nature of virtual worlds raises relatively unique questions for social science. For instance, with the WoW "plague" -- you make a great observation about how we normally would not run around infecting our friends with a deadly plague IRL. You point out that "deadly" in this context should be read as "annoying". That's right, and the news clips on the whole plague thing left me scratching my head for exactly that reason.

But to really unpack this well, we also need to recognize that this "deadly plague" *IS* a deadly plague. It *IS* a deadly plague because part of the fun of transmitting it to another player comes from the fact that the contagion actually *DOES* signify "deadly plague" within the game fiction, and hence you get to share the (perversely fun) fiction of spreading the plague. If the contagion were merely understood as annoyance, it would not travel as fast.

So we're back to something that I think is fundamental to play in humans and animals -- a nip is not a bite, but a nip is only given because, in play, it signifies a bite.

By pointing to this, I'm not arguing against using them as arenas for research. But I wonder about the petri dish idea. I wonder if they're too real for that as well as too complex. You might see them as moon bases instead -- completely real but very strange in terms of the kind of social behaviors they enable.

7.

One of the potential obstacles to using virtual worlds as experimental domains comes in the form of university ethics committees. If you want to experiment on people (which you do), then you generally have to get the permission of those people. If they know they're being experimented on, though, this can spoil the results. It could also lose players - some people will inevitably be creeped out by the idea of being lab rats. Even something innocuous like reducing the drop rate for some reagent could need institutional review board approval.

Given that this is likely to be a common problem in CMC, though, I'm sure there must be some helpful sleights of hand that allow for workarounds.

Richard

8.

I have some reservations about the white paper, but I can be positive about Greg's idea that things are what they are. Virtual worlds are systems of evolution and change and I think they are best currently studied as that.

For instance, currently -- taking a game-oriented perspective -- I am very interested in how social value systems impact social system play and variation. Or, more generally, how social/individual *evolution* takes place within a code-oriented context.

In order not to be overly systemy and vague, take this as an example:

Day One... There are duplicate implementations of the same mmo pvp rules across several servers. These rules are reasonably complex, meaning that successful competitive strategies are unclear (i.e. no one yet knows what the fotms -- the flavor-of-the-month strategies -- are).

Day One + N... Successful pvp strategies have evolved. Fotms rule the roost.

Are the "solutions" to the problems posed by the game code arrived at in a similar way, with a similar result? To what extent are these solutions predictable from the game code? To what extent are these solutions aided/inhibited/avoided by social groups and groupings?

In this instance, mmo servers with the same rule set seem sort of like little Galapagos islands: very interesting in comparison to each other.
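That comparison can be rehearsed in miniature before touching any real telemetry. A toy sketch, with an invented payoff matrix and imitation rule (nothing below comes from any actual game): several isolated populations play the same rules, and we simply ask whether the same fotms emerge on each island.

    # Toy "parallel servers" experiment: several independent populations play
    # the same (made-up) pvp payoff matrix and copy successful strategies.
    # The question from the comment: do the same fotms emerge on every island?
    import random

    random.seed(7)

    STRATEGIES = ["burst", "kite", "tank", "stealth"]
    # Hypothetical PAYOFF[row][col] = score for row strategy vs col strategy.
    PAYOFF = {
        "burst":   {"burst": 1, "kite": 0, "tank": 3, "stealth": 2},
        "kite":    {"burst": 3, "kite": 1, "tank": 0, "stealth": 2},
        "tank":    {"burst": 0, "kite": 3, "tank": 1, "stealth": 1},
        "stealth": {"burst": 1, "kite": 1, "tank": 2, "stealth": 1},
    }

    def run_server(n_players=500, n_days=200):
        """One isolated server: random duels, losers occasionally imitate winners."""
        pop = [random.choice(STRATEGIES) for _ in range(n_players)]
        for _ in range(n_days):
            for _ in range(n_players):
                a, b = random.sample(range(n_players), 2)
                sa, sb = pop[a], pop[b]
                if PAYOFF[sa][sb] > PAYOFF[sb][sa] and random.random() < 0.1:
                    pop[b] = sa          # loser copies winner's strategy
                elif PAYOFF[sb][sa] > PAYOFF[sa][sb] and random.random() < 0.1:
                    pop[a] = sb
            pass
        return {s: pop.count(s) / n_players for s in STRATEGIES}

    for server in range(4):
        shares = run_server()
        top = max(shares, key=shares.get)
        print(f"server {server}: fotm = {top:7s}  shares = "
              + ", ".join(f"{s}:{shares[s]:.2f}" for s in STRATEGIES))

Whether the islands converge on the same mix or cycle off in different directions is exactly the kind of thing the comparison is meant to surface.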

And yes, virtual worlds may also be interesting in comparison to real worlds insofar as real worlds are equally code-oriented. I tend to think that real worlds are equally code-oriented. But then, I often think, that's just me.

9.

I agree with Greg and with dmyers, and wonder to what extent the continued efforts to push for this kind of use of virtual worlds are more a reflection of the way we dismiss exploratory, critical research as part of science (and of empirical inquiry in the broad sense) than anything else. Another way to put that is that some of the general concerns with this kind of approach lead to the kinds of specific objections raised by Tim, Gordon, and me in the comments section of this thread.

10.

Thanks for the comments so far. Very interesting stuff!

Several people have noted ways in which this method could go horribly wrong. I suspect that if they read the paper, they'd see me agreeing in some detail. The idea isn't that this approach is always valid or wonderful, but that there might be uses for it when testing some things. This is a road map to figure out which, if any, of those things exist. There may be zero, but early findings highlighted in the paper suggest it is probably not zero.

Richard and Morgan raise the excellent concept of test-retest consistency. I'd totally forgotten about that. It's called reliability in the social sciences, and it's the extent to which a measure, when repeated, yields the same result. Testing one server against another is indeed the ideal way to check reliability, and I am going to include that concept with an acknowledgment. (Ted and I have a paper under review in which we do this very thing and find that results from one server are pretty directly copied when a second, parallel server is launched, i.e. the second quickly approaches and then matches metrics of the first.)
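Just to make the shape of that test-retest check concrete, here is a minimal sketch with fabricated weekly numbers standing in for a per-server aggregate (the actual metrics, series, and thresholds from the paper under review are not reproduced here):

    # Sketch of the test-retest check described above, assuming we already had
    # a weekly aggregate metric (say, a median sale price) exported per server.
    # The numbers are fabricated purely for illustration.

    weeks = list(range(1, 13))
    server_a = [100, 104, 101, 99, 103, 102, 100, 101, 99, 100, 102, 101]  # mature server
    server_b = [60, 72, 85, 93, 97, 99, 101, 100, 99, 101, 100, 102]       # launched at week 1

    # 1) How fast does the new server close the gap with the mature one?
    gaps = [abs(a - b) / a for a, b in zip(server_a, server_b)]
    for w, gap in zip(weeks, gaps):
        print(f"week {w:2d}: relative gap = {gap:5.1%}")

    # 2) Pick a convergence threshold (5% here, arbitrarily) and report when
    #    the new server first stays inside it, plus how tightly it tracks after.
    THRESHOLD = 0.05
    converged_at = next(w for w, g in zip(weeks, gaps) if g <= THRESHOLD)
    post = [g for w, g in zip(weeks, gaps) if w >= converged_at]
    print(f"first within {THRESHOLD:.0%} at week {converged_at}; "
          f"mean gap afterwards = {sum(post) / len(post):.1%}")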

Greg, you might be right and this might not be worth doing. It's messy, potentially very expensive, and fraught with tons of validity issues. On the other hand, it offers the possibility to test things that are tough to test offline. For example, wouldn't it be nice to know what the likely impacts of a massive financial bailout might be before spending $700 billion on it?

As to being objects of study worth looking at on their own merits, I agree. I don't see this as mutually exclusive, and that's page 1 of the white paper. Whether they are moratoriums for regular behavior or not is the gist of all of the proposed validity testing.

dmyers is a structuralist at heart like me, and is risking the wrath I normally feel :) The little Galapagos idea is the basis of the test-retest function.

Last, a note on methods. Andy and Morgan point out ways that scientists can screw up, either by missing the bigger picture or by doing bad sampling. Establishing mapping will by no means solve bad methods.

@Thomas. I'm working on a new approach, not dismissing anyone else's.

But since you raise this whenever I post findings, let me say something. When I present my findings at conferences and workshops the only dismissals going on are coming from the critical cultural people in the audience. They say that my methods are inherently flawed, and that the human experience is too complex to model or measure. They say that inferential statistics are flawed. I have heard them. I have thought about what they've said. I respectfully disagree with them. Many are polite, and some are not. I find many to be hypocritical, talking about the importance of openness and multi-methodological work while taking cheap shots at those of us trying new things and trying to combine methods.

Those folks who find my use of experiments and surveys to be hopelessly misguided are welcome to ignore them. Let's move on. Personally, I will continue to tout the benefits of multi-methodological work. If someone were to read this paper, they'd see that it is agnostic about methods. It only assumes empiricism. Ironically, the only call for methods in there is me stressing the importance of anthropological-style participant observation.

11.

As I have indicated on numerous occasions, my concern is not with the methods of experimentation and surveys. They are an important part of our methodological toolbox.

12.

@ Dmitri

This is a great start at laying some methodological foundations for this area. However, I would have liked to have seen more in it on directionality. You only very briefly mention the other areas at the bottom of page 20, i.e. whether the virtual impacts the real (which includes the normative effect that Ted referred to in Exodus), as well as whether the VW and RW might end up "reinforcing" each other, as you put it (which is actually a form of convergence that would produce a new hybrid form). I think that these issues are a critical part of the methodology, in the sense that one should account for any possible contamination into or out of the petri dish during the experiment (recognizing that such contamination may be an interesting subject of study in itself).

Peter

13.

Dmitri wrote:

Testing one server against another is indeed the ideal way to check reliability, and I am going to include that concept with an acknowledgment. (Ted and I have a paper under review in which we do this very thing and find that results from one server are pretty directly copied when a second, parallel server is launched, i.e. the second quickly approaches and then matches metrics of the first.)

I would be curious to know how you make sure the repeated tests are independent of each other. At least in the games I am aware of, discoveries, strategies and memes developed by players on one server quickly propagate to other servers via forums, multi-server players, YouTube videos and other channels. Is it therefore really possible to treat the servers as Galapagos islands? In my opinion, not. I think this was the main shortcoming of Ted's original 2006 paper ("On the Research Value of Large Games") as well.

14.

I think that's an empirical issue, Vili. If you are testing for some outcome and you find it matches from server to server, then there's obviously some reliability there. If you don't, that suggests an independence between servers and a lack of reliability. I could imagine, for example, that one server might take on a harsher culture than another, which might impact, say, overall trust patterns. Or maybe it doesn't. Empirical issue.

The finding I alluded to above was economic. We found that patterns from one server were directly replicated by another one coming online from scratch. And they weren't even the same rule set (one was RP, the other PvP). That suggests to me that there is reliability, but note my qualifications: on that aggregate measure, and between PvP and RP servers, and of that game, and possibly of others with similar structures.

@Peter. Point well taken. My personal feeling on this is that the online/offline systems are highly endogenous. I'm trying to leave room for others who may disagree and see clear directionality.

15.

Dmitri, thanks for your response. I'm still not sure if I get it, though, so please allow me to continue for a bit..

I think that's an empirical issue, Vili. If you are testing for some outcome and you find it matches from server to server, then there's obviously some reliability there.

Not sure if I agree with this. There are two possible explanations for such a finding: 1) the experiment is reliable, meaning that it produces the same result every time it is re-run; or 2) you are not actually re-administering the test, but measuring the same result again and again, because the servers are not independent re-runs (not "Galapagos islands" or "Petri dishes").
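To put the two explanations side by side, a deliberately crude toy (all numbers invented): both a genuinely independent re-run and a server that merely copies its neighbour will pass a naive "do the metrics match?" check.

    # Toy illustration of the two explanations: matching outcomes across
    # servers can come either from independent re-runs of the same process
    # ("reliable") or from the second server simply copying the first
    # ("not independent"). All numbers are invented.
    import random

    random.seed(1)

    def independent_server(true_mean=100.0, noise=3.0, weeks=10):
        """Each server draws its own outcomes around the same underlying process."""
        return [random.gauss(true_mean, noise) for _ in range(weeks)]

    def copying_server(source, noise=1.0):
        """A server whose players import the source server's outcomes
        (via forums, cross-server guilds, etc.), plus a little local noise."""
        return [x + random.gauss(0.0, noise) for x in source]

    server_a = independent_server()

    scenario_1 = independent_server()          # genuine re-run of the process
    scenario_2 = copying_server(server_a)      # contamination from server A

    def mean_gap(xs, ys):
        return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

    print(f"gap A vs independent re-run: {mean_gap(server_a, scenario_1):.1f}")
    print(f"gap A vs copying server:     {mean_gap(server_a, scenario_2):.1f}")
    # Both servers end up "matching" A to within a few percent of the level of
    # the metric, so a simple "do the servers agree?" check cannot tell the
    # two stories apart on its own; you need some argument for why the servers
    # are actually isolated, or a design that breaks the channels between them.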

The finding I alluded to above was economic. We found that patterns from one server were directly replicated by another one coming online from scratch.

So the question is to what extent do servers really come online "from scratch", as I questioned above.

And they weren't even the same rule set (one was RP, the other PvP). That suggests to me that there is reliability, but note my qualifications: on that aggregate measure, and between PvP and RP servers, and of that game, if possibly others with similar structures.

I think we agree to some extent here. I don't want to go so far as to claim that all attempts to use servers as parallel tests are tainted by server cross-pollination. If you can show convincingly enough that, for the purposes of the questions you are examining, servers behave like independent experiments, then that's great. That was not attempted in Ted's 2006 paper, though, which dealt with coordination games. He observed that players in parallel servers of EverQuest and Dark Age of Camelot ended up converging on the same meeting places. This might well be evidence in support of the theory of coordination games operating in large groups of people. But in theory the players could have e.g. agreed on the meeting spots on the forums (a slightly more plausible explanation might also involve multi-server guilds/players).
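And as a crude illustration of why convergence alone does not settle the question (spot names and parameters are invented): isolated populations playing a pure follow-the-crowd game each settle on some focal meeting place, but nothing forces different servers to settle on the same one.

    # Toy coordination game: on each server, players repeatedly pick one of a
    # few meeting spots and drift toward wherever they saw the biggest crowd
    # last time. Spot names and all parameters are invented.
    import random
    from collections import Counter

    random.seed(3)

    SPOTS = ["tunnel", "bank steps", "arena gate", "docks"]

    def run_server(n_players=300, n_rounds=50, p_follow=0.8):
        choices = [random.choice(SPOTS) for _ in range(n_players)]
        for _ in range(n_rounds):
            counts = Counter(choices)
            crowd_spot = counts.most_common(1)[0][0]
            # Each player follows the crowd with some probability, else wanders.
            choices = [crowd_spot if random.random() < p_follow else random.choice(SPOTS)
                       for _ in range(n_players)]
        return Counter(choices).most_common(1)[0][0]

    for server in range(5):
        print(f"server {server}: converged on '{run_server()}'")
    # Every isolated server settles on *some* focal spot, but nothing forces
    # them all to settle on the *same* one. If real servers all pick the same
    # place, shared outside context (manuals, forums, cross-server guilds) is
    # as plausible an explanation as the coordination dynamic itself.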

16.

It's an interesting point, Vili--the idea that because there is some larger context, there may be no totally independent draws from server to server. Certainly there are things they have in common which make them parallel yet not isolated.

Isn't that a good thing so long as it's consistent, though? That's the idea of "ceteris paribus," unless I'm missing your premise. They can't be perfectly independent draws because they all have the same manuals, the same marketing and the same support websites, yet these are the things that also make them otherwise equal.

I was trying to give an analogy for this earlier today. I imagined what it would be like if we could make a perfect copy of the USA, only empty, and then populate it with a sample of people who were indistinguishable from the current residents. And then we'd run tests on it to see what would happen in the "original" USA. (We'd have to have a mythical space off to the side with lots of batches of people ready to go, all with the same cultural and historical backgrounds)

17.

@ Dmitri - evidently then the only method of producing a completely uncontaminated Petri dish or Galapagos Island server would be to run Bostrom's "ancestor simulations" either in parallel or in series, which leads to extremely interesting additional mapping issues, e.g. the Borges map that is the size of the Empire that it represents.

18.

I've used this a few times with methods classes, but it always reminds me of comedian Steven Wright's bit where he says "I keep a map of the United States at home . . . actual size. At the bottom it says 'one mile equals one mile.'"

19.

Once again:

The Galapagos Island analogy works only because a common set of evolutionary principles exists on and off the island. These principles become more obvious and more pronounced, perhaps, on the island. Yet it is their *similarity* with the off-island principles that is most illuminating and most remarkable.

If we plant several oak seeds in several pots, and each grows slightly differently, tall and short, many limbs and few, is it most remarkable that they grow differently? Or is it most remarkable that they all, regardless of the pot, somehow magically turn out to be oak trees?

I vote for number two.


20.

@ Dmitri

And, of course, Wright's punchline - "Last summer I tried to fold it up."

