We live in a world where the technology exists for a government or other technically sophisticated group to monitor and analyze a substantial fraction of the communications of the world's population, track their movements throughout the day, and keep tabs on their financial transactions.
And that world is called World of Warcraft.
While the NSA has been capturing and analyzing international phone calls and electronic communications, with far less press coverage I’ve spent much of the last year collecting and helping to analyze data scraped from World of Warcraft as part of the largest quantitative study of virtual worlds to date.
We've run into a series of problems trying to scrape information from five of WoW's servers -- some expected, some not -- and developed some rules of thumb in the process.
A Brief History of Peeping
World of Warcraft is not PlayOn’s first entry into listening in on a virtual world. Early in 2004, Nic Ducheneaut and Bob Moore placed bots in the cantinas and starports of one Star Wars Galaxies server and collected the chat logs from those environments. My own descent into PlayOn began with helping to analyze the gigabyte of chat logs they had collected.
Coming to WoW, we realized that Blizzard had opened up the game’s programming interface so that most of a player’s in-game actions can be automated, while still preventing game-playing bots. The /who command, combined with enough patience and a small matter of programming, enables a bot that can take a census of the entire logged-in world. Worlds, in fact, one faction at a time.
Starting last April with a census bot, we’ve collected 190 million sightings of the form
Magtheridon,01/01/06 21:16:51,Deputura,59,Un,st,y,Dire Maul,
Magtheridon,01/01/06 21:16:51,Onlysurface,28,Ta,er,y,Warsong Gulch,TerrifyingPulsar
(That is, a 59 Undead Priest and a 28 Tauren Hunter.) Most of the analysis on the PlayOn blog, and the data in Nic’s earlier post, is derived from the simple kinds of information shown above. Since then we’ve added scrapers for gender, zone chat, guild rank, and pvp rank (while failing repeatedly to add a scraper for economic data), chugging away on 6 dilapidated PCs spanning 5.5 WoW realms. We learned a lot about scraping virtual worlds along the way.
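To make the format concrete, here is a minimal Python sketch of how a sighting line might be pulled apart for analysis. Take the column names, the month/day/year reading of the timestamp, and the interpretation of the "y" field as a grouped flag as illustrative only -- the real files and parsers differ in detail.

    import csv
    from datetime import datetime

    # Illustrative column names, inferred from the sample sightings above.
    FIELDS = ["realm", "timestamp", "name", "level", "race",
              "cls", "grouped", "zone", "guild"]

    def parse_sightings(path):
        """Yield one dict per sighting in a scraped census file."""
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if len(row) < len(FIELDS):       # skip short or damaged lines
                    continue
                rec = dict(zip(FIELDS, row))
                rec["level"] = int(rec["level"])
                rec["timestamp"] = datetime.strptime(
                    rec["timestamp"], "%m/%d/%y %H:%M:%S")
                yield rec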
Things Take Longer and Cost More
Virtual worlds conspire against the would-be analyst, so that the “small matter of programming” mentioned earlier became a burgeoning monstrosity. The SMOP needed to test whether we can extract some type of information from the virtual world is usually not enough to function 24/7 in the face of shifting network or VW conditions. A lot of our code isn’t so much concerned with scraping data as with determining whether the server is up, whether a request is taking too long, or whether the addon is wedged.
Our scripts are very bad at dealing with situations humans handle without thinking. Writing software that stays resilient through connectivity and game issues is hard. Good grief, we have software fragments that try to dynamically tune the amount of time to wait between receiving /who results and sending the next request.
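To give a flavor of that plumbing, here is an illustrative Python sketch of the adaptive-wait idea; the constants and names are made up, and the production code is considerably uglier.

    import time

    class AdaptiveDelay:
        """Tune the pause between /who requests: back off when a request
        fails or times out, creep back down while results keep arriving
        promptly."""

        def __init__(self, initial=5.0, minimum=2.0, maximum=60.0):
            self.delay = initial       # seconds between requests
            self.minimum = minimum
            self.maximum = maximum

        def record(self, succeeded):
            if succeeded:
                self.delay = max(self.delay * 0.9, self.minimum)  # speed up gently
            else:
                self.delay = min(self.delay * 2.0, self.maximum)  # back off hard

        def wait(self):
            time.sleep(self.delay)

The scraper would call record() after every /who round trip and wait() before sending the next request.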
We deal with a semi-supported part of the game. The best documentation for the WoW API has been created by the modding community, but it is still spotty in places and wrong in others. Someone writing scraping code can easily become the expert -- outside Blizzard -- on some arcane corner of the game.
By far the most difficult problem we’ve faced is that the game itself is not static; it presents a moving target. Even our interest in and understanding of certain data changes over time. Patch day is a mad scramble to get the scraping software working again. Maybe the login code changed. Or, as in the most recent release, information on whether a character was grouped disappeared from the API, breaking our software in the process.
Several of these problems leak over from the data scraping into the analysis. It’s simply a fact of life that we have holes in our data. Any analysis we set up has to deal with a day or week missing here, or a few hours missing there. It’s surprising how much this can complicate matters. And however it came about, analysis parsers must deal with slightly different data from different eras.
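As one small example of the bookkeeping this forces on the analysis side, here is a sketch (assuming sightings with parsed timestamps, as in the earlier example) of listing the calendar days a realm simply has no data for:

    from datetime import timedelta

    def missing_days(timestamps):
        """Return the calendar days with no sightings between the first and
        last observation -- the holes every analysis has to step around."""
        days = {ts.date() for ts in timestamps}
        gaps, day, last = [], min(days), max(days)
        while day <= last:
            if day not in days:
                gaps.append(day)
            day += timedelta(days=1)
        return gaps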
We can only analyze what is visible to the players. As a result, we often have to settle for a reasonable proxy. For example, we would like to know when people group up to quest together. The best estimate we’ve managed is to identify guildmates who are in the same zone and both flagged as being in a group. We realize that it’s not accurate, but we have to make do. More troublesome are situations where we have no good proxy at all: we can develop predictors of character abandonment, but we have no way to know whether the player switched to another character, or to another game.
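Here is a sketch of that proxy, computed over a single census pass. The field names follow the illustrative parser above, and treating the "y" field as "in a group" is an assumption stacked on an already rough estimate.

    from collections import defaultdict

    def guild_grouping_proxy(snapshot):
        """Guess at 'questing together': guildmates seen in the same zone
        while both are flagged as grouped."""
        buckets = defaultdict(list)
        for rec in snapshot:
            if rec["guild"] and rec["grouped"] == "y":
                buckets[(rec["guild"], rec["zone"])].append(rec["name"])
        # Buckets with two or more members are treated as probable groups.
        return {key: names for key, names in buckets.items() if len(names) >= 2}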
Tips for the Impenitent
If my confession has not dissuaded you from scraping and analyzing intelligence from virtual worlds, here are some suggestions to help:
- Expect to spend a lot of the software development time handling exceptional cases: downed servers, lag, and time-outs.
- For something that runs repeatedly, try very hard to find a way to have another process detect when the scraper is wedged, so that it can be shut down and restarted.
- Be patient. Set your timeouts, as much as possible, for the worst times of day, even if they run slower than necessary during other times.
- As a new scraper comes online, start doing some analysis of the data early, even though you may need weeks of data before the results are truly meaningful. Beginning the analysis early exposes problems in the data collection while they can still be corrected.
- Files containing lines of comma-separated values (.csv files) are easy to write out as text and easy to read into spreadsheet programs. Including column headings or scraper version info at the top allows the scrapers to change later with minimal change to the analysis parsers.
- Convince Nick Yee that he wants to help analyze your data. This is important.
- Generate log files to record significant events in the scrapers, such as logging on or off, dealing with time-outs, reaching major milestones, etc. These can be invaluable for tracking long-term issues.
- If you have log files with sufficient information, consider writing a monitor script to send you email when your scrapers have been offline for too long (a minimal sketch appears after this list).
- Iterate between your game intuition and your analysis of the data to determine what to analyze next, what changes to make to the scraper, and what information you can proxy for information not yet available.
- Have fun. It’s a brave new world.
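To make the last two logging suggestions concrete, here is a minimal watchdog of the sort that can be run from cron every few minutes. The path, address, and threshold are placeholders, not our actual setup.

    import os
    import smtplib
    import time
    from email.message import EmailMessage

    LOG_PATH = "census_scraper.log"          # placeholder path
    ALERT_TO = "scraper-admin@example.com"   # placeholder address
    STALE_AFTER = 2 * 60 * 60                # two hours with no log activity

    def check_and_alert():
        """Email an alert if the scraper's log has gone quiet for too long."""
        age = time.time() - os.path.getmtime(LOG_PATH)
        if age > STALE_AFTER:
            msg = EmailMessage()
            msg["Subject"] = "Scraper silent for %.1f hours" % (age / 3600)
            msg["From"] = ALERT_TO
            msg["To"] = ALERT_TO
            msg.set_content("No new log entries in %s; the scraper may be "
                            "wedged or the realm may be down." % LOG_PATH)
            with smtplib.SMTP("localhost") as smtp:
                smtp.send_message(msg)

    if __name__ == "__main__":
        check_and_alert()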
Wow...this sounds like an amazing project, but at the same time, it also sounds extremely difficult to keep your scraping software up to date *and* efficient across patches. I know the mod community struggles with this, and sometimes a 'clunky' line of code to get a mod out the door after one patch leads to lines and lines of spaghetti code three patches later.
Three cheers for pushing the boundaries and utilizing WoW's APIs for relevant data collection...hopefully they won't put a ban on the user created addons anytime soon that will make this sort of thing prohibitive in the future.
Posted by: Bart | Mar 15, 2006 at 13:57
This reminds me... haven't players made a number of individual-use scrapers for economic data, specifically Auction House prices? They certainly don't capture all the information about a transaction (notably, you can't capture the actual price something sold for!) but they do scrape a lot of information about how much various kinds of items are put on the market for.
Posted by: Naomi Clark | Mar 15, 2006 at 14:30
hopefully they won't put a ban on the user created addons anytime soon that will make this sort of thing prohibitive in the future
I think the problem is that if these types of addons become popular with the general public for some use, they will put extra strain on the game databases and increase bandwidth usage. The auction house scanners are one example that is already popular.
Posted by: Pendan | Mar 15, 2006 at 18:33
So, I know this may sound silly, but why not just put together a research proposal and pitch it to Blizzard? Then you don't have to write silly scrapers... you just get the *real* data. Of course, you have to actually offer a compelling *reason* for your research that Blizzard will find of direct benefit to their business (that always seems to be the hard part). I mean, wouldn't most of these queries take about 15 minutes if you had the actual system logs?
While I love the desire to understand online population dynamics and SNA (it was my business for a while), the research methodology seems a little weak, the census math missing, and the data collection is riddled with an appalling amount of noise.
Sorry to be a party-pooper.
Posted by: Michael Steele | Mar 16, 2006 at 04:05
I guess my question is similar to Michael's above: To what extent has Blizzard directly been a help or hindrance? Have they even acknowledged your existence?
Posted by: Russ | Mar 16, 2006 at 08:30
Nice idea here!
Why don't you develop a CLIENT for your data collection? The client would send the info to your BOT as a message in a chat channel (or a whisper).
You could trigger the client to send info like grouping status, gold amount, PvP kills/deaths and stuff like that.
This would permit you to collect more data. But the problem with this system would be that it might be "abused" if someone hacked into your client to change the data transmitted.
But this could be overcome by having a lot of client software running. Or you could have a whitelist of characters from whom you accept data.
I think that your project would greatly benefit from this.
And I really don't think that Blizzard will ever let you tap into their data. That would be too "dangerous". And I personally think that Blizzard already has tools that analyse its data, and that this data guides their development decisions. Which type of content to add?
Which zone/instance is less used?
Which spells/class is less used?
What's the inflation rate?
Boting analysis...
On and on...
Posted by: LEKO | Mar 16, 2006 at 09:43
thanks for this Eric... i'm intrigued by this stuff from a surveillance studies point of view actually. Data mining of this kind is an imperfect science at best (even when the pros do it) for many of the reasons you seem to have encountered. I've chatted with marketing folks who have similar experiences with mined data from consumer databases - moving targets, inconsistent data, constantly changing search parameters.
I have always wanted to challenge the ethical aspects of the PlayOn scrapers (do I really want to be scraped in WoW any more than I want my buying patterns analyzed on Amazon?) but the process intrigues me enough to encourage the effort. Indeed, on the ethical front, your efforts may lead to useful counter-surveillance tactics that players can use against the undoubtedly more sophisticated surveillance apparatus that Blizzard maintains.
Of course I really love what you are doing but I am also dying to challenge the meaningfulness of this kind of data (in a friendly way) whether collected and acted upon by Blizzard or by PlayOn.
Perhaps it's time for you guys to host a workshop at Parc.
Posted by: Bart Simon | Mar 16, 2006 at 10:52
It takes a philosopher of science to properly understand what an empiricist outlook constitutes. As a general rule of thumb, if your interpreting engine never stumbled upon unexpected or unprocessable data, then one should naturally be suspicious of those observations.
Posted by: genericdefect | Mar 16, 2006 at 15:22
Naomi: Yes, as a player, I'm a big fan of Auctioneer by Norganna (auctioneeraddon.com). In poking around, however, we discovered that it was not possible to bring up an auction window programmatically with addon code. To get auction house information, it's necessary to right-click on one of the NPC auctioneers first.
Michael: (Warning! I'm going to edge awfully close to blatant self-promotion here.) We're right with you. Working with server-side data trumps scraping. We have tried pitching a research proposal to Blizzard -- providing tools that would lead to direct dollar savings -- and ... let's just say that we're now pursuing opportunities with other companies. But, hey! If you know of any game companies willing to cut a deal (Parc is a for-profit corporation, after all) to get top-drawer understanding of their customers, please send them our way.
Russ: In our brief discussions with Blizzard, one person told us that a "number" of people there read the PlayOn blog.
Posted by: Eric Nickell | Mar 17, 2006 at 11:02
If you're looking for a new world to get info about, take a look at Guild Wars. I'm an alpha tester for them, and the devs are very friendly.
Posted by: | Mar 17, 2006 at 12:42
Interesting post, but it was the replies that really got me thinking. There is a big moral difference between observing people's actions and influencing them. If you mix them, you get something like ATITD's "Little Shop of Horrors", which I fled. If you are honestly trying to figure out how people in MMOG's behave, you must never, *ever*, participate in the game design.
Posted by: CherryBomb | Mar 18, 2006 at 22:47
Did the people scraped get to give their consent?
Posted by: Prokofy Neva | Mar 21, 2006 at 15:17
Interesting project; too bad you chose such an uninteresting MMOG to study. Of all the MMOGs I've played, WoW is the most like a Disneyland ride. There is no real individuality possible. Whatever race, class, and level you choose, you will be doing exactly the same thing as everyone else who chose that race, class, and level. In short, there is no metagame to make you feel as though your avatar makes a difference in how the world behaves ... YAWN!!! IMHO you're monitoring something that lacks reason to be monitored. I'm sure Disney knows how many tickets they sell a day and how many people ride each ride but, besides Disney, who cares?
Posted by: Stuart Pedaso | Apr 17, 2006 at 09:36