SOE's EQ2 suffered a widespread outage after a server reboot (thanks MMORPGDot), documented in hourly updates here. Tales like this remind us how our worlds, our actions in those worlds, and even our avatars' memories (their persistent footprint) are tethered to technology and real-world actions, "rollbacks" and all. The good news, though, is that tomorrow, when your toons get out of bed, they won't remember a thing.
I suppose for some people, things that happen in games like EQ are as good as real memories. When you mentioned not remembering a thing after a rollback, I immediately thought of Cory Doctorow's "Down and Out in the Magic Kingdom". Next time downtime like this happens, http://craphound.com/down/ is recommended reading :)
Posted by: Ivan Tumanov | Dec 20, 2004 at 02:55
Do have to say that SOE did an excellent job on public relations.
When it looked like it was going to be more than a quick restart of the servers, they gave constant updates, usually hourly. They also came out and explained reimbursements to the players: 3 days added to each account, and from Sunday morning to Tuesday morning characters get a 50% bonus. When servers did start to come back up, they posted notices on each server's boards that it was about to be started.
From a techie standpoint, I would have liked a little more info on what happened (like that will ever happen) than that it was a hardware and software problem and that they had to fly in people from the hardware manufacturer.
Posted by: will dieterich | Dec 20, 2004 at 03:11
As someone who has worked on the back end of MMORPGs as an ops person, I find it exceedingly difficult to imagine what sort of hardware problem EQ2 could have faced that would have resulted in this much downtime.
EQ2 was designed recently, and certainly designed with redundancy and hardware failure in mind. It's not as if Sony's money press is short on financial resources. Maybe it's a failure of imagination, but blaming this on hardware failure doesn't seem plausible to me.
=darwin
Posted by: Darwin | Dec 20, 2004 at 15:18
As someone who's worked for a MMOG service as an ops person, I find it exceedingly difficult to imagine what sort of ops engineer you are if you can't spare some sympathy for that team.
Allow me to craft a worst case scenario for you:
The authentication server fails to start after a reboot. Perhaps load balancing for the auth servers is done through partitioning, and a chunk of the user base is no longer able to get online. Perhaps it's at this point that a back-end database is found to be the culprit. Perhaps at this point, the database's integrity is suddenly suspect. Perhaps at this point, the integrity of the DB's backups is no longer assured. Perhaps at this point, the SAN that's hosting the DB is failing. Perhaps at this point, we realize that your MMORPG back-end experience was with some unplayed and un-taxed system like 'There', and you really have wasted everyone's time with your completely unqualified opinion about one of the most popular MMORPGs available.
Just because you've had experience in the same genre does not mean it's the same class of service or demand. These datacenters are complicated and seemingly simple problems can have complex sources.
Posted by: Captain McCrank | Dec 21, 2004 at 03:13
Interesting posts. I was one of the players on the sidelines and watched this event with great interest.
There were two separate aspects of this event. The first was a hardware failure that led to some anomalous behavior, with characters, skills, and items being lost. The servers were shut down and little info was given to the players other than that there was no ETA and they were working on the problem.
The second aspect was how the folks at Sony handled the customer service issues.
True, at first there were hourly updates. Then, after a period of time (I think 12 hours), nothing. This lack of communication fed near hysteria in the player base. The chat room was locked so players could not talk to each other about what was happening. The forums were so heavily used that at times it was impossible even to log on. Overall, information back to the players was sporadic and unofficial. In other words, the info was passed from game masters to some players and then out to the rest of the player base.
Now, from my perspective there are often issues with technology, and sometimes things happen that are beyond anyone's control and foreknowledge. Sony should be commended for taking the right action by shutting the servers down and fixing the problem. The teams that worked on the problem should also be given kudos, as this was undoubtedly a six-pack experience.
The lack of customer support, however, was painfully evident. I experienced this event as one of the most pointed lessons in "what not to do" in this sort of business. It would be interesting to see the subscription drop rate between December 17 and January 1. I bet it is fairly significant.
~jcl
Posted by: Jim Landes | Dec 21, 2004 at 19:05
Actually, they were posting hourly for at least 24 straight hours (until Blackguard left). True, most of the updates were "Sorry, but we don't have an ETA yet," but I am still quite happy that we even got that. They also handled it well by giving everyone 3 extra days of free playtime and 2 days of bonus experience.
It seems they increased experience gain again today, until Monday, as a holiday gift, which is pretty cool. I dealt with a GM as well, and they were actually helpful (surprisingly)!
Posted by: Luckbad | Dec 23, 2004 at 16:08