Guild Wars 2 studio ArenaNet has been the subject of many side-eyes over the last few years, amassing a reputation for being messy and opaque on both the technical side and the leadership side, and the mass layoffs in 2019 didn’t help. Earlier this month, however, the company finally publicly admitted who’s been running the studio and Guild Wars 2 since Mike O’Brien left and Mike Zadorojny stealth-departed at the end of 2019. And now, we’ve gotten an unusually detailed technical dev blog from the Platform Team’s Robert Neckorcuk, giving us even more hope that ArenaNet’s revitalization includes a reset for its public image and transparency.
The blog revisits the downtime in which the EU megaserver was rolled back, costing players a significant amount of cash and gameplay time, which for Guild Wars 2 was an exceedingly rare incident. ArenaNet then moved to compensate players, but that was a bit of a bungle, as players complained about the compensation gifts and the mixed incentives.
Neckorcuk digs much more deeply into what happened behind the scenes over those 20 hours, starting with the update the week before, a rogue database issue, and (no kidding) the drivers, all of which helped cause and compound the cascading disaster. He also details the whole process of how the studio identifies problems, solves them, gets the hamsters running again, and prevents the issues in the future. And as you might have noticed, that kind of downtime hasn’t happened again.
“The most impactful change for our databases was to increase alerting on key database metrics, not just system metrics like CPU or hard drive space. For our live operations, we added a number of alerts into a third-party tool to improve our response time for future issues. And for general operations, we’ve improved the record-keeping of our AWS infrastructure, now tracking more than just the instance type. Our reports now include instance types, generation, drivers, and storage types. We built a common package to install on all new servers that includes specific driver versions. Any future migration plans will update this common package, ensuring that we don’t repeat this issue again. We have completed the migration for all the remaining database instances and more, providing higher performance for improved service. In the last fourteen months, we’ve recorded an uptime of 99.98%, with only five minor service interruptions impacting user log-ins.”
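For a sense of scale, that 99.98% figure can be turned into hours with a quick back-of-the-envelope sketch. This is our own arithmetic, not ArenaNet's: the function name is made up for illustration, and we assume "fourteen months" means 14 average-length months of about 30.44 days each.

```python
def downtime_hours(uptime_percent: float, period_days: float) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_days * 24


# Assumption: fourteen months approximated as 14 average months of 30.44 days.
PERIOD_DAYS = 14 * 30.44

hours = downtime_hours(99.98, PERIOD_DAYS)
print(f"Implied downtime: about {hours:.1f} hours")
```

Under those assumptions, the quoted uptime works out to roughly two hours of interruption spread across the whole period, compared with the 20 hours lost in the single incident described above.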
If this is the new ArenaNet… we like it.