- Official Post
Hello everybody,
I'm typing this message after 19 hours, 19 minutes and 9 seconds of downtime on Guild Wars Legacy (we've got monitors for that information, after all).
The downtime was caused by issues between our server and the network uplink provider. This was far out of our control and, unfortunately, we were forced to wait until they resolved it.
Some other issues forced us to restore Legacy from the latest available backup. We lost a tiny bit of content, but thanks to our backup system almost everything was preserved.
Update: Our service provider has informed me that it was, in fact, not an issue with the network uplink provider. Instead, the physical hypervisor that housed our main Legacy web VM had major issues, and this led to data loss: the VM was corrupted and could not be restored.
Fortunately, I may have overdone it on the backup front, so I was able to restore GW:Legacy with barely any data loss.
When the downtime started, we had no idea how long it would last or whether we could recover the content that was not yet in our backups. Because of this, we decided not to spin up Legacy on another server: we wanted to avoid running an out-of-sync copy of Legacy, since we would have lost posts either way.
This incident was a good learning experience for us and put many of the systems we have in place for when Guild Wars Legacy has issues to the test. Our backup system proved to be amazing, our emergency staff chatroom was online and actively used, and our monitoring system notified us about the issue within seconds of it starting. Other than cosyfiep being treated for withdrawal symptoms, everything is fine :).
Over the last few weeks we've been optimizing the Guild Wars Legacy infrastructure, and we'll be taking further action to avoid lengthy downtime.
The first step is to set up a secondary database server that replicates the main database, so that the loss of posts is kept to a minimum.
In addition, we're going to step up our database backups even further.
Sorry for the downtime. It was completely out of our hands, but we're taking steps to avoid similar downtime in the future, and we will be handing those responsible over to Mad King Thorn.
- Kevin