It seems like only yesterday that I penned my 4th birthday post for Guild Wars Legacy, and another year has already flown by.
I'm not going to lie: it's been a completely different year from the one we all hoped and planned for. Corona ruled our lives, and it still does.
However, the future seems brighter than it did a few months ago. The same goes for Guild Wars Legacy, but before you can look to the future, you need to keep an eye on the past.
So, that is what I have been working on over the last few months.
The origin of Guild Wars Legacy
Most of you were probably around when Guild Wars Legacy was started, but for those who weren't, or who don't know the story, here is the story behind Legacy.
It all started 5 years ago, when it was announced that Guild Wars Guru, the biggest GW1 fansite ever to have existed, was going to shut down and that a Guild Wars 1 section would be created on Guild Wars 2 Guru to replace it.
However, there was no plan to migrate the existing posts to Guild Wars 2 Guru: almost everything that was going to be carried over would be copied and pasted over. Not an ideal situation.
This also forced us onto another website managed by Curse, with no guarantee that Guild Wars 2 Guru would stay online: it was never as popular as Guild Wars Guru and lived in the shadow of the original Guru.
So the community took a stand and worked on creating its own alternative to Guild Wars 2 Guru: something run by fans, for the fans. Not with an eye on the money, but simply to provide a home for the Guild Wars community on the internet.
Thus, Guild Wars Legacy was born, and it has grown into what it is today: something we are all immensely proud of.
In retrospect, starting Legacy proved to be the right decision: Guild Wars 2 Guru shut down in 2017, and Curse is no longer around.
Guild Wars Legacy, on the other hand, is still around and still growing. Guild Wars 1 is far from dead, indeed!
Preserving the past
However, Guild Wars Guru was a huge source of information, and with it going offline (and the promised archive of it never coming to fruition), all of it was in danger of being lost.
That was when several users stepped up, including Smoke Nightvogue, who downloaded a copy of the Guild Wars Guru website.
Those files were handed over to us, and we have hosted the Guild Wars Guru archives on https://archive.guildwarslegacy.com ever since.
However, this was a static HTML version of Guru, and it was huge, clocking in at 100 GB. It is hardly a great experience either: pagination doesn't work, links are almost always dead (even links to other pages in the archive), you can't search, and navigating through it is hell.
I have known about this issue for many years, but it is a hard one to solve: you can't easily make a 100 GB data set searchable, or fix pagination across thousands upon thousands of files.
So, another solution had to be found. And I've found it.
Minifying the archive
The first thing I tried, to get the archive into better shape, was minifying it. This is possible by stripping out blocks of code that occur on every page (there is a snippet of 1,000 lines of inline CSS code) and replacing them with an include.
Parts of the site, like the login and search forms, were broken and could be stripped out.
I tried doing that, but it took ages (9 hours for a simple replace of just a few lines), and automating it also proved difficult.
I wrote about how I tried to tackle this on my blog: https://ke.vinpet.it/blog/minifying…s-guru-archive/
I tried and tried, and I got the archive's size down significantly. But I wasn't happy yet: it still wasn't searchable, and it wasn't easily browsable either.
So I asked myself: how can I solve this? Perhaps the solution doesn't lie in removing what isn't needed, but rather in storing only what is needed.
Building a scraper
So I started working on a scraper: the GuruArchiver Scraper. I built it in Golang (the source code will be available soon); it loops through all the files in the archive and extracts the user information, categories, threads and posts.
With a few modifications here and there, the scraper succeeded in scraping the entire Guru archive. It took hours to get through it, and I'm not completely happy with it yet, but I ended up with a staggering number of categories, users, threads and posts:
- 41 categories
- 78,218 users
- 561,723 threads
- 4,911,097 posts
I occasionally ran into trouble during the scraping, but with some small tweaks, the scraper worked remarkably well.
Writing such a tool isn't as simple as it sounds: you want it to be performant without ruining your disks in the process, and with data sets this large, efficiency is very important.
So how did I write it? Well, another article will be coming up soon to tell you how I did it, alongside the full source code of the tool.
Now, this staggering amount of data has been stored in a MariaDB database, which means I can now do some fun things with it... that's right, something véry cool is coming soon!
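Loading millions of rows one `INSERT` at a time is slow, so a typical approach is to batch many rows into a single multi-row statement. The sketch below builds such a statement with `?` placeholders for Go's `database/sql`; the `posts` table, its columns and `buildBatchInsert` are hypothetical, since the archive's real schema hasn't been published.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert returns one multi-row INSERT with ? placeholders and the
// flattened argument list, ready for database/sql's db.Exec(query, args...).
// Table and column names are interpolated directly, so they must come from
// trusted code, never from scraped content.
func buildBatchInsert(table string, cols []string, rows [][]any) (string, []any, error) {
	if len(cols) == 0 || len(rows) == 0 {
		return "", nil, fmt.Errorf("need at least one column and one row")
	}
	placeholder := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ") + ")"

	var sb strings.Builder
	fmt.Fprintf(&sb, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	args := make([]any, 0, len(rows)*len(cols))
	for i, row := range rows {
		if len(row) != len(cols) {
			return "", nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), len(cols))
		}
		if i > 0 {
			sb.WriteString(", ")
		}
		sb.WriteString(placeholder)
		args = append(args, row...)
	}
	return sb.String(), args, nil
}

func main() {
	// Hypothetical "posts" table; the archive's real schema is not public.
	query, args, err := buildBatchInsert("posts",
		[]string{"id", "thread_id", "body"},
		[][]any{{1, 10, "First post!"}, {2, 10, "Welcome to Guru."}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
}
```

With a MariaDB driver registered for `database/sql` (for example go-sql-driver/mysql), the result would be passed to `db.Exec(query, args...)`. Batches have to stay under the 65,535-placeholder limit that MySQL, and as far as I know MariaDB, imposes on a prepared statement: with three columns, that is at most 21,845 rows per statement.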
Presenting the new Guild Wars Guru Archives
I had hoped to be able to launch it today, but unfortunately, it simply isn't ready yet. It's been a huge undertaking, and running the scraper took multiple weeks of my time.
I've also been very busy. But I do have some cool screenshots of the work-in-progress archive that I can't wait to share with you all!
Keep in mind that this is an extremely early work in progress: the styling is far from done, entire chunks of code are missing (which is why you can only see the user ID and not the user information yet), and pagination isn't implemented yet. There's still quite a lot of work to be done.
But I couldn't wait to share this with all of you, so as a birthday gift from me to you, here are the first screenshots of the work-in-progress Guild Wars Guru Archive:
In order, these show: the homepage category view, the thread listing, the first thread on Guild Wars Guru, a random post about April Fools, and the thread saying goodbye to Guild Wars Guru (which is also where Guild Wars Legacy was first announced).
As you can see, it's far from finished: it's still quite rough around the edges and needs a ton more code and a lot more design work. But I'm working on it.
So, tell me: are you excited about this news? Are you looking forward to using the new Guru Archives?
Feel free to let me know!
If you want me to post more about these kinds of things, let me know as well. And finally, if you want me to be able to build more things like this, feel free to support Legacy: it enables me to spend more time on Legacy and thus improve it for all of us.
- On to further adventures, and even more years of Legacy!
Iaerah (formerly known as Kevin)
About the Author
Hello, I'm Iaerah (formerly known as Kevin, which is my real name), the Guild Wars Legacy admin.
The reason I switched my name to Iaerah is mainly that Kevin is so generic, and if I ever want an NPC named after me in-game, I'll need a more creative name.
Joking aside, Iaerah is also my main in Guild Wars 2 (I know, I know, shoot me, but I enjoy both games; just don't see GW2 as the sequel to GW1).
I'm not only the Guild Wars Legacy administrator but also one of its founders, together with some other great people like Richey (who runs the Guild Wars: A New Hope Facebook group) and Max Borken. I'm quite easy to contact and generally reply quite quickly, but I have a tendency to read my notifications and forget to respond. If that happens, feel free to send me a reminder.
You can contact me using the contact form on this site, send me a PM here, or email me at my Guild Wars Legacy address (hint: it's my real name @guildwarslegacy.com), and I'll get back to you as soon as I can (if I don't forget ;)).
In-game, you can contact me on one of my two main accounts; my mains are Leanna Goldwing and Inquisitor Karinda.
In general, don't be afraid to contact me!
Do you have an idea to improve Guild Wars Legacy, no matter how small it might be? Feel free to let me know!