How to archive an academic blog or website?

Digital preservation is an evolving area, and preserving websites over the long term is one of its most difficult problems. A few of the challenges: the dynamic nature of contemporary websites (especially blogs, where content is updated regularly and there are interactive components such as commenting); hyperlinking (links and images will eventually break, and if you point to content external to your site, that is out of your control); and the instability of Web formats (a website may look better in one browser than another, never mind how it holds up over time).

There are, however, several things you can do to help preserve your website:

  1. Back up: keep your content stored in more than one location (on your server, on your hard drive, on an external hard drive, in the cloud, etc.), and don't use an external service as your primary storage location, as such services can shut down at any time (GeoCities was a huge service when it was shuttered);
  2. Keep up to date with changes in file formats and browser software: when browsers are updated for HTML 5, 6, or whatever may replace HTML in the future, will they be backwards compatible enough to display the blog you're authoring today, or will you need to migrate your website to a contemporary format?
  3. Use pockets of expertise on campus: archivists on many campuses have been working towards digital preservation solutions, and your university's archivist (especially at larger universities) may be able to offer options for the long-term preservation of your website; and
  4. Consider normalizing your website to a preservation format: although creating a copy of your website as a PDF may mean losing functionality (in particular, interactivity), it preserves the content and appearance of the site in a file format that is considered a safer bet for long-term preservation.
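The first point (several copies in several places) can be partly automated. The following is a minimal sketch, assuming the site exists as a directory of files on disk; the function names `checksum_tree` and `back_up` and the destination paths are hypothetical, chosen only for illustration. It copies the site into each destination and then checks, via SHA-256 digests, that every source file arrived intact.

```python
import hashlib
import shutil
from pathlib import Path


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def back_up(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy source into every destination; report whether each copy verifies."""
    expected = checksum_tree(source)
    results = {}
    for dest in destinations:
        target = dest / source.name
        # dirs_exist_ok lets repeated runs refresh an existing backup (Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
        actual = checksum_tree(target)
        # Every source file must be present in the copy with an identical digest.
        results[str(dest)] = all(actual.get(rel) == digest
                                 for rel, digest in expected.items())
    return results
```

A call might look like `back_up(Path("my-site"), [Path("/mnt/external"), Path("/mnt/cloud-sync")])`, where both mount points are placeholders. Note that this only guards against loss of the files themselves; it does nothing about format obsolescence, which is the concern of the second and fourth points.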

The Library of Congress also provides some tips on how to design preservable websites, including following available web and accessibility standards, embedding metadata, and maintaining stable URLs.

Taylor, N. (2012, February 6). Designing Preservable Websites, Redux. Retrieved May 23, 2012, from http://blogs.loc.gov/digitalpreservation/2012/02/designing-preservable-websites-redux/
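As one way to act on the "embedding metadata" tip, a page can be checked for a title and a few `<meta>` tags. This is only a sketch: the set of required fields (`author`, `description`, `date`) is an assumption made for illustration, not a list taken from the Library of Congress post, and `MetadataCollector` and `missing_metadata` are hypothetical names.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collect the <title> text and <meta name/property ... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "content" in attrs:
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_metadata(html, required=("author", "description", "date")):
    """Return the names of expected metadata fields that the page lacks."""
    collector = MetadataCollector()
    collector.feed(html)
    missing = [name for name in required if name not in collector.meta]
    if not collector.title.strip():
        missing.insert(0, "title")
    return missing
```

Running `missing_metadata` over each saved page of a site gives a quick list of pages that would be hard to identify once they are separated from the live site.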


  1. Keep up-to-date backups, so that you have copies of everything you want to keep in at least two geographical locations (e.g. one at home, one at work).
  2. Route everything via your own personal domain, so that even when things are hosted elsewhere (current university website, pre-print archives, whatever), the URL people see and bookmark is the one on your personal domain. That way, their bookmarks will still work when you leave your current university for another affiliation.
  3. Pick good URIs, and then stick to them.
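The second point amounts to running a small redirect layer on the personal domain. Here is a minimal sketch using only the Python standard library; the paths and target URLs in `REDIRECTS` are made-up placeholders, and `resolve`, `RedirectHandler` and `serve` are hypothetical names. In practice the same table is more likely to live in a web server or hosting provider's redirect configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stable paths on the personal domain -> wherever the content lives right now.
# All entries are placeholders; only this table changes when you move.
REDIRECTS = {
    "/blog": "https://blog-host.example/your-name/",
    "/papers/thesis": "https://repository.example.edu/items/12345",
}


def resolve(path):
    """Return the current location for a stable path, or None if unknown."""
    return REDIRECTS.get(path.rstrip("/") or "/")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve(self.path.split("?", 1)[0])
        if target is None:
            self.send_error(404, "No such stable URL")
            return
        # 302 (temporary): the personal-domain URL stays the canonical one.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(host="127.0.0.1", port=8000):
    """Run the redirect service until interrupted."""
    HTTPServer((host, port), RedirectHandler).serve_forever()
```

A temporary redirect is used deliberately here: a permanent one (301) would encourage browsers and search engines to remember the university-hosted address, which is exactly the address expected to change.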

I believe you have a number of options, which I'm stating in no particular order:

  1. You can use the Wayback Machine, which strives to store snapshots of web pages across time. You can also use the more personalized Archive-It service, which lets you manage your own collection while sharing it with the public - this is mostly used by institutions, I think.

  2. Alternatively, you can host your own blog on a licensed domain name, where you've prepaid the fees for enough years in advance.
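For the Wayback Machine option, the Internet Archive offers a public availability lookup and a "Save Page Now" address. The sketch below assumes the endpoints `https://archive.org/wayback/available` and `https://web.archive.org/save/` behave as they did at the time of writing; both may change or be rate limited, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def save_url(page_url):
    """Address that asks the Wayback Machine to capture page_url (assumed endpoint)."""
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url):
    """Address of the availability lookup for page_url (assumed endpoint)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Extract the closest snapshot URL from an availability response, if any."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def check(page_url):
    """Query the live service; needs network access and may fail or time out."""
    with urlopen(availability_url(page_url), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling `check` on each of your post URLs returns `None` for pages that have no snapshot yet; those are the ones worth submitting through the address built by `save_url`.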

Lastly, a very simple suggestion - if the content of the website is approaching the academic quality and nature of a book, why not publish it as a short collection of essays, which you can then disseminate freely through open web libraries etc.?
