Librarians and archiving the web
What expectation of permanence do we have of the web? Trick question: there is no standard expectation. Digital archivists (librarians) are best suited to serve as custodians of the web.
I’ve been thinking about this issue recently as Dave has been blogging about his effort to get the blogs.harvard.edu domain protected from the forces of entropy. Harvard, however, has other ideas. From the FAQ:
At this point, for all of the reasons set out above, we feel that the time for hosting content from non-Harvard-affiliated bloggers on Harvard servers has passed. We are giving non-Harvard users with active blogs the opportunity to export existing content over the coming weeks. Those users will then be transitioned off the platform.
The situation suggests a microcosm of the larger publisher/platform question: what responsibility do blog platforms have for their contents? And furthermore, if there is no violation, should there be an expectation of permanence?
In the case of the Harvard-affiliated content, where there does not seem to be any violation, the loss is even greater, as the content trove represents a historically significant body of work from the early web period.
For nearly a decade, the Library of Congress has been investing in public resources for digital archivists. The skills required seem forensic; indeed, it is technically challenging to keep entropy-prone content alive at the same address (and protocol!). That is before considering the myriad ethical questions involved in archiving different kinds of content.
Archive.org is another leading example of a library investing in digital archives. With 30+ petabytes of duplicated data, the group has helped a number of institutions port their content into special collections.
Funny thing is, if you asked the librarians at Harvard (or any other university), they’d probably agree wholeheartedly about the need for their profession to be at the center of protecting the web. As with any institution of sufficient scale, problems of focus and coordination most likely limit internal efforts to solve the problem, and as the Harvard press releases indicate, the decision seems to have been made already.
But until that server is wiped, there’s still time to protect the historical contents of the Blogs at Harvard platform, and the school’s own librarians are best positioned to do it.
Here are some examples of posts on the Harvard site that might be lost:
- What makes a weblog a weblog?
- A God for Bloggers
- My cousin seems to have a blog as well (with ads)
- Goodbye, World!
Follow @DaveWiner for updates; he’ll be more on top of this than I am.