In Valid Logic

Endlessly expanding technology

Bookmarks as a web archive

Bookmarking has been extended in several respects by social bookmarking sites, which introduce a social aspect into sharing and discovering bookmarks from other users. Then you have your "personal bookmarks", the ones you find under the "Bookmarks" menu item of your favorite browser.

Traditionally, browser bookmarks have served two purposes: quick access to frequently visited sites/pages, and saving or archiving previously "found" material.

Quick access to frequently visited sites has always been useful, and will continue to be. Even with auto-complete when typing an address into the address bar, having a quick links bar in the browser, or a list of sites reachable in a few clicks, is sometimes still easier. Perhaps you like getting to the Business section of your local newspaper without having to type it in or click through a few pages to get there. You can easily link right there.

The area where I feel that bookmarks fail me is the archiving of pages I'd previously found and saved as reference material. In reality, this is what a lot of my personal bookmarks are, especially the ones I don't have neatly organized at the top of the list or in little folders. I might find some nice, in-depth blog post I want to read but don't have time for right now. Or an article on some coding ideology, or a list of network ports common servers use, or some process for working with photos for my photography hobby. These are things that I want to come back to later and keep around as things I might use at some point. There are several flaws with this system though.

First, they are not permanent. These links are often individual pages buried within a site, not the front page or a prominent section. Because of this, they can fall victim to the evolution of the site. Maybe the owner redesigns their site, causing the location to change. Your bookmark is now invalid and you might have to find the new location. Maybe they decide to start fresh, changing blog software or something and not importing old posts. Your content is now gone. Or maybe the site goes away, the domain expires or something, and the page is lost forever as well, and you don't even have a way to email the author to see if they have it saved somewhere. I used to have a list of several bookmarks to different techniques for making black and white photos. Unfortunately, when I went back to some of them, the links no longer worked.

Second, they are not searchable. If your bookmarks are searchable at all, they are likely only searchable by title or URL. Perhaps you remember reading some stat or article and only remember a phrase from it, or something similar. You have no way to quickly find where that was from unless you visit each of your bookmarks and search the page (assuming the links are still valid).
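To make the searchability point a bit more concrete, here is a minimal sketch of what full-text search over saved pages could look like. It assumes you already have the HTML of each bookmarked page stored locally; the names `page_text`, `build_index`, and `search` are made up for illustration and are not part of any browser or bookmarking tool. It matches pages that contain all of the words you remember, not the exact phrase.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_text(html):
    """Return the visible text of a saved HTML page as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_index(pages):
    """Map each lowercase word to the set of URLs whose saved text contains it.

    `pages` is a dict of URL -> saved HTML (hypothetical storage format).
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9']+", page_text(html).lower()):
            index[word].add(url)
    return index


def search(index, phrase):
    """Return the URLs whose text contains every word in `phrase`."""
    words = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

With something like this, remembering only "channel mixer" from a photography article would be enough to find the page again, provided the page content was saved when it was bookmarked.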

And third, the organization of bookmarks doesn't scale. Most people quickly bookmark a link and move on, so when the thought of organizing does come to them, they already have a nice pile of unorganized stuff. Sometimes organizing never gets done because the size of the task keeps growing, and it becomes a dreaded chore. Most browsers just have a basic folder system for "categorizing" bookmarks, but this often isn't descriptive enough and can be vague. You might have to develop your own folder structure to nicely describe links.

There are different strategies that try to remedy some of these. For instance, many people pipe their saved links through a social bookmarking service, since going that route lets you tag your posts (which is a great way to solve #3), but it can fail elsewhere. You can enter a description, but then you can only search that description, not the full text of the page, so #2 is only partially resolved. And it is of course still victim to #1.

Backpack is extremely useful, but it is primarily plain text. You cannot cut & paste, or easily preserve, the full rich-text feel of the page. Preserving images is an absolute must. For my photography tips I mentioned earlier, photos of the steps would definitely need to be kept.

OneNote is an excellent program, but it too tends to lack in the rich-text department. You can send documents to it, but most of the time it will essentially go through a virtual printer. Print format doesn't always look as good as the original. Quite often, I hate the printed format of a webpage. It usually isn't as pleasant to read or as friendly. Also, no Mac version.

Then there are several journal/library type applications for both PC and Mac other than OneNote, but most are focused on note taking and keeping snippets. None seem to have been designed for the specific purpose of archiving a webpage. Those that support rich-text formats would likely handle this in a cut & paste fashion, but then you still tend to lose formatting.

What is really needed is a stronger archiving/library type program or site that allows you to fully preserve what you saw the day you first visited the site. The one application I've found that lets me do pretty much everything I'm looking for is DevonThink by Devon Technologies. It has an AppleScript plug-in that allows you to send the webpage you are currently viewing to it as a "web archive": it will basically "slurp" the entire page and preserve it within its database. Then you can come back and view it as if you were looking at the site itself. It is rather pricey though.
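For a rough idea of what the "slurp" step involves, here is a minimal sketch that stores a dated snapshot of a page on disk and keeps a small index of what was saved. This is not how DevonThink works internally; the function names, the `index.json` file, and the file naming scheme are all assumptions made for illustration. It also saves only the HTML itself, so images and stylesheets would still need to be fetched separately for a faithful copy.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download the raw bytes of a page (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def archive_page(url, content, archive_dir, saved_at=None):
    """Store one snapshot of `content` for `url` and record it in index.json.

    The archive layout is hypothetical: one file per snapshot, named from a
    hash of the URL plus a UTC timestamp, and an index.json mapping each URL
    to its list of snapshots.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Normalize to UTC so the "Z" suffix in the file name is accurate.
    saved_at = (saved_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = saved_at.strftime("%Y%m%dT%H%M%SZ")
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = f"{url_hash}-{stamp}.html"
    (archive_dir / name).write_bytes(content)

    # Append this snapshot to the per-URL history in the index.
    index_path = archive_dir / "index.json"
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        index = {}
    index.setdefault(url, []).append(
        {"file": name, "saved_at": saved_at.isoformat()}
    )
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return archive_dir / name
```

Usage would be something like `archive_page(url, fetch(url), "my-archive")` at the moment you bookmark a page. Because every snapshot gets its own timestamped file, the copy from the day you first saw the page survives even if the site later changes or disappears.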

I am wondering what else is out there though. Does anyone know of something that could achieve this? Perhaps an online application? I love the idea of Backpack, where it is essentially cross platform and available anywhere, but I just haven't been able to find anything that gives that "slurp" type functionality for preserving a page. Part of it might be copyright issues, i.e., it is one thing to save a page yourself for personal use, but a website owner might have issues with an online service saving it on someone else's behalf. Not sure, I'm not a lawyer (thank god). Anyone have any suggestions?

Tuesday, January 15, 2008
