I have over 1000 primary sources in my Wikipedia research mindmaps. In accumulating some of those sources, I have already been confronted with their ephemerality. (And these are public sources only; I know of lots of e-mails by the likes of Wales, Sanger, and Stallman that I would have liked to access but that apparently no longer exist.) Running a quick link check on the largest mindmap, I find the following: 941 of those resources are "OK"; 21 are "404" (no longer there); and 10 are "Timeout". So, within just a few years, roughly 2% are gone outright and another 1% time out. For example, the link to Sanger's 2005 information about his (then) new Digital Universe project is already broken, but I must say news sites are the worst. Then there are the URLs that no longer have what they used to, those that are now password protected, and those that have new URLs because of a site reorganization; blogs seem to be the worst on this front. Of course, I don't know if this rate is a linear trend, and I would be interested in any research that shows longitudinal decrepitude rates for an existing corpus of links.
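A link check like the one above can be approximated with curl. The following is a minimal sketch, not the tool actually used for these numbers: it assumes curl is installed, treats curl's exit code 28 (operation timed out) as "Timeout", and lumps everything else that is neither a 2xx/3xx response nor a 404 into a hypothetical "Other" bucket. The function names `classify` and `check_url` are illustrative.

```shell
#!/bin/sh
# Minimal link-check sketch (assumption: curl is available).
# Labels mirror the OK / 404 / Timeout categories; "Other" is a hypothetical catch-all.

# classify CURL_EXIT HTTP_CODE -> prints a status label
classify() {
  if [ "$1" -eq 28 ]; then          # curl exit code 28 = operation timed out
    echo "Timeout"
  elif [ "$1" -ne 0 ]; then         # DNS failure, refused connection, etc.
    echo "Other"
  elif [ "$2" = "404" ]; then       # page no longer there
    echo "404"
  elif [ "$2" -ge 200 ] && [ "$2" -lt 400 ]; then
    echo "OK"
  else                              # 403, 500, and other responses
    echo "Other"
  fi
}

# check_url URL -> prints "LABEL URL"
check_url() {
  # Follow redirects (-L), discard the body, print only the final HTTP status code.
  code=$(curl -s -o /dev/null -L --max-time 15 -w '%{http_code}' "$1")
  status=$?                         # exit status of the curl command substitution
  echo "$(classify "$status" "$code") $1"
}
```

Given a file with one URL per line (a hypothetical `urls.txt`), `while read -r url; do check_url "$url"; done < urls.txt | cut -d' ' -f1 | sort | uniq -c` tallies the labels. Note that a 200 response does not catch the subtler cases mentioned above, such as pages whose content has changed or that now sit behind a login page.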
In any case, I expect my own modest historical inquiries are only the beginning; I think people will be writing histories of Wikipedia and the larger free culture movement decades from now, though I am not sure how much of what we have today will still be there for them. I was surprised, and happy, to find that someone else is already making use of my Nupedia-l archive, so I thought I would do something similar for my other sources. I don't think this would be of much use to anyone today, and it is somewhat "tainted" in that it reflects my own analytical take and selection of sources (absent summaries, annotations, and excerpts), but it might be of use in the future.
This archive includes the HTML versions of two mindmaps and copies of the online resources to which they link. If you do make use of it, you can continue to refer to it as part of the "Reagle Wikipedia Archive."
wp-sources.tar.bz2 was made by placing the HTML versions of the mindmaps (field-notes-cat.html) on a Web server and then issuing:
wget --restrict-file-names=windows -c --recursive --level=1 --span-hosts --convert-links --execute robots=off -t 4 http://reagle.org/joseph/2008/02/wp-srcs/field-note-cats.html
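The post does not say how the downloaded files were bundled into wp-sources.tar.bz2. One plausible way, offered only as a hypothetical sketch, is to run the wget command inside a fresh directory and then compress that directory with bzip2; the directory name `wp-sources` is an assumption.

```shell
#!/bin/sh
# Hypothetical packaging step; the directory name "wp-sources" is an assumption.
set -e
mkdir -p wp-sources
cd wp-sources
# Run the wget command shown above here. With --span-hosts, wget saves each
# host's pages in a subdirectory named after that host (e.g. reagle.org/).
cd ..
# Compress the whole capture with bzip2 (-j) into a single archive.
tar -cjf wp-sources.tar.bz2 wp-sources
# List the first few entries to confirm what was captured.
tar -tjf wp-sources.tar.bz2 | head -n 5
```

Keeping everything under one top-level directory means that unpacking the archive recreates the same tree, so the links rewritten by --convert-links should still resolve locally.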
Malte on 2008-02-07
Thanks for the interesting post! This may be slightly off-topic, but what software are you using for your research mindmaps?
Joseph Reagle on 2008-02-07