wget --exclude-domains gizmology.net -e robots=off -nH --cut-dirs=3 --base=http://web.archive.org/web/20030822044803/http://www.nupedia.com/pipermail/nupedia-l/ -r -l 4 -N -k -p -R js -Gbase http://web.archive.org/web/20030822044803/http://www.nupedia.com/pipermail/nupedia-l/
So, doing a quick link-check analysis of the largest mindmap, I find the following: 941 of those resources are “OK”; 21 are “404” (no longer there); and 10 are “Timeout”. So, within just a few years, ~2% of the 972 links are gone outright, or ~3% if the timeouts are counted as not readily available. For example, the link to Sanger’s 2005 information about his (then) new Digital Universe project is already broken; but I must say news sites are the worst.
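The arithmetic behind those percentages can be sketched in a few lines of shell. This is only an illustration: the function name `summarize_links` and the input format (one status word per line, such as `OK`, `404`, or `Timeout`) are assumptions of mine, not the output format of any particular link checker.

```shell
# Hypothetical helper: tally link-check results read from stdin.
# Assumed input: one status per line, e.g. "OK", "404", "Timeout".
summarize_links() {
  awk '
    { count[$1]++; total++ }          # count each status and the overall total
    END {
      if (total == 0) { print "no links checked"; exit }
      bad = total - count["OK"]       # everything that is not "OK" counts as unavailable
      printf "total=%d ok=%d unavailable=%d (%.1f%%)\n",
             total, count["OK"], bad, 100 * bad / total
    }'
}
```

Fed 941 `OK` lines, 21 `404` lines, and 10 `Timeout` lines, it reports 31 of 972 links unavailable, which is where the roughly 3% figure (404s plus timeouts) comes from; the 404s alone are about 2%.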
wget --restrict-file-names=windows -c --recursive --level=1 --span-hosts --convert-links --execute robots=off -t 4 http://reagle.org/joseph/2008/02/wp-srcs/field-note-cats.html
Perma.cc helps authors and journals create permanent archived citations in their published work.