This month’s Scientific American has an article by Charles H. Bennett, Ming Li and Bin Ma examining the evolution of 33 chain letters using algorithms borrowed from genetic analysis. These algorithms let one postulate how closely different animals are related (evolutionary phylogeny) by looking at how DNA, and its alterations, persist in a historical population. In this case they posited a family tree of chain letters and noted the points of divergence, the subsequent subtrees, and the relative age of the changes.
I’ve often thought that it would be interesting to apply these techniques to the domain of culture/memes. In particular, I’ve thought of following trackbacks and analysing the characteristics of the discussion. This paper shows the idea has some merit, and hints that the following questions might be asked:
* Transitiveness: when folks post short blog entries on something of note, how often do they refer to the original source, versus the source through which they first encountered it?
* Mutation: does the text by which people cite a story substantively differ, particularly amongst ideological communities? The paper briefly mentions an approach of doing textual analysis by compressing text versions together and measuring how much redundancy they share: the more shared redundancy (that is, the better two versions compress together), the more relatedness one can posit. For example, could I identify ideological clusters of blogs given the compression ratios of the text associated with their citation of a common story?
* Age: How long do stories exist in the Web media before they “pop”? For instance, news stories might exist for some period before they are “slash-dotted” or trickle up to the top of popdex. (One of the blog citation sites used to report the acceleration of a story, though I can’t find it now.)
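
To make the compression idea in the Mutation question concrete, here is a minimal sketch in Python. It is my own illustration, not the method from the article: it assumes zlib as the compressor and uses the normalized compression distance as the score, and the function names and sample texts are made up for the example. A lower score means the two texts share more redundancy and so are plausibly more closely related.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts.

    Assumption: zlib stands in for whatever compressor the article used.
    Values near 0 suggest heavy overlap; values near 1 suggest little.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Hypothetical citations of the same story, plus an unrelated text.
original = (
    "Researchers traced the evolution of chain letters using methods "
    "borrowed from genetic analysis, building a family tree of the copies."
)
variant = (
    "Researchers traced the evolution of chain letters with methods "
    "borrowed from genetics, building a family tree of the surviving copies."
)
unrelated = (
    "The city council voted on Tuesday to postpone the new parking rules "
    "until the budget review is complete next spring."
)

print("original vs variant:  ", round(ncd(original, variant), 3))
print("original vs unrelated:", round(ncd(original, unrelated), 3))
```

Clustering blogs would then amount to computing this score for every pair of citing texts and feeding the resulting distance matrix to a tree-building or clustering routine. Short excerpts compress poorly, so scores on one- or two-sentence citations are likely to be noisy.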