To confirm the power law in Wikipedia edits (many contributors doing a little, a few doing much), this regular expression and Python code parse a Wikipedia history page fairly well:
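import operator
import re
import sys

# Captures, per history line: revision id, timestamp, author, optional comment.
history_regex = r""".*?oldid=(\d+).*(\d\d:\d\d.*?\d\d\d\d)</a>.*<span class='history-user'>.*?>(.*?)</a>.*(?:<span class='comment'>(.*?)</span>)?</li>"""
regex_obj = re.compile(history_regex)

url = sys.argv[1]    # URL of the history page, given on the command line
html = getHTML(url)  # getHTML is a helper (not shown) that returns the page HTML as a string
lines = html.split('\n')

counter = 0  # number of history entries seen
edits = {}   # author -> list of (oldid, date, author, comment)
for line in lines:
    if line.startswith("<li>(<a"):
        counter = counter + 1
        match_obj = regex_obj.search(line)
        if match_obj:
            oldid, date, author, comment = match_obj.groups()
            edits.setdefault(author, []).append((oldid, date, author, comment))

# Rank authors by number of edits, most active first.
counts = [(author, len(edits[author])) for author in edits.keys()]
counts_s = sorted(counts, reverse=True, key=operator.itemgetter(1))
print(counter)
for author, number in counts_s:
    print("%s ; %s" % (author, number))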
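Once per-author edit counts are available, one way to summarize how concentrated the editing is would be the share of edits made by the most active fraction of editors. The following is only a minimal sketch: the function name `top_share` and the sample numbers are made up for illustration and are not real Wikipedia data.

```python
def top_share(edit_counts, fraction=0.10):
    """Return the share of all edits made by the most active `fraction` of editors."""
    if not edit_counts:
        raise ValueError("edit_counts is empty")
    ordered = sorted(edit_counts, reverse=True)
    # Round to the nearest whole number of editors, but always include at least one.
    top_n = max(1, int(round(len(ordered) * fraction)))
    return sum(ordered[:top_n]) / float(sum(ordered))

# Made-up counts for ten hypothetical editors (illustration only).
sample_counts = [40, 20, 10, 8, 6, 5, 4, 3, 2, 2]
share = top_share(sample_counts)
print("top 10%% of editors made %.0f%% of edits" % (share * 100))
```

With the parsed history, the input would be the list of per-author edit totals, for example the second element of each pair in a ranked list of (author, count) tuples.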
Joseph Reagle on 2005-12-16
“For example, a brief analysis shows that on the Harry Potter page, of the 295 editors who made the last 500 edits, the top 10% of editors made 29% of the edits. A similar pattern is found on other Harry Potter Project pages.”
Jakob on 2005-12-16
And what are your results? In my master’s thesis I found that Lotka’s law fits very well for authors with small numbers of edits, but there is a deviation for the most active editors. More precise results will be published soon.
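For reference, Lotka's law says that the number of authors making n contributions is roughly proportional to 1/n^a, with a often close to 2. A rough way to check this against per-author edit counts is a least-squares fit on log-log data, sketched below. This is a crude estimator chosen for brevity, not the method used in the thesis; the function name `lotka_exponent` and the sample data are hypothetical.

```python
import math
from collections import Counter

def lotka_exponent(edit_counts):
    """Estimate a in f(n) ~ C / n**a by least squares on log-log data."""
    # freq[n] = number of authors who made exactly n edits
    freq = Counter(edit_counts)
    sizes = sorted(freq)
    if len(sizes) < 2:
        raise ValueError("need at least two distinct edit counts")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(freq[n]) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of log(frequency) against log(n).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic data built to follow 64 / n**2 exactly (illustration only).
synthetic = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
print("estimated exponent: %.2f" % lotka_exponent(synthetic))
```

On real histories, the sparse tail of very active editors tends to dominate such a fit, so the result should be read as a quick check rather than a careful estimate.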