Open Codex

2007 Dec 17 | The elites and bourgeoisie

I recently had the opportunity to catch up on some of my reading including new quantitative analysis of Wikipedia contribution. In particular, the question about the inequality of user contribution is a long-standing one (Wales 2005wew2, Voss 2005mw, Swartz 2006www, Ball 2007, Kittur et al. 2007, Viegas et al. 2007, and Priedhorsky et al. 2007). Jimmy Wales originally noted in December of 2005 that "half the edits by logged in users belong to just 2.5% of logged in users." Research since 2005, particularly Kittur et al., measuring contribution differently, showed that elite contributions were less powerful relative to the long tail of small contributors, or even that the trend has changed over time. (As those authors put it: Power of the Few Vs. Wisdom of the Crowd: Wikipedia and the Rise of the Bourgeoisie.) However, in Quantitative Analysis of the Wikipedia Community of Users Felipe Ortega and Jesus Gonzalez-Barahona (2007) conclude that "approximately 90% of the active editors is responsible altogether for less than 10% of the total number of contributions (Gini coefficient of 0.9360)" (p. 82). So the long tail isn't doing as much as we might think. The authors explain this difference by way of a methodological concern: counting user contribution via total contributions over the life of the user misses those users who are new and active, but have not accumulated a significant total count yet. After segmenting users based on their contributions in specific periods, Ortega and Gonzalez-Barahona find that those users with a high number of edits in early months typically continue to make a high number of edits (i.e., stable), and that the discrepancy between high contributing and low contributing editors is significant (i.e., unequal).

I met Felipe Ortega at this year's WikiSym and recently asked him about the present state of research:

The current state of research about the inequality of contributions to the English Wikipedia (also extended to the top ten language editions of Wikipedia) shows that the distribution of contributions to articles (including stubs and redirects, filtering bots) is strongly skewed towards a small core of very active contributors. This is the same well-known effect already identified in libre software development projects. The graphs depicting the contributions from distinct generations of very active users, along with the graphs showing the Gini coefficients of contributions per month, rebate the argument of the "rise of the bourgeoisie" stated by Kittur et al. The inequality level of contributions to the English Wikipedia has remained stable during the past 4 years. Similar inequality levels per month have been found for the other top ten language editions, thus showing a common pattern shared among the biggest Wikipedias. Moreover, we have found that the inequality level in these top-ten language editions is stabilized around a 80%-85% interval for the Gini coefficient, showing a spontaneous autorregulation process that deserves further research.
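For readers who have not met the Gini coefficient cited above, here is a minimal sketch of how such a figure can be computed from per-editor edit counts. The function names and the ten-editor sample are my own illustrative assumptions, not data or code from Ortega and Gonzalez-Barahona, and their exact procedure (per-month windows, bot filtering) may well differ.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 is perfect equality,
    values near 1 mean a few contributors account for almost everything."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one contributor with a non-zero count")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def bottom_share(counts, fraction):
    """Share of all contributions made by the least active `fraction` of editors."""
    values = sorted(counts)
    k = round(len(values) * fraction)
    return sum(values[:k]) / sum(values)


if __name__ == "__main__":
    # Hypothetical community: nine editors with 1 edit each, one with 91 edits.
    edits = [1] * 9 + [91]
    print("Gini coefficient:", round(gini(edits), 2))
    print("Share of edits by the bottom 90%:", bottom_share(edits, 0.9))
```

In this made-up sample the least active 90% of editors account for 9% of the edits and the coefficient comes out at roughly 0.81, which gives a feel for what a stable 80%-85% interval implies.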

this entry posted to social/wikipedia;
comments (1)

2007 Dec 13 | The Iron Law of Oligarchy

In my dissertation I make only a passing reference to the "Iron Law of Oligarchy," an expression coined by Robert Michels, an early 20th-century sociologist and student of Max Weber, in his book Political Parties. As with Weber, many of the cases and references are difficult to follow because of the book's age and issues of translation, but there are still some gems that are relevant today. In particular, I find his "final considerations" to be worthy of sharing on the Wikipedia question, touching on "adminitus," incumbency, and evolution:

Leadership is a necessary phenomenon in every form of social life. Consequently it is not the task of science to inquire whether this phenomenon is good or evil, or predominantly one or the other. But there is great scientific value in a demonstration that every system of leadership is incompatible with the most essential postulates of democracy. We are now aware that the law of the historic necessity of oligarchy is primarily based upon a series of facts of experience.... The process which has begun in consequence of the differentiation of functions and the party is completed by a complex of qualities which the leaders acquire through their detachment from the mass. At the outset, leaders arise spontaneously; their functions are accessory and gratuitous. Soon, however, they become professional leaders, and in the second stage of development they are stable and irremovable.... (Michels 2001:240)

It follows that the explanation of the oligarchical phenomenon which thus results is partly psychological; oligarchy derives, that is to say, from the psychical transformations which the leading personalities in the parties undergo in the course of their lives.... The oligarchical structure of the building suffocates the basic democratic principle. That which is oppresses that which ought to be. (Michels 2001:240-241)

The democratic currents of history resemble successive waves. They break ever on the same shoal. They are ever renewed. This enduring spectacle is simultaneously encouraging and depressing. When democracies have gained a certain stage of development, they undergo a gradual transformation, adopting the aristocratic spirit, and in many cases also the aristocratic forms, against which at the outset they struggled so fiercely. Now new accusers arise to denounce the traitors; after an era of glorious combats and of inglorious power, they end up by fusing the old dominant classes; whereupon once more they are in their turn attacked by fresh opponents who appealed to the name of democracy. It is probable that this cruel game will continue without end. (Michels 2001:245)

this entry posted to social/wikipedia;
comments (0)

2007 Dec 12 | Conflict management and class exercises

In the spring I will again be teaching a class on conflict management. More than one colleague has expressed puzzlement as to why I would teach this class, but I really enjoy it. While I, and a few students, might enjoy discussions on the historical nuances of technology or reference works, conflict management is relevant to everyone -- and I do get to discuss Wikipedia NPOV and good faith! I have developed two exercises that might be of use to others: one on how cognitive priming affects cooperation and competition (i.e., the prisoner's dilemma), and one on integrative bargaining.

this entry posted to career/teaching;
comments (0)

2007 Dec 05 | Blogging anxiety and post-naive abandonment

As we all know by now, there are manifest anxieties associated with the practice of blogging. The most common is the stress of feeling as if one hasn't "updated" the blog recently. Biella Coleman offers the non-intuitive theory that not updating is a virtue. (She kindly refers to the moribund state of this blog as an example.) Another perennial issue is those who throw off the burden of blogging and declare that while it caught their interest for a time, they are done with it, such as Peter Krapp's recent "Top Ten Reasons I Don't Blog Anymore". Inspired by a common theological turn, I think of this as a "post-naive" blog declaration. Marcus Borg, a liberal theologian, argues that many people will go through three phases of religious belief: naivete (a superstitious child), critical (a skeptical adult), and post-critical naivete (an open heart). (Neil Gillman has noted a similar theory of transition in Abraham Heschel's "situational thinking," Gabriel Marcel's "secondary reflection," and Paul Ricoeur's "second" or "willed naivete".) Therefore, I often expect that after the initial flush of excitement with blogging, and the subsequent anxiety and abandonment, there will come a time when the "post-critical" blogger will post again without worrying about site statistics, updates, and ego. On another "blog" (though it had daily content, photos, audio programs and such before blogs, flickr, and podcasts), I've asked the robots to please pass on by ("User-Agent: * Disallow: ") and presently post to it about once a month. I'm quite happy with that.
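As an aside on asking robots to pass on by: the sketch below uses Python's standard urllib.robotparser to check what a given robots.txt rule permits. The helper name, the bot name, and the example.org URL are hypothetical. Under the usual convention an empty "Disallow:" permits everything, while "Disallow: /" asks all crawlers to stay away, so a rule meant to turn robots away needs the trailing slash.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Asks every crawler to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# An empty Disallow value places no restriction on crawlers.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    page = "http://example.org/notes.html"  # hypothetical URL
    print("Disallow: /  ->", allowed(BLOCK_ALL, page))
    print("Disallow:    ->", allowed(EMPTY_DISALLOW, page))
```

Well-behaved crawlers honor these rules voluntarily; the file is a request, not an access control.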

this entry posted to social;
comments (0)

2007 Dec 04 | Peer production: Wikipedia and the OED

I often compare the "peer production" of Wikipedia to that of the OED, which was also built with the contributions of hundreds of people from all walks of life. I just finished Simon Winchester's The Meaning of Everything: the Story of the Oxford English Dictionary -- an excellent complement to his earlier The Professor and the Madman -- and note that the following could perhaps be said of Wikipedians as well:

... but we do not really know why so many people gave so much of their time for so little apparent reward. And this is the abiding and most marvelous mystery of the enormously democratic process that was the Dictionary -- that hundreds upon hundreds of people, for motives known and unknown, for reasons both stated and left unsaid, helped to chronicle the immense complexities of the language that was their own, and that they dedicated in many cases -- such as the Thompson sisters did -- years upon years of labour to a project of which they all, believed by some set of unfathomable and optimistic notions, insisted on becoming a part. (Winchester 2003:215)

this entry posted to social/wikipedia;
comments (0)

2007 Nov 29 | Durova: openness and enclaves

Previously, I've explained what openness means relative to a community rather than the license governing its content. In short, an open content community, as I specify it, has five normative features: open products, transparency, integrity, nondiscrimination, and noninterference. It also has an important descriptive feature: some level of structure/closure (which I believe is unavoidable, see "The Tyranny of Structurelessness") and consequent discussion about how this can be reconciled with the larger egalitarian ethos. In the dissertation I pose Wikipedia in light of these criteria and explore three challenging cases: can anyone really edit?, office actions and oversight, and the female-only WikiChix list.

The latest Wikipedia controversy (Mestel 2007) would also be a good case given references to a secret cabal-like email list, as summarized in The Signpost:

A case involving the actions of Durova and Jehochman, and in particular a controversial block by Durova of !! as a sockpuppet, based on evidence she refused to reveal on-wiki, saying that she was concerned that to do so could give puppetmasters too much information on her investigative techniques. This evidence was provided to some administrators, and was later leaked, with some users, including the Arbitration Committee, arguing that the evidence was insufficient for blocking !!. Durova has resigned her adminship, and, in what appears to be an exceptionally quick resolution of the case, remedies have been proposed, admonishing Durova to exercise greater care when issuing blocks, admonishing all participants to act with proper decorum, and noting that Durova must go through normal channels to regain adminship.

The very fact that there may be ancillary structure and closure (what Sunstein calls "enclaves") is alarming to some, particularly when Wales (2007moh) wrote: "*I* am involved in multiple ongoing private discussions with dozens of people. The list in question is being badly misrepresented as some kind of problem. It is a good list, and the purpose of the list is good, and not everyone on the list is perfect (as is always true)." However, with respect to the criteria, I believe it is appropriate -- even unavoidable -- for enclaves to form. If they aren't handled well -- as in this case -- they can come off as rather unseemly. And they can never be relied upon as a source of authority and discussion in the larger community: Arguments and their evidence must be ported into the larger open context or the criteria of transparency and integrity are at risk. Durova's mistake was not only in incorrectly banning someone, but in referring to that list and her secret methods of "sleuthing" sock-puppets as an authority within the larger open discourse. (Silence also had an interesting role to play, as it often does: it appears that while some people on that list did not object, this does not mean they assented.)

However, Durova has apologized for the mistaken administrative action and I presume she now appreciates the illegitimacy (and sensitivity) associated with recourse to closed authority. What I also find interesting in this discourse is the recognition of the dangers of administration -- I saw it referred to as something like "adminitus" but can't find it now. Furthermore, I am intrigued by the common Wikipedia sentiment that administration is compromising, but that encyclopedic work has restorative powers for one's own wiki-soul and relation to the community.

this entry posted to social/wikipedia;
comments (0)

2007 Nov 07 | The cognitive construal of a free encyclopedia

In a parenthetical in my dissertation I note that when I speak of the men in the history of reference works I am being unfortunately literal. Aside from Suzanne Briet, a French documentalist at the beginning of the 20th century, few women have fallen within the scope of my readings in the secondary literature. (And sadly, Briet doesn't yet have a Wikipedia article about her.) Consequently I've kept my eyes open for such materials when perusing bibliographies. (And as a researcher of Wikipedia I've been especially engaged by Milos's Women and Wikimedian projects interviews.) I recently started Gillian Thomas' (1992) A Position to Command Respect: Women and the Eleventh Britannica in which she writes of her father's reverence for his EB11:

As far as he was concerned, whatever he read aloud to us from those finely-printed onion-skinned pages was incontrovertible truth. My mother was more skeptical, and would point out that she would put more trust in an "up-to-date" reference book My father bought his copy of the Britannica as a young man intent on self-improvement, like thousands of others, through its widely-advertised installment system. (Thomas 1992:v)

This then made me think about the authority of Wikipedia. Thomas writes: "The outlay of money had been a considerable sacrifice at the time, which was perhaps another reason why he retained a reverent attitude towards its authority" (p. vi). Paying for something, or undertaking a difficult task, is known to positively influence the construal of that something (Tavris and Aronson 2007:17); therefore perhaps the freeness of free content shades how it is perceived. I can imagine interesting cognitive experiments along these lines. For example, reward subjects for accurately answering a factual question while providing them with access to a free or nonfree version of the same reference material -- the cost would essentially be against the eventual reward. I expect those who paid for access would have greater confidence in their answer and a greater sense of authority in the resource than those who did not. And this is but a single experiment among a whole number of interesting construal tasks where the material essentially stays the same but secondary characteristics are varied: how does the perception of confidence in one's own answer or authoritativeness of the reference change relative to cost, media presentation (e.g., print or online), timeliness, the number of contributors, etc.?

this entry posted to social/wikipedia;
comments (0)

2007 Oct 18 | WikiSym Presentation

My presentation for WikiSym 2007 is now up, including a link to the abstract and paper. I fear it's a little wordy but I find that the academic habit of citation is hard to suppress even in slides, and my love of ethnographic excerpts is similarly irrepressible.

this entry posted to social/wikipedia;
comments (0)

2007 Sep 27 | All is reviewed, but in what order?

The recent discussions about the deployment of features for supporting vetted/approved pages and trusted authors, starting with German Wikipedia and perhaps expanding to the English one, are interesting in light of one of the few analyses of whether it is best to review a contribution upfront, or after the fact. (This is like the arguments in programming about whether it is better to write code in which you "look before you leap" or just leap with the knowledge that "it's easier to ask forgiveness than permission.") If you haven't read it, and you are short of time, just have a look at the discussion and conclusion which includes:

These results suggest that Wiki-Like systems create value faster than Pre-Review ones. Most people contributed under the Pre-Review system overall, but since people prefer editing to checking, a backlog of wasted work builds up. The model also suggests that Pre-Review group will do about the same as the Wiki-Like group in the long-term However, contributions taper off faster than predicted Pre-Review systems may increase people's willingness to contribute... or deter people from damaging the system... compared to Wiki-Like. (Cosley et al. 2006:9)

Also, one provocative observation is that one could start with Wiki-Like then switch to Pre-Review as contributions taper off; "However, imagine the outcry from its members if Wikipedians were to switch to a Pre-Review system." (Cosley et al. 2006:9)
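For those unfamiliar with the programming argument I alluded to, here is a minimal sketch of the two styles in Python. The function names and the sample tally are hypothetical; the pairing of "look before you leap" with Pre-Review and "ask forgiveness" with Wiki-Like is my own loose analogy, not something drawn from Cosley et al.

```python
def edits_lbyl(tally, editor):
    """Look before you leap: check the precondition first, then act."""
    if editor in tally:
        return tally[editor]
    return 0


def edits_eafp(tally, editor):
    """Easier to ask forgiveness than permission: act, then handle the failure."""
    try:
        return tally[editor]
    except KeyError:
        return 0


if __name__ == "__main__":
    tally = {"alice": 12, "bob": 3}  # hypothetical edit counts
    for name in ("alice", "carol"):
        print(name, edits_lbyl(tally, name), edits_eafp(tally, name))
```

Both functions return the same answers; the difference is whether the check happens before the attempt or only when the attempt fails.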

this entry posted to social/wikipedia;
comments (0)

2007 Sep 24 | The travail of grading

I find "giving" grades as a teacher to be as troublesome as getting them when I was a student. Alfie Kohn (1999) in his book Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes argues, based on solid research, that rewards, such as grades, often undermine intrinsic motivation (p. 148), which is key to substantive long-term learning. This counterproductive practice persists because our educational system attempts to do two things that are often at odds with one another: facilitating learning and sorting students (p. 202). I've seen this in my own classroom. Some of the brightest students, and no doubt the most consistent "performers," have expressed a strong distaste for open-ended assignments. Asking them to propose a topic that interests them is far too frightening relative to the more remedial types of tasks they have clearly mastered. As Kohn notes, "when we are working for reward, we do exactly what is necessary and no more" (p. 63); this isn't necessarily because of laziness, it also avoids the risk of hurting one's GPA. On the flip side I've seen students with a lot of potential but also significant challenges (perhaps English isn't their first language, their previous education wasn't as rigorous, illness, financial constraints, etc.) become demoralized with a poor grade. Few things are as frustrating as seeing motivated students and a positive classroom culture taking hits because of grades. Nor do I want to be in that position of judging students' circumstances: perhaps Solomon could fairly judge between genuine illness, family emergency, forced overtime, or a hangover -- but I can't.

I'm not completely comfortable with my present approach; perhaps one day I will become an "easy" grader and submit all "A"s except for the most obviously negligent, but this is what I work with now: explicit criteria and early feedback.

According to the standards of my department, which are quite useful and comprehensive, an "A" is a reflection of "An Outstanding Student" whose "Writing demonstrates impressive understanding of readings, discussions, themes and ideas. Written work is fluid, clear, analytical, well-organized and grammatically polished. Reasoning and logic are well-grounded and examples precise." My present understanding of an "A" is also informed by my experience as a Ph.D. student. I expect I've been a bit of a "grade grubber" myself, though fortunately willing to take risks to pursue my interests. One of my greatest disappointments in my nearly 10 years of classes, but not my lowest grade, was an "A-" in a historical methods course. I loved the course, adored the professor, and invested a lot of myself in the research and final paper. But at the outset the professor said he only gave an "A" to those papers he could see being accepted for publication and he was true to his word. After a few days I could admit to myself that my paper was not yet at that level, my research and thinking weren't developed enough yet, and I learned I was not alone -- in fact I was in the vast majority. (It's a sad truth of how we can feel better or worse about ourselves through comparison with others! A colleague of mine once cynically captured this with a sentiment that, "every time a friend of mine succeeds I die a little inside.")

In any case, I use a similar threshold in undergraduate classes. I don't "give" grades, I evaluate performance according to the departmental criteria. I don't grade on a curve, but I do make sure my expectations are reasonable by first reviewing the range of performance. An "A" is truly outstanding, something I could use as an exemplar in future courses or even recommend to someone interested in the topic. An "A-" falls a little short and could be an "A" with a few small tweaks. A "B" is a reflection of good work, a "C" of "fair" work. I do want to be humane (some professors have cut me slack in the past), but also fair. It is not at all uncommon, at the end of the semester when I'm porting grades from my spreadsheet to the bubbles of the Scantron, for me to want to bump a grade, but I fear this may be favoritism, so I don't.

Grading sucks, but it's a requirement of the job, and I am not sure of what the alternatives would be.

this entry posted to career/teaching;
comments (5)

2007 Aug 28 | Keen's misunderstanding?

I recently read Andrew Keen's The Cult of the Amateur: How Today's Internet Is Killing Our Culture. Keen provides an easy to read litany of problems associated with "Web 2.0" user generated content. While I do have some thoughts in response to his larger thesis, one thing I found confusing is his understanding of a particular decision made on Wikipedia. Like many critics, Keen is concerned by the fact that experts aren't appreciated or respected above amateur contributors. With respect to a well-known "climate change" conflict, he writes how Dr. William Connolley, an expert on climate modeling, had to go "head-to-head" with an aggressive Wikipedia editor, and was eventually put on "editorial parole" by the Arbitration Committee:

Connolly, who was pushing no POV [point of view] other than that of factual accuracy was put on editorial parole by Wikipedia, he was limited to making one entry a day. When he challenged the case, Wikipedia arbitration committee gave no weight to his expertise, treating Connolly, an international expert on global warming, with the same deference and level of credibility as his anonymous foe -- who, for all anyone knew, could have been a penguin in the pay of ExxonMobil.

I believe Keen is referring to this case, which I consider to be a lovely example of judicious decision making on Wikipedia. While I'm very sympathetic to the frustration and amount of work entailed in repulsing crackpots and moonbats from articles such as evolution and global climate change, the decision was not as Keen describes it. Consider the following excerpts from the arbitration case:

  • Principles
    • Neutral point of view
      • 2) Wikipedia's neutral point-of-view (NPOV) policy contemplates inclusion of all significant points of view regarding any subject on which there is division of opinion. However, this does not imply that all competing points of view deserve equal consideration in an article.

        • Passed 8-0
    • Consensus
      • 3) As put forward in Wikipedia:Dispute resolution, Wikipedia works by building consensus. This is done through the use of polite discussion, in an attempt to develop a consensus regarding proper application of Wikipedia:Policies and guidelines such as Wikipedia:Neutral point of view. Surveys and the Request for comment process are designed to assist consensus-building when normal talk page communication has not worked.

        • Passed 8-0
    • Relative value of references
      • 8) Since the goal of Wikipedia is to provide accurate content, we cannot regard all references as equally valid and give them all equal weight. Editors should exercise care in the selection and use of references. The closer a reference is to current peer reviewed work, the better. Balance must also be attained by properly labeling and attributing significant dissenting views (where they exist).

        • Passed 7-1 with 1 abstain
  • Findings of Fact
    • William M. Connolley as expert
      • 3) William M. Connolley is widely viewed in Wikipedia as being highly knowledgeable in the field he is writing about.

        • Passed 7-1
    • Cortonin's view of real greenhouses
      • 5) Cortonin has persistently and aggressively advanced views which confuse metaphorical explanations of the greenhouse effect and greenhouses with the technical scientific phenomena underlying them. Despite determined efforts by other editors to inform him and point him to information on the subject he seems to have difficulty understanding both the use of metaphor and the scientific literature in the field, see Talk:Greenhouse effect. This is a persistent condition which seems likely to continue.

        • Passed 8-0
  • Remedies
    • Cortonin: Six-month ban from editing certain [climate change] articles
    • William M. Connolley: Six-month revert parole on certain articles
    • JonGwynne: Three month ban from Wikipedia ... Six-month ban from editing certain articles

That is, despite Keen's summary, Connolley was paroled for reverting others too frequently without explanation. His expertise was noted, as was the importance of providing authoritative references -- which are the ultimate authority for the claims one can make on Wikipedia. He was put on a six-month parole limiting his ability to make reversions; there was no limit to "making one entry a day" from this case. Yet of his "opponents," Cortonin received a six-month ban from editing global climate articles, and JonGwynne was banned from Wikipedia for three months and banned from editing climate related articles for six months.

Because Keen opts not to reference online sources, it is possible he is referring to something else. But if he is referring to this case, which I think he is, his summary is very misleading.

this entry posted to social/wikipedia;
comments (0)

2007 Aug 24 | Ward Cunningham's Keynote at Wikimania 2005

At this year's Wikimania conference I was talking to Cormac about my dissertation and how at the beginning of the 20th century some, like Wells and Otlet, were inspired by the potential of technology (i.e., microfilm and index cards) and its possible contributions towards a universal encyclopedia. Cormac noted that Cunningham had actually mentioned index cards in his Wikimania 2005 presentation, which I was not able to attend. Fortunately, I was able to find a copy of the video and my rough notes on it are available (Cunningham 2005).

this entry posted to social/wikipedia;
comments (0)

2007 Aug 15 | Lessig and emergent corruption

It is with interest that I've noted Lawrence Lessig's announcement of a career shift away from his work on restoring sanity to the copyright regime. I too once grew frustrated and fatigued with the policy process around copyrights, patents, and privacy because of that which Lessig now calls "corruption." (Fortunately, Lessig had the fortitude to launch Creative Commons and help give voice to the free culture movement despite his much invoked pessimism.) By "corruption" he doesn't mean bald bribery but systemic and often hidden "queering" of the political process away from genuine discussion and reason, and towards monied interests. (See my little "done with privacy" essay I wrote when I had gotten fed up in the privacy space.)

I agree -- and hope -- that if there was a way to tackle this corruption issue, other concerns might be more amenable to improvement. And I think Lessig is the man for the job. In addition to his passion, commitment, and brilliance -- and despite his claim that he's changing his focus -- I think his expertise and history with free-speech and constitutional law is highly relevant to this new endeavor. I say this because I've always felt that this issue of "corruption," or perhaps "monied influence" is a very American sort of paradox: the democratic ideal of free speech, we each say our piece, entangled with a market and sociological reality, we are influenced most by those with the most resources.

For example, I have been a grudging advocate of campaign finance reform. I am an advocate because of the problems I see in our present system, but hesitant because I can't easily square any corrective intervention with my free-speech values. At times I think that the solution is to attack the corporate angle: companies should not be considered persons, nor should they have civic rights (i.e., to support a candidate) as a person does. But I'm not sure this would solve the problem -- some wealthy might simply (continue) to act as agents for their corporate symbionts as it is in their own personal short-term interest.

So, I think -- and hope -- Lessig can help. Yet I'm surprised to see in his musings that he's worrying about questions of intention: is Hillary Clinton consciously taking money for votes, or perhaps being subconsciously influenced? (I would suspect there is some social and political psychology research on opinion formation and social influence that might be relevant.) But sadly, while there is plenty of conscious and unconscious influence peddling going on, the problem is actually much harder.

Consider a scenario in which every politician is scrupulously honest and true to her beliefs -- a best case relative to our present. Also, support, particularly money, can bestow much power upon any politician to influence the public -- a case much like our own today. In such a scenario resource rich minorities can exert disproportionate influence counter to the interests of the majority of the present, majority of the future (e.g., the long-term), and reason itself. While this might be "corrupt" it requires no conscious or even subconscious persuasion from money. Rather, it is an emergent result of the social, psychological, and system dynamics of influence: supporters simply select the politician; the politician, through media purchases, influences the public. And it just so happens that this is sometimes counter to our own public interests, and not easily fixed.

this entry posted to social;
comments (0)

2007 Aug 13 | Source-as-primary-character

I recently finished Peter Heather's (2006) The Fall of the Roman Empire. This popular, though no less rigorous, history is widely praised. The narrative is engaging and I appreciate the glossary, dramatis personae, and timeline; these help given that the book spans 150 years, dozens of emperors (East and West), generals, and barbarian Kings. What most impressed me was Heather's treatment of sources. Many histories, particularly of ancient societies, are written in the third person objective. Yet, as I learned in my historical methods course, the practice of history is more than a recounting of events; it is a substantiated argument about people and events in time. Heather presents his arguments as such: identifying when he agrees or disagrees with others or scholarly consensus, and addressing the circumstances of his sources. Rather than being simply a footnote, sources come to the foreground and become part of the story. The histories of the sources, such as Pullodius' commentary on Ambrose written in the margins of De Fide, or the listing of fourth century military and civilian offices, the Notitia Dignitatum, are interesting in themselves and contribute to a much deeper understanding of the ground on which Heather's arguments rest. While a popular history might present a more accessible or exciting version of an old tale, it is rare for it to communicate the challenges and excitement within the discipline -- because popular history often obscures its scholarship. But Heather brings it forward and what I thought might be a rather staid field -- don't we already know all we can about the ancients? -- is shown as alive with new archaeological finds, textual fragments, analysis, and argument.

I know this will influence the next revision of one of my historical chapters with respect to how I speak about some of the primary sources I found.

this entry posted to method;
comments (0)

2007 Jul 27 | The Wikipedia Plays

One of my favorite categories in my "field notes" mindmap is that of "zeitgeist." Very little of this material will actually find its way into the dissertation -- indeed, in that mindmap few of the ~800 sources I have across ~50 categories will -- but I collect it nonetheless. I use "zeitgeist" for -- often amusing -- examples of Wikipedia leaking out into the broader culture. A friend of mine just forwarded this announcement of "a mini-marathon of short plays that surf the wikipedia wave through seventeen related entries" (Gans 2007). That tickles me for some reason.

In a few days I also leave for Taiwan to attend Wikimania 2007. This should be a lot of fun, and I look forward to reconnecting with and meeting anew fellow enthusiasts.

this entry posted to social/wikipedia;
comments (0)

2007 Jun 29 | Punditry and The Web 2.0 debate

I've been following the discussion at the Web 2.0 Forum with interest. In summary, Michael Gorman of Encyclopaedia Britannica complains that Web 2.0, and its supposed champion Wikipedia, is a "digital tsunami" (Gorman 2007jer) threatening education, scholarship, and the underlying values of Western civilization (Gorman 2007ssi1). Yet, while I follow the discussion with interest, I actually don't find it substantively engaging. Many of the arguments, particularly Gorman's, tend to be characterized by unsubstantiated claims and the purposeful construal of nuanced issues as extremes -- propping up strawmen for subsequent potshots. As I've already indicated, while it might bring pundits a sense of righteousness and attention, in the end "Time, not arguments, will ultimately tell." (And, for this reason I appreciate Larry Sanger's continuing efforts to implement his vision.)

Why, then, do I find this discussion of interest? Punditry, communicative disorders, and history. First, I'm trying to come to an understanding of "punditry," and I think Gorman's recent blogging is an exemplar. My sense is that sometimes people argue for argument's sake. That is, even if they genuinely believe the thing they are arguing for, attention, not persuasion, is the goal. (In a sense, perhaps it is a high-brow, and perhaps more genuinely held, form of trolling -- another interesting phenomenon.) Second, I'm interested in communicative disorders. For example, Gorman faults Wales for allegedly saying "If you can't google it, it doesn't exist." (This quotation was originally unsourced, challenged by Wales, sourced by Gorman, and the disagreement continued.) But earlier in the same essay, Gorman himself complains "More solid and reputable websites are buried by the current algorithms of the Internet because they are often fee-based and cannot garner as many links as free sites (links are key to boosting one's search engine rank)" (Gorman 2007ssi1). If Wales made such a claim, I would expect it would likely have been a descriptive statement, rather than normative. That is, this is largely the way it is, versus the way it should be. And, this is essentially the same thing Gorman notes, and laments, above: if one's content is not freely accessible it is "buried." And I can't imagine anyone claiming that all nondigital information should be dismissed out of hand, and perish from the earth. The real issue is the normative response one should make in light of the "Google description": make information freely accessible, or enable Google to index proprietary sources, and even nondigital media (e.g., old books). This, to me, is an interesting question, something which is happening today, and something I would like to learn more about. For example, what kind of arrangements does Google make with fee-based sites to index their content? (JSTOR articles are often prominent results in my Google queries.) What is the user response to a search result which is not immediately available to them? By purposeful misunderstanding, punditry confuses genuine grounds for agreement and disagreement, and possible understanding.

My final reason for my interest is because of the ways in which this debate parallels similar discussions throughout history. That's right, as I argue elsewhere on this site, and in my forthcoming dissertation, reference works frequently act as a flashpoint and focus for larger social anxieties about change. I suspect my argument here is a consequence of a historical sensibility: what most people see as an extraordinary shift appears, with perspective, to be one thread in a larger tapestry. For example, Gorman's concern with plagiarism is ahistorical. Again, elsewhere I argue that the history of reference works is a history of plagiarism. When the Britannica was in its infancy, much like the Wikipedia today, its founding editor admitted he "made a Dictionary of Arts and Sciences with a pair of scissors, clipping out from various books a quantum sufficit of matter for the printer." (Yeo 2001:180) I'm not condoning plagiarism, but I find if we want to make normative statements about how things should be, it is best to genuinely understand the way things are, and have been in the past. This seems to be a difficult task in the midst of punditry, without some level of scholarly remove or Neutral Point of View -- as Wikipedians say.

this entry posted to social/wikipedia;
comments (0)

2007 Jun 20 | Creating a Semester's Class Schedule

I just discovered Python's awesome dateutil package which implements much of the iCalendar standard, including recurrences! Consequently, it's trivial to generate a calendar for the days classes meet. I assume with a little work one could even handle the holidays. In any case, here's an example:

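#!/usr/bin/env python3
# Requires the third-party dateutil package (python-dateutil).

from dateutil.parser import parse
from dateutil.rrule import MO, SU, WE, WEEKLY, rrule

sem_start = '20070903T140000'  # first day of the semester, 2pm
sem_end = '20071212T140000'    # last day of the semester, 2pm
days = (MO, WE)                # weekdays on which the class meets

# One occurrence per meeting day, from sem_start through sem_end inclusive.
meetings = list(rrule(WEEKLY, wkst=SU, byweekday=days,
    dtstart=parse(sem_start), until=parse(sem_end)))
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))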
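
As for the holidays, here is a minimal sketch of one way to skip them. It uses only the standard library rather than dateutil so that it runs without extra packages (I believe dateutil's rruleset, with its exdate method, would be the more direct route). The function name class_meetings and the two holiday dates are assumptions for illustration; the real dates would come from the academic calendar.

```python
from datetime import date, timedelta

def class_meetings(start, end, weekdays, holidays=()):
    """Return the dates from start to end (inclusive) that fall on the
    given weekdays (Monday=0 ... Sunday=6), skipping any listed holidays."""
    skip = set(holidays)
    meetings = []
    day = start
    while day <= end:
        if day.weekday() in weekdays and day not in skip:
            meetings.append(day)
        day += timedelta(days=1)
    return meetings

# Hypothetical holiday list (two Mondays), not taken from a real calendar.
holidays = [date(2007, 9, 3), date(2007, 10, 8)]

# Monday (0) and Wednesday (2) meetings over the same semester as above.
meetings = class_meetings(date(2007, 9, 3), date(2007, 12, 12), {0, 2}, holidays)
for meeting in meetings:
    print(meeting.strftime("%b %d %a"))
```

Walking day by day is crude next to a recurrence rule, but for a single semester it is cheap and makes the exclusion step easy to see.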

this entry posted to technology/python;
comments (0)

2007 Jun 19 | Steinhardt LaTeX template

Given a wretched experience with using Microsoft Word for my Masters thesis 10 years ago, I decided to try LaTeX this time around. One might think that Microsoft Word has improved, but my tinkering has shown it is still quite dangerous. Word's notions of styles are extremely frustrating, and have changed over time. Additionally, creating multi-document files, or very large files risks corruption. Furthermore, given I work in an interdisciplinary space, it is useful to be able to format a document, including footnotes and bibliography, as, say, either historical or sociological: LaTeX is quite good at this.

That said, LaTeX is a pain. Granted, I prefer a simple structured text markup language over a corruptible proprietary binary blob, but LaTeX is like the Perl of markup languages, and I am a Python guy. (To be fair, TeX and LaTeX are now decades old.) No doubt, regardless of what you want to do, there is a way to do it in LaTeX. The problem is, like Perl, there are too many ways to do it. There are dozens of packages that appear to do the same thing, though many are different enough to make you wonder why the difference is important. It is difficult to discern the present best practices and most of the documentation is in annoying PDF. Even understanding LaTeX syntax is a confounding task. Does '[]' mean an optional parameter to a command? Mostly yes, but sometimes no. The only way I could get a handle on the world of LaTeX was to purchase TeX for the Impatient and The LaTeX Companion.

In any case, when I do have a problem the LaTeX community on comp.text.tex is extremely helpful. So even though there is a steep learning curve, when I ascend a particular hill, that challenge stays behind me. There is no equivalent to Microsoft Word kicking me down the mountain.

Like all colleges, Steinhardt has a particular format they require for doctoral dissertations. Unfortunately, its specification is sometimes ambiguous, and more a creature of typewriting, than computer typesetting. (For example, section headings are supposed to be underlined!) In any case, I thought I would share the fruits of my frustrations: steinhard-pkg-opts.tex. I haven't yet received approval that this is sufficient, nor am I an expert in LaTeX, but, should someone else at Steinhardt need such a thing, this might be a start.

this entry posted to career/phd/dissertation;
comments (0)

2007 May 02 | Help with teleonomy

While grading papers I noticed that a student referred to the Wikipedia definition of teleonomy, a concept I have discussed in class, with a quotation that I found contrary to my own understanding: "teleonomic process, such as evolution..." I thought, "whoa!", evolution is not teleonomic and the version of the Wikipedia article I had referred the students to did not say as much. Turns out, the student used the current version of the article, which has been greatly expanded, a good thing, but in a way contrary to my understanding. (The lesson for the student will be, as I told them, they should only use Wikipedia articles as authoritative, when they use the specific versions I have vetted. Otherwise, they should be used as a non-authoritative (starting) source, backed up by an authoritative source.)

In any case, I ended up falling into a rabbit hole of posting a massive comment about my concerns with the article, but now need to get back to grading papers! Are there any "philosophers of science" out there who are not grading papers :-) and would be willing to contribute to the article or discussion?

this entry posted to social/wikipedia;
comments (0)

2007 Apr 25 | The untested prejudice

Sanger's latest article, Who Says We Know: on the New Politics of Knowledge is an argument that meritocracy, including an authority accorded to credentialed experts, is preferable to "epistemic egalitarianism." He writes that Wikipedia defenders argue for epistemic egalitarianism, or "dabblerism," on the basis of pragmatics (i.e., experts don't have the time or interest, or the crowd can be wise) and fairness. However, Sanger objects to these claims as either inaccurate (i.e., experts can be included, or the crowd is often dumb) or inferior to a genuine meritocracy. He notes that Wikipedia, too, in a conflicted and twisted way, also "likes" meritocracy. But its meritocracy is not with respect to what you know, but how much time you spend on the project. Or, academically credentialed expertise, as a form of merit, is accepted, but only with respect to citation. This leads him to ask, "If Wikipedians actually believe that the credibility of articles is improved by citing things written by experts, will it not improve them even more if people like the experts cited are given a modest role in the project?" Sanger concludes that Wikipedia has an "untested" prejudice against experts.

The essay has a number of interesting points. For example, Sanger claims Wikipedia's parity with Encyclopedia Britannica in the famous Nature study was not because of many Wikipedians' backgrounds in technology and science, but because the science domain is epistemologically more "objective" and consequently an easier topic on which to collaborate. This makes me think of The Story of Webster's Third: Philip Gove's Controversial Dictionary and Its Critics, in which the Third's editor, Philip Gove, campaigned for a new standard of objectivity, much to the chagrin of the editor preparing the wine guidelines (Morton 1994:92). I am no wine expert, but I wonder how Wikipedia's articles fare?

In any case, it is the claim that Wikipedia has an untested prejudice that I find most perplexing. If Wikipedia has a prejudice, it is towards "dabblerism" because of the often cited failure of the Nupedia "test" -- which perhaps unfairly taints "expertism" -- and the success of Wikipedia. Sanger will keep making his expertise argument, and Wikipedians will keep making their crowd arguments -- and despite all the words being spent, neither party precludes the participation of crowds, in Sanger's case, nor of experts, in Wikipedia's case. Ultimately, this argument will be won by the test of "running code" -- in this case a widely used reference work. Will Sanger's new project provide enough added value to sustain itself? Or, will Wikipedia develop means of rating the quality of articles or contributors? Time, not arguments, will ultimately tell.

this entry posted to social/wikipedia;
comments (0)

2007 Apr 20 | Recent special issues on commons and wikis

Recently two short essays, which started out as blog entries here, have been published in collections that might be of interest to readers. In Re-public's Wiki politics - part I, I'm hoping Yanis Varoufakis can tell us more about the actual practice of democracy in ancient Greece compared to present day Wikipedia governance. In addition to my NYU colleague Alex Galloway, I particularly recommend Felix Stalder's work from In the Shade of the Commons -Towards a Culture of Open Networks.

this entry posted to career;
comments (0)

2007 Apr 12 | NYU and Sallie Mae

Longtime readers will know I've been frequently frustrated and puzzled by NYU's infrastructural choices. When I first arrived I learned they used my SSN as my student identification. I protested, wrote a formal letter, and was eventually issued a new SSN-like number. But, of course, this didn't easily propagate through the institution so there were often mismatches, duplication, and confusion when I used the computer labs, athletic, library, health, and insurance services. Eventually, this policy blew up in their face when students' SSNs were accidentally released and all students were issued a new, proper, identification. But this meant that for a period, I had to remember three different numbers!

Other poor choices of NYU: an inefficient library site, an inaccessible and proprietary classroom management system (i.e., Blackboard), a broken student registration site (e.g., "don't use the back button", and "it works in Internet Explorer"), blocking the SSH port so I can't securely use their network, requiring a proprietary client for their wireless network (for which there's no "support" for Linux and Palm), etc.

The latest head-scratcher is an announcement that we must use their new electronic billing system. Clicking on a URL in the e-mail announcement, I was redirected to what appeared to be a Sallie Mae site with an NYU logo. I immediately thought this felt an awful lot like a phishing attack. (I frequently receive advertisements from spammers, who know I am a student, to my unpublished NYU e-mail address, so somehow even this leaked out!) And even if this isn't a phishing attack, Sallie Mae is the last company I want to have any dealings with: their slimy marketing and brutal collection practices are widely criticized; NYU's announcement came just before Sallie Mae's sweetheart payoffs to university officials became headline news!

After I e-mailed NYU about the security and privacy practices of the service, I learned I can "petition" to opt out after I fill out more paperwork. NYU... man...!

this entry posted to career;
comments (0)

2007 Mar 29 | Brandt and Wikipedia plagiarism

Since I am a student of plagiarism in reference work production, I read with interest Daniel Brandt's (2006) analysis of Plagiarism by Wikipedia editors. He claims that he found 145 instances of Wikipedia plagiarizing others (roughly 1% of his sample), and projects that the "plagiarism rate on Wikipedia is at least 2%."

I have a few thoughts, the first of which are statistical nuances. For example, the size of the sample that was used to calculate the 1% rate is not clear. His description of the winnowing down of his original corpus to the plagiarized cases is somewhat confusing, but this is how I understood it:

Because his 1% figure is rounded, it's not clear if his divisor is the 16,750 articles he started with (yielding 0.0087) or the 12,750 articles in which he found at least one clean sentence (yielding 0.012); I think it should be the latter: 1.2%.

Also, it's difficult to follow the details of the winnowing of the 5,867 articles for which Google found duplication, to the 1,682 articles that weren't "rogue," to the 145 cases of "confirmed" plagiarism, but clearly the bulk of duplications were those containing material from Wikipedia. This invites the question of how much Wikipedia itself is plagiarized.

And, remember, these are descriptive statistics of his sample only. Before making any inferential claim to the whole of the Wikipedia one has to ask about the sampling methodology: why biographical articles, and are they more or less likely to have plagiarism than others? (In my experience I find biographical plagiarism to be common.) Also, we have no parameters for the confidence of the inference. For example, to be very confident (i.e., 99%) that the inferred estimate is only off by 1%, one would need a proper random sample of 16,453 ("clean") articles.
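
For the curious, one way to arrive at a figure like this is the standard sample size formula for estimating a proportion, with a finite population correction. The sketch below is only that: the worst-case proportion p = 0.5, the critical value z = 2.576 for 99% confidence, and especially the population of 2,000,000 articles are assumptions chosen for illustration, and the function name is my own.

```python
import math

def sample_size(z, margin, population=None, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    z is the critical value for the confidence level; p = 0.5 is the
    worst case. If population is given, the finite population
    correction is applied.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# z = 2.576 corresponds to 99% confidence; 2,000,000 is an assumed
# population size, not a figure from Brandt's analysis.
print(sample_size(2.576, 0.01))
print(sample_size(2.576, 0.01, population=2_000_000))
```

Under these assumptions the uncorrected formula asks for 16,590 articles and the corrected one for 16,453. Either way, the required sample is of the same order as Brandt's whole corpus, and it would need to be random.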

Finally, and more substantively, Brandt conflates plagiarism with verbatim copying. As Posner (2007:12) writes: "not all plagiarism is copyright infringement and not all copyright infringement is plagiarism." I'm not sure if he excludes all public domain copying (which still might be considered plagiarism), or only unsourced public domain material. Also, elsewhere, I wrote how I found a verbatim copy of text in a biographical article, raising copyright infringement concerns. After placing my report on the copyright infringement page (there can be dozens of reports today), the infringing text was rewritten perhaps removing it from the scope of copyright -- and Brandt's method -- but perhaps not from the cloud of plagiarism.

Consequently, my understanding of Brandt is that his conclusion should be read as: 1.2% of a sample of Wikipedia biography articles appear to contain infringing verbatim copying from other online sources. Brandt's approach was conservative, but isn't really about the whole of the much larger, but murkier, domain of (non-verbatim) plagiarism.

this entry posted to social/wikipedia;
comments (0)

2007 Mar 29 | The origins, or etymology, of the "Internet"

On the Association of Internet Researchers mailing list Tamara Paradis raised the question of the origins of the term "Internet". I am not sure why, but I am always drawn to questions like this -- perhaps it is my historical sensibility. For some reason, I enjoy going through old documents to find the origins of terms. For instance, in 1999 as a complement to my work on the intersection of law and computer design, I wondered how technicians came to talk about computer proxies and agents, common terms from contract law. So, I researched The Etymology of "Agent" and "Proxy" in Computer Networking Discourse. Similarly, as a complement to work on how informal norms in the form of quotations about the Internet govern, I researched the provenance of many famous quotations such as "Inside every working anarchy, there's an Old Boy Network."

So, I thought I would share some of my notes on the "Internet." Vint Cerf (2000) is fond of talking about how the merging of ARPANET, PRNET, and SATNET was known as the "'inter-net' problem." However, I've not found much documentation of that.

What I have found is that the terms "international", "internet", and "internetwork" were used rather interchangeably throughout the 1970s; they couldn't even settle on what to call it, or what ITP stood for:

In version 3 (1978) because IP was split out of TCP, and was unambiguously referred to as Internet Protocol, I think that's when the term began to stick. However, there's more ambiguity on the details and versioning of these specs, so it's not as easy as that!

this entry posted to social;
comments (0)

2007 Mar 22 | The elusive Jimmy Wales

Like others, I have been surprised that the Wikipedia policies of No Original Research (WP:NOR, Wikipedia 2006nor) and Verifiability (WP:V, Wikipedia 2006v) had been collapsed into a new policy of Attribution (WP:ATT, Wikipedia 2007a). The two former policies, in addition to Neutral Point of View (WP:NPOV, Wikipedia 2006npv), have been essential for understanding and explaining Wikipedia collaborative culture -- even more so than the Trifecta (Wikipedia 2006pt) and The Five Pillars (Wikipedia 2006fp).

But Wikipedia is huge, and it's not hard to miss something even as important as this, so last week I updated my dissertation to read:

The second policy of Attribution requires, in a nutshell, that "All material in Wikipedia must be attributable to a reliable, published source" (Wikipedia 2007a). In a manner of speaking, this second policy is relatively new -- becoming "official" in February 2007 -- because it incorporates and supersedes the two long-standing policies of No Original Research (WP:NOR, Wikipedia 2006nor) and Verifiability (WP:V, Wikipedia 2006v).

Yet, for the dissertation, I concluded that having these two ideas remain distinct was useful to me in my writing and I would continue referring to them even if it now required a parenthetical comment about their merger. Evidently, Wales (2007wvw) agreed, as he wrote yesterday:

The change was made before a sufficient process had taken place to make the change, with the result that many good editors were unaware that such a fundamental change was about to take place. Many have reported being baffled and unhappy with the change.

However, because he intervened with a "rejection of [[WP:ATT]]" (Wales 2007jrw) this prompted two threads of interest to me: to what extent does WP:NOR act as a proxy for Notability, and "Just what is Jimbo's role anyway?" (Bennett 2007). In writing about leadership in Wikipedia and other open content communities, I have wondered the same, and now some were pressing for an explicit enumeration of Jimbo's powers. Others engaged in the perennial question of is this role more like that of a dictatorship, ministership, presidency, or a monarchy? Stephen Bain (2007tdv) has posted a thoughtful argument that constitutional monarchy is the most apt, something Wales himself has said in the past:

But we have retained a 'constitutional monarchy' in our system and the main reason for it is too support and make possible a very open system in which policy is set organically by the community and democratic processes and institutions emerge over a long period of experimentation and consensus-building. No one needs to be afraid that VfD will be hijacked, and our rules turned against us. (Wales 2005nnw1)

And this brings me, finally, to the point of this essay: the elusive Jimmy Wales. I am not sure if this is a feature of other auctorial leaders (e.g., Linus Torvalds, Guido van Rossum, Larry Wall, etc.) but it is a sometimes frustrating and seemingly useful characteristic of Wales, who commented on the ambiguity of his role as follows:

I think the limits on my power are quite a bit unknown for a few reasons, mainly that I really don't exercise power all that much, ever, and so most questions of what I could do just simply don't come up. And passing a priori laws against me seems rather injudicious since our community institutions are all quite carefully limited for good reasons in an effort to create an atmosphere of calm loving respect. (Wales 2007jwi1)

This ambiguity is one reason why I find the statist notions somewhat inappropriate and place my theory of leadership in the lineage of emergent leadership (Yoo and Alavi 2003).

Wales has followed this strategy from the start, once characterized, to the frustration of Wikipedia cofounder Larry Sanger (2005), as a good cop rarely wielding a big stick:

In retrospect, I wish I had taken Teddy Roosevelt's advice: "Speak softly and carry a big stick." Since my "stick" was very small, I suppose I felt compelled to "speak loudly," which I regret. (This was not such a problem, by the way, on Nupedia; partly, that was because there were not nearly as many problem users on Nupedia, but partly it was because there was clear enforcement authority.) As it turns out, it was Jimmy who spoke softly and carried the big stick; he first exercised "enforcement authority." Since he was relatively silent throughout these controversies, he was the "good cop," and I was the "bad cop": that, in fact, is precisely how he (privately) described our relationship. Eventually, I became sick of this arrangement. Because Jimmy had remained relatively toward the background in the early days of the project, and showed that he was willing to exercise enforcement authority upon occasion, he was never so ripe for attack as I was.

This elliptical approach serves Wales well and I think it is appropriate to the many challenges he faces, but it is also a challenge for writing about Wikipedia. For example, in explaining an inspiration for Wikipedia, Wales (2005nt) has acknowledged the debt to Hayek's, "'On the Use of Knowledge in Society' as a pivotal essay in guiding my own thinking on topics like decentralization, knowledge, and society." But Wales has also resisted (Wales 2005wew) the idea that The Wisdom of Crowds (Surowiecki 2004) is a factor in Wikipedia dynamics. This isn't trivial for me to reconcile and I can only do so by reading the latter statement not as a disavowal of social emergence, but a purposeful shift in focus to the community and culture of Wikipedia. (Since that, too, is my own belief: there are underlying emergent dynamics, but don't forget the collaborative culture!)

Or, consider that Wales (2005wdm) vehemently disagreed with Seigenthaler's claim (AP 2005) that because Wikipedia allows anyone to edit, Wikipedia permits vandalism. Yet, six months later, in order to limit such vandalism Wales argued for what was popularly understood as a new blocking feature, so those logged in from an otherwise blocked IP address could still edit. Wales (2006nyt1) argued this was not a restriction:

Openness refers not only to the number of people who can edit, but a holistic assessment of the entire process. I like processes that cut out mindless troll vandalism while allowing people of diverse opinions to still edit. Those are much better than full locking.

"Holistic," "elliptical," "elusive"... and a challenge in writing about Wales!

this entry posted to social/wikipedia;
comments (0)

2007 Mar 15 | Reuse vs. self plagiarism

Yesterday's New York Times reported on another case of high profile plagiarism: a relatively young professor who had copied parts of her dissertation from another. Even though she had previously acknowledged as much in private and has now resigned -- so there's no question of ambiguous boundaries -- a few things struck me as salient:

Interestingly, this article came along at the same time I have been following an interesting discussion of turnitin, a student plagiarism detection service, and struggling with issues of "reusing" my own work.

Unlike my previous issue of how to deal with priority in relation to self-published grey literature, my present concern arises out of published work. I am presently working on Chapter 4 of my dissertation which specifies criteria for an "open content community" as well as some interesting boundary cases on openness. A dissertation is, understandably, supposed to be an original work; I read this as "new work since matriculation" but I have heard it said this could mean only unpublished work: this would be horrible. Perhaps my strong view is partly because I'm a "mid-career" Ph.D. student who has already presented papers, and it strikes me as contrary to stop a professional activity that is essential for getting feedback. I also appreciate that new Ph.D.s reuse their dissertations in subsequent articles and/or books, which I also plan to do. But to sit on all that material and labor on in solitude -- aside from one's dissertation committee members -- until the dissertation is complete seems counterproductive. Consider the genealogy of parts of the present chapter:

In no case did I assign any copyright -- though they of course are published under various copyright licenses -- and so I am not legally precluded from using them in compilations or derivative works. Making them available has provided me with feedback and opportunities for publication which yields more feedback and builds relationships within my scholarly community. This is great! But what of "self-plagiarism"? (So, perhaps this is like my earlier post but questions of priority and public but "unpublished" work are exchanged for questions of "published" works and self plagiarism.)

I've been reading up on the topic and found Green (2005) interesting, and Hexham (1999) useful:

Self-plagiarism must be distinguished from the recycling of one's work that to a greater or lesser extent everyone does legitimately. Although self-plagiarism in academic publications is a gray area many universities implicitly recognize the practice as fraudulent. Thus most universities have rules preventing students from submitting essentially the same essay for credit in different courses. There are also rules against someone submitting the same thesis to different universities. Among established academics self-plagiarism is a problem when essentially the same article or book is submitted on more than one occasion to gain additional salary increments or for purpose of promotion.

Like all plagiarism, self-plagiarism occurs when the author attempts to deceive the reader. This happens when no indication is given that the work is being recycled or when an effort is made to disguise the original text. The issue once again is one of deception. Disguising a text occurs when an author makes cosmetic changes that make the same book or paper look different when it actually remains unchanged in its central argument. Changing such things as paragraph breaks, capitalization, or the substitution of technical terms in different languages, causes readers to believe they are reading something completely new. If these are the only changes an author has made then they may be legitimately described as self-plagiarism and fraudulent.

The extent of re-cycling is also an indication of self-plagiarism. Academics are expected to republish revised versions of their Ph.D. thesis. They also often develop different aspects of an argument in several papers that require the repetition of certain key passages. This is not self-plagiarism if the complete work develops new insights. It is self-plagiarism if the argument, examples, evidence, and conclusion remain the same in two works that only differ in their appearance.

Which brings me, finally, to my simple and mundane question for my dissertation. Is a citation to my own published works sufficient if I am reusing text -- though continuing to rework and integrate it -- or should I also give an acknowledgment, often seen in scholarly books, that "portions of this text are republished from or based on..."?

(BTW: a possible irony is I expect this and earlier entries could be turned into a decent paper on "scholarship in the open" should the opportunity ever present itself!)

this entry posted to method;
comments (2)

2007 Mar 05 | Good faith, bad faith

Despite appearances, I do make note of the Wikipedia disaffected. A possible criticism of my work is that my focus on the notion of "good faith" in collaboration misses this constituency. No doubt one could write many volumes on the conflict and "dissensus" of Wikipedia. (Deetz (1996) considers the consensus/dissensus cleavage as a common facet in approaches to "Describing Differences in Approaches to Organization Science.") However, I often feel going for conflict is the easier, dramatic, route; it is even privileged. Perhaps to a lesser extent than in psychology (Seligman 2002), social scientists -- and particularly "critical theory" types -- privilege the pathological. When writing of his studies of primate behavior de Waal (1996) noted that "terms related to aggression, violence, and competition never posed the slightest problem" to editors and reviewers. Yet, he "was supposed to switch to dehumanized language as soon as the affectionate aftermath of the fight was the issue" (p. 512). This bias is common, but fortunately in organizational science there is room to consider both good and bad organizational leadership, culture, and structures, with a descriptive, normative, and/or prescriptive stance (Bell, Raiffa and Tversky 1988).

This week a controversy has been raging about misrepresentations a prominent Wikipedian made about his real-life credentials, and Wales was criticized for seemingly endorsing this behavior. I planned to say nothing, or at least be patient, as it seemed much of the heat was from those with grudges or posing punditry. However, Wales has finally spoken up -- he had been traveling in India without easy access to an Internet connection -- with the following:

I have asked EssJay to resign his positions of trust within the community. In terms of the full parameters of what happens next, I advise (as usual) that we take a calm, loving, and reasonable approach. From the moment this whole thing became known, EssJay has been contrite and apologetic. People who characterize him as being "proud" of it or "bragging" are badly mistaken.... Wikipedia is built on (among other things) twin pillars of trust and tolerance. The integrity of the project depends on the core community being passionate about quality and integrity, so that we can trust each other. The harmony of our work depends on human understanding and forgiveness of errors. (Wales 2007utj)

I read that and think: this is why I am studying good faith.

In any case, the thread that really caught my interest this week was the banning of a user from the Wikipedia e-mail lists who exemplifies the operation of bad faith in two interesting ways. (I'm not linking to or identifying this person as it's not my intention to attack or to make any judgments about the substance of his complaints.) The first interesting issue is what constitutes a contribution. In goal-oriented and merit-based open content communities, one's authority in policy/administrative issues is often built upon contributions towards the substantive goals of the community (e.g., writing an encyclopedia). Strident, even if partially sound, criticism from someone who does little else has little weight. When John Doe responded that he did contribute, he asked:

What do they consider "not making any positive contributions"? How about this? Or this? Or this?

The fact that these links only pointed to other complaints is telling. Later, when his links to his previous criticism in the email archives broke, he assumed the worst:

Yes, that's right. Someone sort-of-cunningly reindexed the month. ... Why would they do this? Presumably, it's an attempt to hide the post. They know they'd get caught if they just deleted it, but if they just "shift" it a bit... sneaky, boys.

Much as I marvel at Wales' invocation of good faith, I boggle at the paranoia of this statement. As readers of this blog know, I too was frustrated when references to the archive broke -- a common "feature" of mailman -- but to assume it was done to "hide the truth," which is the subject of Doe's entries, is an astounding and ill-founded assumption of bad faith.

this entry posted to social/wikipedia;
comments (0)

2007 Feb 16 | History, Wikinomics, and Causation

An issue related to the question of priority, noted in a previous entry, is the general historical question of causality. Priority, who first had an idea or published it, can be a trivial question relative to a claim about who or what caused something. Niels Bohr modeled the atom, but how did the US come to drop the bomb on Japan?

In the history of technology, the question of "founding" is of particular interest to me. Given the spat between Wikipedia's cofounders on the balance of credit, I note with interest how Wikipedia's "creation story" is told. The new book, Wikinomics (Tapscott and Williams 2006), tells the story as follows:

Wales first ventured into the world of encyclopedic content in 1998, when he established Nupedia with former employee Larry Sanger. Like Wikipedia, Nupedia allowed anyone to submit articles and content. Unlike Wikipedia, it was a centralized, top-down hierarchy: paid academics and topic experts follow the laborious seven-step process to review and approve content. One year and $120,000 into the project, Nupedia had only published twenty-four articles, and Wales decided to scrap it.

One of Wales' employees then introduced him to the Wiki, a concept invented by Ward Cunningham in March 1995, and Wales started again with a much more open way of organizing the site that would allow anyone with the inclination to participate. In the first month, Wikipedia published 200 articles, and in the first year the total reached 18,000 (p.72).

This is the only reference to Sanger in the book, and omits his well-known and public role in Wikipedia's launch in favor of the undocumented claim that Bomis employee Jeremy Rosenfeld proposed a wiki as a way to solve Nupedia's problems before Sanger did. Despite Sanger's strident efforts to defend his claim, I expect that in the popular press he will only be known as the authority-mad academic of Wikipedia's failed predecessor.

(I don't necessarily discount Wales' claim, but one can't deny that the historical documentation shows Sanger played a prominent role in launching Wikipedia. Nor do I believe Sanger was the right leader for Wikipedia; with regards to the original vision and eventual success of Wikipedia, we have Wales to thank.)

this entry posted to social/wikipedia;
comments (0)

2007 Feb 13 | Grey literature, stigmergy and priority

Last week I read a provocative paper by Helen Nissenbaum (2002) where she considers the norms, values, and ends previously served by the convention of scholarly priority, and, now that the contextual landscape is changing because of electronic media, whether intellectual property (patents) can serve just as well in their stead. Helen recommended it to me while we were discussing my dissertation chapter on encyclopedic production, including questions of copyrights and plagiarism. This chapter is partly based on a draft I wrote in 2005 in which I argued the concept of stigmergy is helpful in understanding the sort of sociality involved in the cumulative production of knowledge in reference works.

An irony is that Nissenbaum's paper speaks to the question of scholarly priority in the age of the Internet, which bears on my adoption of the term stigmergy. (She doesn't mention blogs or wikis, but instead refers to "wildcat publishers," "grey literature," and whether there is any scholarly obligation to search these realms for the purposes of citation.)

I think I first wrote of stigmergy in the spring of 2005, in a draft I made available on this blog on September 30. Roughly a year later, I read Mark Elliott's piece Stigmergic Collaboration: The Evolution of Group Work in the May issue of the online M/C Journal. Elliott explores the idea much more thoroughly than I did or will, and that is good. But how do I deal with the question of priority and citation? I definitely want to -- and do -- cite Elliott in my present version of the chapter, but what to do with my earlier version? I don't know Elliott and assume he knows nothing of me. And I don't feel that proprietary about saying Wikipedia might be stigmergic. And for all I know we read the same thing about wasps -- though I was also inspired by early reference work compilers likening their copying of others' work to a useful "busy bee." But I don't want it to appear I am simply borrowing the idea from elsewhere and I prefer not to cite earlier "unpublished" drafts. This concern with priority is in the face of the biggest irony of all: an argument of this chapter is that knowledge is inherently interdependent and cumulative!

Presently, the text in question reads:

Stigmergy is a term coined by Pierre-Paul Grasse to describe how wasps and termites collectively build complex structures; as Karsai (2004:101) writes, it "describes the situation in which the product of previous work, rather than direct communication among builders, induces [and directs how] the wasps perform additional labor." In addition to my proposal that this notion might be helpful in understanding Wikipedia collaboration (Reagle 2005fss), Mark Elliott (2006) has also, more thoroughly, argued the same: "As stigmergy is a method of communication in which individuals communicate with one another by modifying their local environment… [t]he concept of stigmergy therefore provides an intuitive and easy-to-grasp theory for helping understand how disparate, distributed, ad hoc contributions could lead to the emergence of the largest collaborative enterprises the world has seen" (Elliott 2006:4). However, we need not apply this notion only to new media. For example, stigmergy might also be applicable to Newton’s seemingly generous sentiment of acknowledging the contributions of his predecessors: "If I have seen further [than you and Descartes] it is by standing upon ye shoulders of giants." (As cited in a 1676 letter from Newton to Hooke, by Merton (1993), who details a long history of this aphorism and Newton's probably less than magnanimous intention (Hawking 2002) of insulting Robert Hooke, his short and hunchbacked rival.)

Is this appropriate?

this entry posted to method;
comments (0)

2007 Feb 09 | Auctorial Leadership?

A few days ago, while walking home from the local library, I recalled an expression I learned in a class on early Christian history: primus inter pares. This notion was used by early church leaders (e.g., the Bishop of Rome, now the Pope) and present day patriarchs to indicate a status of "first among equals." Perhaps this could help me with my question of what to call benevolent dictatorship in open content communities. But, the sentiment wasn't quite right and it would be difficult to coin a term out of that Latin expression. But as I followed links from the primus page I encountered the terms "patriarch," "ethnarch," "archons" and finally "auctoritas."

The Oxford Classical Dictionary defines patrum auctoritas as: "the assent given by the 'fathers' (patres) to decisions of the Roman popular assemblies. The nature of this assent is unclear, but it may have been a matter of confirming that the people's decision contained no technical or religious flaws. The 'fathers' in question were probably only the patrician senators, not the whole senate..." (Momigliano and Cornell 2003). Auctoritas is the Latin root of the English words authority and author. Given that "benevolent dictators" are often the founding authors of open content projects, the term seems appropriate. (In the Internet standards context, I spoke of "elders.") While I was convinced for a time my term would include the root "arch," for "ruler," the more I read of auctoritas the more I liked it.

Additionally, the form of power inherent in auctoritas fits my notion of leadership. It is not a coercive order but a recommendation with a normative force based on the prestige and charisma of a leader. Theodor Mommsen wrote of it as a force that is "more than an advice and less than an order: it is an advice whose compliance it is not easy to evade..." (Mommsen, as cited in Lottieri 2005:15). Lottieri concludes his discussion of the notion by writing:

For all these reasons we can say that auctoritas wa[s] on the edge between the legal world and the social life, the beliefs, the customs. It is in condition to influence the decisions by its prestige. Therefore, people refusing the auctoritas can ignore it, but they know that by the decision they are out of the community. (Lottieri 2005:15).

And this dovetails into the possibility of forking!

So, I find the term to be a surprisingly good fit. Now I need to figure out how to pronounce "auctorical" or "auctorial," or maybe even "authorial," leadership. Is this too awkward?

this entry posted to social/wikipedia;
comments (0)

2007 Feb 08 | ZotZero and BusySponge

I have been reading of ZotZero in Josh's blog and am hopeful that it will help bridge the gap between the dynamic and informal life of the Web (e.g., reading, blogging, bookmarks, RSS, etc.) and the seemingly lifeless task of bibliography. Wouldn't it be nice if citing something was as easy as bookmarking it? Or, if you could read what your colleagues were reading via an RSS feed?

While I haven't played with ZotZero yet -- and I use the Konqueror browser, not Firefox -- I share this vision and hope to see it become a reality. And since I recently posted about my Freemind Extract tool (for transforming a mindmap into a bibliography) I realize I haven't spoken of the flipside in a couple of years: absorbing information. But first, a historical digression.

The way I make note of and annotate resources and tasks evolved out of two practices at the W3C. The first was a decree by Timbl which I objected to strongly at the time: the great datespace shift of 1999. Because the W3C's root file/name space was getting too crowded, Tim's new policy forbade new top-level spaces like www.w3.org/Signature or www.w3.org/Encryption. There were too many already and who were we to lay claim to such spaces for all time? There might be a new digital signature activity 10 years from now, so where would they live? (Consequently, the subsequent key management working group received www.w3.org/2001/XKMS.) I appreciated this concern at the root level, but cringed at only being able to organize other files by date of creation. Try finding a document you wrote a couple of years ago in a space that is no more structured than /2001/{01,..,12} and is shared by 50+ other people. It's not easy. I realized the only way I could keep track of things I had worked on was to have a log of events and documents I cared about. (This shift also affected how we collaborated in our shared space given issues of ownership, access controls, and version management -- but perhaps more on that another time.)

The second W3C practice was that each of its hosts (worksites) had a weekly meeting at which we shared the important events of the past week and raised agenda issues for common discussion. To make it easier for the minute takers we e-mailed two minutes to an e-mail list and a bot would collect them into draft minutes which would be augmented with the IRC log.

Preparing my two minutes before 10 a.m. Tuesday morning always seemed more frantic than it need be. But, once I started keeping a log of what I had done as a result of the datespace shift, it became trivial. (In fact, I wrote a script to grab the past week automatically, and even generated an RSS feed from the work log so that one could "subscribe" to my work log by keyword/task -- anticipating RSS feeds of tagged bookmarks.)

By 2002 I had tired of manually logging events, via an HTML editor, to my personal blog and work log, so I wrote a specification for a dream tool: Busy Sponge. It would soak up everything I touched of importance and send it to the right place. I opted for a command-line tool I named b.py.

Returning to today, a challenge I'm sure I share with the ZotZero folks is how to automatically scrape as much metadata as possible from a Web resource. Busy Sponge continues to be the primary way I input data into my work log and mind maps. Because metadata is no more common or standard on the Web than it was five years ago, I am dependent on screen-scraping heuristics. For example, the following code allows me to easily capture and cite messages of Wikipedia mailing lists -- and that is why it was such a hassle when the archives broke:

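elif url.startswith("http://marc.theaimsgroup.com/"):
	# Grab the author from the From: header, falling back to a greedier match.
	try:
		author = re.search('''From: *(.*?)''', html).group(1)
	except AttributeError:
		author = re.search('''From: *(.*)''', html).group(1)
	# Undo the archive's address obfuscation (' () ' for '@', ' ! ' for '.')
	# and unescape the angle brackets around the address.
	author = author.replace(' () ','@').replace(' ! ','.')\
		.replace('&lt;', '<').replace('&gt;', '>')
	# Keep only the display name: drop the <address> part and any quotes.
	author = author.split(' <')[0]
	author = author.replace('"','')

	# The list name and date are captured from their headers the same way.
	mlist = re.search('''List: *(.*?)''', html).group(1)

	mdate = re.search('''Date: *(.*?)''', html).group(1)
	...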

Unfortunately, beyond a couple mailing list archives and wikis -- which, fortunately, are the majority of what I grab -- I have to manually edit my sponges with proper meta/bibliographic data. And curses upon those bloggers who make it difficult to determine the author of an article or even the whole blog -- even a pseudonym will do! Beyond the usage of my tool, I can imagine much value in a social tool that allows users to share annotations, or even screen-scraping "plug-ins." One can hope!
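
For anyone who wants to experiment with this kind of header scraping, here is a minimal, self-contained sketch. It is not the actual b.py code: the function name scrape_marc_headers, the page parameter, and the assumption that the message page has already been reduced to plain "Header: value" lines are all hypothetical choices made for the example.

```python
import html
import re


def scrape_marc_headers(page):
    """Return the From, List, and Date values found in a message page.

    Assumes (hypothetically) that each header sits on its own line as
    'Name: value'. Missing headers come back as None.
    """
    fields = {}
    for name in ("From", "List", "Date"):
        # Capture everything after 'Name:' up to the end of that line.
        match = re.search(r"^%s: *(.*)$" % name, page, re.MULTILINE)
        fields[name.lower()] = match.group(1).strip() if match else None

    author = fields["from"]
    if author is not None:
        # Unescape entities such as &lt; and &gt;, then undo the
        # ' () ' for '@' and ' ! ' for '.' address obfuscation.
        author = html.unescape(author)
        author = author.replace(" () ", "@").replace(" ! ", ".")
        # Keep only the display name: drop the <address> part and quotes.
        author = author.split(" <")[0].replace('"', "")
        fields["from"] = author
    return fields
```

Calling scrape_marc_headers on the text of a page returns a dictionary with the keys "from", "list", and "date". A real page would need its HTML markup handled first, which is exactly where the site-specific heuristics come in.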

this entry posted to technology/python;
comments (0)

2007 Feb 01 | The Anti-Grand Poobah

In a draft on Wikipedia leadership I labeled the "benevolent dictator" form of leadership in open content communities as "paramount leadership." As I turn to revisit the topic in my dissertation I am still not satisfied with this term. A possible tactic would be to adopt the native's term of "benevolent dictator" and use that. The problem, though, with that is that the term is much confused and discussed -- which is why it is interesting -- and would fail to distinguish the distinct theoretical concept I am offering relative to the wider, imprecise, usage. Paramount means "superior to all others" and serves the purpose of being distinct, but doesn't quite capture the meaning of this type of leadership in open content communities. Such leaders often have no formal title but founded the community or otherwise achieved the position through merit. They serve to make decisions that the community has a difficult time making itself. And they must act with humility and humor or the community might fail or fork. Some other ideas I've been bouncing around are "provisional dictator" and "jester king." I need something that is the opposite of "Grand Poobah." Any ideas are welcome!

this entry posted to social/wikipedia;
comments (0)

2007 Jan 30 | Student Usage of Wikipedia

The Blogosphere has been abuzz about a history department's policy restricting students from citing Wikipedia. I'm not fond of this position, as I explained last year, and I thought I'd share the "best practice" I encourage in my students.

this entry posted to career/teaching;
comments (0)

2007 Jan 26 | Freemind Bibliography Extract 0.6

[This entry is now deprecated, please see Thunderdell (Freemind Extract).]

I am releasing version 0.6 of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are Python scripts that are able to convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and bibtex). They also make it very easy for me to search my notes and quote authors (e.g., "Giddens"). There are no massive changes, just the usual tweaks and bug fixes. One notable change is that the regular expressions in pe.py are much improved, and it's quite uncanny at extracting bibliographic keys of the form 'Snide and Smith (2003)' or '(Snide, Smith and Smittie 2004)' from natural language text.
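
To give a flavor of what extracting such keys involves, here is a rough sketch. It is not the code in pe.py: the names NAME, AUTHORS, NARRATIVE, PARENTHETICAL, and find_citations are hypothetical, and the patterns only cover the two forms mentioned above, assuming capitalized surnames and four-digit years.

```python
import re

# A surname: a capital letter followed by letters, apostrophes, or hyphens.
NAME = r"[A-Z][A-Za-z'-]+"
# One or more surnames: 'Snide', 'Snide and Smith', 'Snide, Smith and Smittie'.
AUTHORS = r"%s(?:, %s)*(?: and %s)?" % (NAME, NAME, NAME)

# 'Snide and Smith (2003)': authors outside, year inside the parentheses.
NARRATIVE = re.compile(r"(%s) \((\d{4})\)" % AUTHORS)
# '(Snide, Smith and Smittie 2004)': authors and year inside the parentheses.
PARENTHETICAL = re.compile(r"\((%s) (\d{4})\)" % AUTHORS)


def find_citations(text):
    """Return (authors, year) pairs in the order they appear in text."""
    found = []
    for pattern in (NARRATIVE, PARENTHETICAL):
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(1), match.group(2)))
    # Sort by position so both citation styles come back in reading order.
    return [(authors, year) for _, authors, year in sorted(found)]
```

A heuristic like this will misfire on some text (for instance, a capitalized word followed by a comma right before a surname), which is why tuning the expressions takes repeated tweaking.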

this entry posted to technology/python;
comments (0)

2007 Jan 25 | Dunc-Tank and Money

Like Biella, I have been following from afar the controversy [1,2] associated with the dunc-tank project: a way for a few Debian developers to accept donations. The moderate amount of money (appreciated nonetheless, I'm sure) caused an extraordinary ruckus among other volunteers, leading to protest and resignations.

How is it that money can be so divisive? The study Biella points to suggests that even ambient reminders of money (what psychologists call "priming") can lead to "antisocialness." Subtly reminding subjects of money or high affluence (e.g., descrambling a sentence about a high salary, exposure to a poster or screensaver of currency, seeing a big pile of Monopoly Money) led them to be more "self-sufficient," that is less likely to ask or provide help, less likely to donate to a cause, to arrange chairs further apart, and to prefer singular activity to collaboration and social recreation (Vohs, Mead and Goode 2006).

this entry posted to social;
comments (0)

2007 Jan 23 | Broken lists

I'm presently cursing whoever changed the configuration/names of Wikipedia lists. Identifying emails in archives is sadly a difficult problem, though it really need not be. Fortunately, the good folks at the aimsgroup MARC also archive the lists and associate the unique identifier of every message with a persistent and unique URL, as I wrote about previously. But when Wikipedia moved its lists from "foo@wikimedia.org" to "foo@lists.wikimedia.org" it not only broke email filters across the land, it evidently broke the MARC archives as well. No message has been available in the MARC archive since the change on January 6. Now, Wikipedians are realizing that many of the links from the wikis to email messages (e.g., referencing a message on the Wikimedia Foundation list) are broken.

My backlog of email messages to scrutinize is growing as I hope Hank Leininger and the other volunteers at MARC find the time and means to address the problem. What would be great is if Wikipedia and other users of archive software (i.e., mailman) pressed for stable references to messages as a priority feature!

this entry posted to method;
comments (4)

2007 Jan 10 | Understanding technology

I have just completed revising the syllabus for a course I will again be teaching in the spring of 2007: Understanding how we understand: technological predictions, myths, and implications. Pedagogically, the class is very much influenced by Brookfield and Preskill (1999), Discussion as a way of teaching: tools and techniques for democratic classrooms.

this entry posted to career/teaching;
comments (0)

Open Communities, Media, Source, and Standards

by Joseph Reagle
