Open Codex

2009 Dec 18 | Grade Trends

My sense from teaching over the past four years is that I have been assessing higher grades over time. (I shy away from the term "giving grades," as it sounds like a gift based on character or my fondness for the student.) Beyond an anecdotal report of the department's median grade (which I appear to be about half a letter grade above), I have no information on the grade distributions of other classes in my department or at NYU, including other sections of the classes I teach. So, my philosophy is to tell students that if everyone performs excellently, the grades will reflect that. I then frequently remind students of how I evaluate their work, based on the departmental criteria, and at the beginning of the course I provide exemplars of what I consider to be excellent work.

If grades have improved since my first semester, this doesn't surprise me: my classes are now more honed, the exemplars give students a better sense of my expectations, and I've debugged the assignment specifications. I also feel that while the material and assignments in the Media, Technology, and Society (MTS) class are more difficult than those in Conflict Management (CM), the MTS students are more consistent. So, I computed five-number summaries and generated the following box plots (with outliers below 70 truncated):

Grade Boxplots

My conclusion is that while I assessed lower grades in my first semester of teaching each course, there is otherwise no consistent trend. Also, my sense that the MTS students perform more consistently is confirmed.
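The five-number summary behind box plots like these (minimum, first quartile, median, third quartile, maximum) takes only a few lines to compute. Below is a minimal Python sketch, assuming each section's grades are a plain list of numbers. The two sample sections are made-up toy data, not the grades plotted above. Quartile conventions also vary between tools: this sketch uses the "inclusive" method of `statistics.quantiles`, so other software may report slightly different Q1 and Q3 values. The interquartile range (Q3 minus Q1) is one way to put a number on "more consistent."

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numeric scores."""
    data = sorted(scores)
    # "inclusive" interpolates between observed values; other tools may use
    # a different quartile convention and give slightly different Q1/Q3.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def interquartile_range(scores):
    """Spread of the middle half of the scores (Q3 - Q1); smaller means more consistent."""
    _, q1, _, q3, _ = five_number_summary(scores)
    return q3 - q1


# Hypothetical toy sections, for illustration only.
mts = [88, 90, 91, 92, 94]
cm = [72, 80, 88, 93, 97]
print(five_number_summary(mts))                          # (88, 90.0, 91.0, 92.0, 94)
print(interquartile_range(mts), interquartile_range(cm))  # 2.0 13.0
```

Truncating outliers below 70, as in the plots, is a display choice. It does not change the summary computed from the full list.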

this entry posted to career/teaching;
comments (0)

2009 Dec 07 | News of Wikipedia’s Death Has Been Greatly Exaggerated (Again)

In the past few weeks there's been much discussion of news stories based on Felipe Ortega's dissertation; the concern is that Wikipedians are abandoning the online encyclopedia “in droves.” (What is a drove, you ask? According to Wikipedia, it is an ancient route by which livestock were herded.) However, Erik Zachte, with Felipe's help, shows how the way one constructs one's parameters in such an analysis significantly affects the conclusions one can draw. For example, the alleged drop-off (deaths) of Wikipedia editors may be more the result of when and how the analysis is done. Whether you count as an active Wikipedian someone who made a single edit (perhaps just experimenting) rather than five or some other number (more likely an actual Wikipedian) can significantly affect the outcome. Or, if you define a "death" as a month without activity, you will naturally see many deaths at the end of the analysis period: some of these people may simply have been "sleeping" that month and returned the next, after the analysis ended and no one was there to see it. (As a line from Twin Falls Idaho, a favorite movie of mine, puts it: "The sad ending is only because the author stops telling the story. But it still goes on. It's just untold.")
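To see how much these parameters matter, here is a minimal Python sketch over a hypothetical toy dataset: four invented editors with monthly edit counts. These are not Zachte's or Ortega's data or methods. In this sketch, an editor is "active" in a month with at least `threshold` edits. An editor counts as a "death" if their last active month is followed by at least `gap` inactive months before the observation window ends.

```python
# Hypothetical monthly edit counts over a four-month observation window.
EDITS = {
    "alice": [12, 9, 15, 11],  # steady contributor
    "bob": [1, 0, 0, 0],       # a single experimental edit
    "carol": [6, 7, 0, 0],     # stops after two months
    "dave": [8, 5, 6, 0],      # quiet only in the final observed month
}


def active_editors(edits, month, threshold):
    """Number of editors with at least `threshold` edits in the given month."""
    return sum(1 for counts in edits.values() if counts[month] >= threshold)


def departures(edits, threshold, gap):
    """Count editors whose last active month is followed by >= `gap` inactive months.

    Editors who never reach `threshold` in any month are not counted at all.
    """
    deaths = 0
    for counts in edits.values():
        active_months = [m for m, n in enumerate(counts) if n >= threshold]
        if not active_months:
            continue
        trailing_silence = len(counts) - 1 - active_months[-1]
        if trailing_silence >= gap:
            deaths += 1
    return deaths


for threshold in (1, 5):
    for gap in (1, 2):
        print(f"threshold={threshold} gap={gap}: "
              f"{departures(EDITS, threshold, gap)} departures")
```

With these invented counts, the four combinations report 3, 2, 2, and 1 departures. Dave is the "sleeping" editor: a one-month gap counts him as gone, but whether he returns is only visible after the window closes.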

Wikimedia’s less-noted response to the story claims that significant efforts are being made to improve the recruitment and retention of users; on the numbers side:

On the English Wikipedia, the peak number of active editors (5 edits per month) was 54,510 in March 2007. After a more significant decline by about 25%, it has been stable over the last year at a level of approximately 40,000. (See WikiStats data for the English Wikipedia.) Many other Wikipedia language editions saw a rise in the number of editors in the same time period.

this entry posted to social/wikipedia;
comments (0)

2009 Nov 13 | Wikipedia's new fundraising slogan

Successful open communities must occasionally interact with closed worlds. For example, Wikipedia's openness and transparency sometimes conflict with its obligations to be responsive to the law (e.g., regarding defamation, copyright, and human safety). This is a consequence of becoming a notable and established institution.

A new source of tension is the "professionalization" of Wikipedia administration -- a move I otherwise commend. It appears professional marketers were asked to develop a marketing/fundraising campaign, yielding the "WIKIPEDIA FOREVER" slogan. Some Wikipedians feel this is inappropriate, arrogant, and loud -- a sentiment with which I agree. A more wiki-typical discussion of appropriate slogans can be found here.

this entry posted to social/wikipedia;
comments (6)

2009 Oct 02 | Gender Bias, Part II

In the previous analysis, of the 174 women from the National Women's History Project, Wikipedia lacked articles on 23 and Britannica on 65. Hence, I found no support for the idea that the gender imbalance among Wikipedians leads to a similar imbalance in biographical coverage. However, the analysis did confirm the (unsurprising) fact that Wikipedia has greater coverage, in both the number of subjects and article length. Therefore, as noted, on the gender question it would be nice to have a sense of relative proportions.

Consequently, in this second analysis I look at Time's "100" most influential people of 2008. (There are more than 100 subjects because I break out a few couples into separate entries.)

43 entries are missing from EB; 4 from WP. 4 entries are in neither. For articles existing in both, WP articles are 7.66 times larger on average (median of 6.81).

Of the 105 entries, I guess that 23 are female, 82 are male, and 0 are unknown. That is, the ratio of females to males is 0.28. Among the Wikipedia articles, the ratio is 0.29 (23/78); at Britannica, it is 0.27 (13/49).

That is, while one might claim that this ratio of 0.28 is evidence of a bias -- on the part of Time or the world at large -- it is a baseline from which we can judge the reference works: neither Wikipedia nor Britannica is disproportionately better or worse. If the reference works were biased toward coverage of men, we would expect their ratios to be lower than 0.28 (e.g., if all the missing articles were about women).
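The ratios above can be checked with a few lines of Python. The denominators follow from the counts: Wikipedia has articles for 105 minus 4 = 101 of the entries (23 women, 78 men), and Britannica for 105 minus 43 = 62 (13 women, 49 men). The last calculation is a hypothetical, not an observation. It shows what Wikipedia's ratio would have been if all four of its missing entries had been women.

```python
def female_to_male(females, males):
    """Ratio of female to male subjects, rounded to two decimal places."""
    return round(females / males, 2)


baseline = female_to_male(23, 82)    # all 105 Time "100" entries
wikipedia = female_to_male(23, 78)   # the 101 entries with Wikipedia articles
britannica = female_to_male(13, 49)  # the 62 entries with Britannica articles

# Hypothetical: had Wikipedia's 4 missing entries all been women, it would
# cover 19 women and all 82 men, and its ratio would fall below the baseline.
wikipedia_if_biased = female_to_male(23 - 4, 82)

print(baseline, wikipedia, britannica, wikipedia_if_biased)  # 0.28 0.29 0.27 0.23
```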

Of course, I'd like to run this over a larger corpus, but in terms of easy-to-find lists of notable persons, these "100" lists are all I've found so far. Also, I'm again relying upon heuristics to guess the gender of subjects, but they seem to be working well. (EB's Mia Farrow article is guessed as male because it's actually a stub, a single sentence in the Woody Allen article.) Finally, my approach can also augment the table with the content from both reference works, but I expect Britannica would not be happy about that, so I don't provide that version publicly.
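The gender heuristics themselves aren't described here. As a hypothetical stand-in, one plausible approach is to count gendered pronouns in the article text. The Python sketch below takes that approach. The pronoun sets and the `margin` parameter are assumptions of the sketch, not the author's method.

```python
import re

FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}


def guess_gender(text, margin=1.5):
    """Guess a subject's gender from pronoun counts (a rough, hypothetical heuristic).

    Returns "female" or "male" only when one set of pronouns outnumbers the
    other by at least `margin` times; otherwise returns "unknown".
    """
    words = re.findall(r"[a-z]+", text.lower())
    female = sum(word in FEMALE for word in words)
    male = sum(word in MALE for word in words)
    if female == male == 0:
        return "unknown"
    if female >= margin * male:
        return "female"
    if male >= margin * female:
        return "male"
    return "unknown"
```

This kind of heuristic fails in the way the Mia Farrow case illustrates. If the "article" is really a sentence inside someone else's biography, the surrounding text's pronouns dominate, and the guess follows the host article's subject.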

this entry posted to social/wikipedia;
comments (5)

2009 Sep 28 | Fall '09 Update

I have fallen out of the habit of posting updates at the beginning and end of each semester. (Mostly because I'm not a student anymore, so I'm not taking new and exciting classes and posting the resulting term papers; instead, I've mostly been focused on the book.) Yet perhaps it's worthwhile to give it another go.

I've been speaking with a lot of people about Wikipedia, and two such interviews will be up by the end of the day.

On the academic front:

Finally, for those interested in the New York City free culture scene, James Vasile is running a Planet NYC aggregator.

this entry posted to career;
comments (2)

2009 Sep 28 | Sexism and Two World Views

A possible insight applicable to the FOSS and sexism controversies is the incompatibility of two worldviews.

In the first view, one aspires to a post- or blind-"ism" world. Therefore, highlighting differences is discriminatory: it presumes that such differences are somehow essential, it recapitulates the very differences from the past that we are seeking to leave behind, and it challenges the autonomy and agency of individuals. Egalitarianism and freedom are assumed, and in this post-"ism" world it is acceptable to use personal language even when one is a member of the prevalent group.

In the second view, one has a responsibility to highlight differences because discrimination is typically masked. Discrimination need not be intentional; it can operate as assumptions that need to be aired and challenged. This view acknowledges the continued influence of history and social structure, and it expects language to be neutral/inclusive, particularly when used by members of the dominant group.

Perhaps this is applicable to the discussion around Shuttleworth's comment that poorly designed technology is difficult to explain to girls. (I actually haven't seen the exact quote yet.) The first worldview is seen in Matthew's comment:

He was talking about how hard the design work is to do, and that if things were designed poorly or had low usability, he would not know how to explain them to girls (my translation). The tone of his voice suggested sarcastic embarrassment, which implies he would prefer to impress girls. So he could have said the same thing about his father. Or better yet. If he was gay, he would have said “guys” not “girls”.

The second view can be seen in a number of the responses, including Mary's:

It’s trying to create commonality with the audience around the issue of liking to impress women which is both male-centric and hetero-centric. And it’s sexualising: it reminds women in hearing that they may be (often are) viewed preferentially as an audience for someone’s impressive demonstration (or pickup line) to which we are meant to respond with admiration, rather than as collaborators or teachers.

this entry posted to social;
comments (0)

2009 Sep 25 | Gender Bias in Wikipedia Coverage?

The recent controversy about gender imbalance and sexism in open content communities has been remarkable this summer, and this week's news about Shuttleworth's comments might mean it will extend into the autumn. While I think these events merit a historical and cultural analysis -- and prompt the question of whether sexism has increased, is being noticed more, or both -- I want to postpone that undertaking for the moment. Instead, I wonder: does the gender imbalance among Wikipedians, with recent demographic data showing that women are about 13% of contributors, affect Wikipedia's topical coverage?

As you can see in this Comparison of Biographies, Wikipedia does very well in its coverage of National Women's History Project (NWHP) biographies relative to Encyclopaedia Britannica.

64 entries are missing from EB; 23 from WP. 23 entries are in neither. For articles existing in both, WP articles are 6.29 times larger on average (median of 4.00).

That is, of 174 women, Wikipedia is missing articles on 23. That's just over a third of the number missing from Britannica, which has no articles that Wikipedia lacks. When both have an article, the Wikipedia article has much more content. Of course, those are just quantitative measures. Even so, when I browse the actual articles, I am partial to Wikipedia's extra content and images.
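Summary lines like the one above reduce to simple counts over each subject's pair of article lengths. Here is a minimal Python sketch, assuming each subject maps to a (Wikipedia length, Britannica length) pair, with `None` for a missing article. The subjects and lengths are invented toy values, and the choice of length unit is left open.

```python
from statistics import mean, median


def compare_coverage(lengths):
    """Summarize coverage from {subject: (wp_length, eb_length)}; None means no article."""
    pairs = list(lengths.values())
    missing_eb = sum(1 for wp, eb in pairs if eb is None)
    missing_wp = sum(1 for wp, eb in pairs if wp is None)
    neither = sum(1 for wp, eb in pairs if wp is None and eb is None)
    eb_only = sum(1 for wp, eb in pairs if wp is None and eb is not None)
    # Size ratios only make sense where both works have an article.
    ratios = [wp / eb for wp, eb in pairs if wp is not None and eb is not None]
    return {
        "missing_eb": missing_eb,
        "missing_wp": missing_wp,
        "neither": neither,
        "eb_only": eb_only,
        "mean_ratio": mean(ratios),
        "median_ratio": median(ratios),
    }


toy = {
    "Subject A": (6000, 1000),
    "Subject B": (3000, 1000),
    "Subject C": (2000, None),
    "Subject D": (None, None),
}
print(compare_coverage(toy))
```

In the NWHP numbers, `neither` equals `missing_wp` (23 each), so `eb_only` is 0: every subject Wikipedia lacks is also missing from Britannica. A mean ratio well above the median (6.29 vs. 4.00) suggests that a few much longer Wikipedia articles pull the average up.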

Yet a difficulty in this work is finding a useful corpus of biographical persons. To say that there are more articles about men than women in any reference work isn't saying much given world history. So, for this analysis I use the women recognized by the NWHP for Women's History Month. The NWHP list is a nice collection in that it has both well-known women and lesser-known women who are nonetheless thought to be notable. However, this only tells us that Wikipedia has greater coverage of women than a traditional encyclopedia. (And while this is one of the first large and topical -- rather than quality -- comparisons, the result should not be all that surprising given Wikipedia's size.) Also, Wikipedians are aware of their own systemic bias and make attempts to counter it. For example, those recognized by Black History Month were the focus of a WikiProject that documented every person recognized. (Ironically, this list was taken from Britannica. Perhaps the NWHP list will prompt a similar project at Wikipedia, which is why I use permanent links to the specific versions I analyzed.)

What would really be nice is a source corpus of notable persons, both male and female. I could then compare Wikipedia and Britannica against it to see how they fare relative to the source. For example, a source corpus of 100 people might recognize 75 men and 25 women (25% female); if one of the reference works had a 60/15 split of men to women (20% female), it would be less "feminine" than the source. How, then, do the reference works compare with each other, relative to their source? If you have a suggestion for a corpus, please leave a comment.
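The proposed comparison is simple arithmetic. The Python sketch below uses the hypothetical numbers from the paragraph above: a source of 75 men and 25 women, and a reference covering 60 men and 15 women. The function names are illustrative.

```python
def female_share(women, men):
    """Fraction of subjects who are women."""
    return women / (women + men)


def relative_to_source(ref_women, ref_men, src_women, src_men):
    """Reference's female share divided by the source's.

    1.0 means proportional coverage; below 1.0 means the reference is less
    "feminine" than its source, and above 1.0 means more so.
    """
    return female_share(ref_women, ref_men) / female_share(src_women, src_men)


print(female_share(25, 75))                # 0.25 in the source corpus
print(female_share(15, 60))                # 0.2 in the reference
print(relative_to_source(15, 60, 25, 75))  # 0.8
```

Computing this value for both Wikipedia and Britannica against the same source puts the two reference works on a common scale.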

Finally, when I spoke with Nora about this, she raised the question of whether the gender ratio of disruptive editors differs from that of the larger community. Our hypothesis is that disruptive editors might be disproportionately male. But who can say? Unfortunately, I expect it's difficult to get survey responses from such editors.

this entry posted to social/wikipedia;
comments (17)

2009 Sep 11 | Goldman on Wikipedia's Failure (i.e., "Labor Squeeze")

I just finished reading Eric Goldman's "Wikipedia’s Labor Squeeze and its Consequences," and it is a more reasoned argument than his hyperbolic prediction of Wikipedia's failure. In fact, the claim that there is a tension between openness and protecting against disruption shouldn't surprise anyone who is familiar with online communities. Wikipedia has always had to balance the merits and challenges of openness (i.e., collaboration and disruption). Goldman's paper is a nice treatment of this tension; here's my summary:

The author poses the feature of "free editability" against the need to defend against unproductive contributions. Noting that technological restrictions to date have been "fairly modest," he suggests Flagged Revision features may be a significant change. The plateau of Wikipedian growth is likely caused by editor turnover, an inability to attract and keep new editors, and the lack of incentive mechanisms (e.g., relying only upon intrinsic motivation). The author endorses technological barriers that further constrain "free editability," as well as the recruitment and retention of new contributors, including converting readers into contributors and recruiting cash-motivated individuals, companies, academics, and students to participate.

I have two substantive comments on the paper. First, given that he quotes a 2005 email by Jimmy Wales, I am surprised that he made the failure claim at all, or that he treats the observation of this tension as novel. Last week, when I wrote that an open community was not the founding vision of Wikipedia but a surprisingly productive means, I did not include one of the most compelling -- but later -- messages on that topic. Goldman quotes one sentence from Wales's 2005 message:

Wikipedia is first and foremost an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language. . . .

However, the rest of that paragraph, which Goldman doesn't include, shows that Wales was purposely highlighting the encyclopedia as the goal and the community as a means:

. . . Asking whether the community comes before or after this goal is really asking the wrong question: the entire purpose of the community is precisely this goal.

Furthermore, Wales writes:

The community does not come before our task, the community is organized around our task. The difference is simply that decisions ought to always be made not on the grounds of social expediency or popular majority, but in light of the requirements of the job we have set for ourselves. (Wales2005w)

I recommend you read the whole message.

Second, Goldman characterizes Wikipedia as atypical in rejecting contributions from paid/professional content creators. He is conflating the conflict of interest policy with the means of production. Yes, free and open source developers are often paid for their work, and while this hasn't taken off at Wikipedia (the market/incentives are different), I am not aware of any Wikipedia policy that prohibits the adoption of professionally produced content if it is appropriate to the encyclopedia and under a compatible license. However, Wikipedia is rightfully careful about contributors who edit articles about their own financial or reputational interests. This is the difference between incorporating content written by a paid expert on their topic of expertise, and rejecting their edits to their own biography.

So, on this note, what are some examples of content that was produced for pay at the Wikimedia Foundation? I can think of some archival material, such as the use of some material from the 11th edition of Britannica and images now in Commons.

this entry posted to social/wikipedia;
comments (9)

2009 Sep 04 | Some Figures on Wikipedia Protection Mechanisms

The recent focus on Wikipedia "failing" or being "closed" merits some figures and explanation. On the afternoon of Sept 04, 2009, the English Wikipedia had 3,024,063 articles.

The Special:ProtectedPages for the Article namespace tells us:

That's the status quo. Yet flagging a vetted version of an article has been discussed since 2005. The current widely discussed idea is to conduct a two-month experiment in which biographies of living people (402,672 articles, about 13% of the English Wikipedia), or some subset, are "flag protected": anyone can still edit, but the public (not Wikipedians) sees the last reviewed version. This doesn't necessarily replace the existing protection mechanisms, but could be a good alternative to semi-protection. The experiment will helpfully give guidance on who should be a "Reviewer," and answer the questions of whether this limits disruption, furthers quality, and how long it takes to review and flag a newer version of an article. Another part of the experiment is "patrolled revisions," which would apply to a wider swath of articles and permit vandalism fighters to bookmark a known good version so they can easily evaluate subsequent contributions, but it won't affect who can edit or what the public sees.
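The flag-protection idea can be modeled in a few lines. This is a hypothetical Python sketch of the concept as described above, not MediaWiki's FlaggedRevs extension or its API. Anyone may add a revision, and a reviewer flags a revision as vetted. Editors see the latest revision, while the public sees the latest reviewed one. Falling back to the latest revision when nothing has been reviewed yet is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    rev_id: int
    text: str
    reviewed: bool = False


@dataclass
class FlaggedArticle:
    """Hypothetical model of a "flag protected" article."""
    revisions: list = field(default_factory=list)

    def edit(self, text):
        """Anyone can still edit; the new revision starts out unreviewed."""
        rev = Revision(rev_id=len(self.revisions) + 1, text=text)
        self.revisions.append(rev)
        return rev.rev_id

    def review(self, rev_id):
        """A Reviewer marks a specific revision as vetted."""
        self.revisions[rev_id - 1].reviewed = True

    def editor_view(self):
        """Wikipedians editing the page work from the latest revision."""
        return self.revisions[-1].text

    def public_view(self):
        """Readers see the latest reviewed revision (assumed fallback: latest)."""
        for rev in reversed(self.revisions):
            if rev.reviewed:
                return rev.text
        return self.editor_view()


page = FlaggedArticle()
page.review(page.edit("Accurate biography."))
page.edit("Accurate biography. Unsourced rumor!")
print(page.public_view())  # Accurate biography.
print(page.editor_view())  # Accurate biography. Unsourced rumor!
```

In this model, the time between an `edit` and the corresponding `review` is the review lag that the experiment is meant to measure.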

The goal of this, and other features, is to maximize the benefits of open collaboration while minimizing the damage from disruptive edits. In my opinion, this has always been the case and Wikipedia continues to experiment with achieving the best balance.

this entry posted to social/wikipedia;
comments (3)

2009 Sep 01 | Failure and the Vision

Despite my great pleasure and excitement that The MIT Press will be publishing my book next year (I sent the manuscript to the copy editor last week!), stories like this one, "Despite changes, Wikipedia will still 'fail within 5 years'," make me wish I could get it out today. Just when questions about Wikipedia's viability ceased, predictions of its demise arose in their place -- and it's getting boring.

Ars Technica journalist Nate Anderson has been profiling law professor Eric Goldman's proclamations of doom for a number of years now. In the book, I touch upon this in a chapter about the critical response to Wikipedia and the way it is produced. My cynical take is that one of the best ways to get attention is to make a provocative claim and then walk it back with some nuanced reasoning once you have that attention. (I'm glad to see Goldman has made such an attempt now with a new article, and hope to read it soon.)

On the substance, I expect I don't disagree much with Goldman, though I would take issue with his hyperbole. In the dissertation and in the book chapter on openness, I argue that one needs to look carefully at the existing context before making pronouncements about the openness or closedness of a technology-mediated community. So, for example, the introduction of flagged revisions into contentious articles on biographies of living people might actually increase Wikipedia's openness, given that simply "protecting" a page has been the practice for many years now. One needs a good definition of, or criteria for, an open content community if one wants to talk about challenges and change.

However, my greatest agitation is a historical one. Anderson concludes his article by writing:

But the preservation of credibility this way comes at a huge cost. First, it means that Wikipedia has failed—at least when it comes to the original utopian idea of an encyclopedia that anyone, anywhere can edit at anytime.

Look at Jimmy Wales's first message in 2000 to the list for his new free Web encyclopedia:

My dream is that someday this encyclopedia will be available for just the cost of printing to schoolhouses across the world, including '3rd world' countries that won't be able to afford widespread internet access for years. How many African villages can afford a set of Britannicas? I suppose not many... (Wales2000h)

In 2004, as Wikipedia was picking up, Wales wrote:

Our mission is to give freely the sum of the world's knowledge to every single person on the planet in the language of their choice, under a free license, so that they can modify, adapt, reuse, or redistribute it, at will. And, by "every single person on the planet," I mean exactly that, so we have to remember that much of our target audience is not yet able to access the Internet reliably, if at all.... (Wales2004fls)

The Wikimedia Foundation's vision statement reads:

Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. (Foundation2007von)

Nowhere do we see a utopian vision of an encyclopedia anyone can edit. A central aspiration in the pursuit of a universal encyclopedia is increased access to and freedom of information: an opening of opportunity and capability to anyone with an interest in learning. Ironically, such an encyclopedia became possible only, and unintentionally, through a happy accident: universal access to its collaborative production -- which was always tempered. We should not confuse the means of Wikipedia production with its mission: a high-quality, free, and accessible reference work. Therefore, continued experiments in balancing freedom and constraint toward that end are wholly appropriate -- as Shirky argued in his 2006 essay "News of Wikipedia's Death Greatly Exaggerated."

this entry posted to social/wikipedia;
comments (3)

2009 Aug 21 | Shared Clipboard and Chicago Page Ranges

I use VirtualBox to run a Windows guest, and unfortunately the shared clipboard between host and guest is sometimes buggy. I recently posted a script for a very robust clipboard using a network shared file. Also, I'm doing the final checks on the book manuscript, and the Chicago Manual of Style has odd and confusing rules for specifying ranges of page numbers. The CMS page range validator looks through my source files and prints out any ranges likely to be counter to Chicago style.
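The validator linked above is not reproduced here. The Python sketch below is a hypothetical illustration of the idea, and its rules summarize the inclusive-number scheme in recent editions of the Manual. Under that scheme, a range keeps all digits when the first number is under 100 or a multiple of 100. For first numbers 101 through 109 in a given hundred, the second number keeps only the changed digits. For 110 through 199, it keeps at least two digits, more if needed. Earlier editions differed in some details, so check the edition you follow. The regular expression and function names are assumptions, and the scanner will also match things that only look like page ranges, such as dates.

```python
import re

EN_DASH = "\u2013"

# Matches "321-28", "321--328" (BibTeX style), or "321\u201328"; it also catches
# non-page numbers such as dates, so the results need a human look.
RANGE = re.compile(r"\b(\d+)\s*(?:--?|\u2013)\s*(\d+)\b")


def chicago_range(first, last):
    """Return first-last abbreviated per the inclusive-number rules summarized above."""
    if last <= first:
        raise ValueError(f"not an ascending range: {first}-{last}")
    a, b = str(first), str(last)
    # Under 100, multiples of 100, or a change in digit count: use all digits.
    if first < 100 or first % 100 == 0 or len(a) != len(b):
        return f"{a}{EN_DASH}{b}"
    changed = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    if first % 100 < 10:
        tail = b[changed:]                   # 101-109: changed part only
    else:
        tail = b[min(changed, len(b) - 2):]  # 110-199: at least two digits
    return f"{a}{EN_DASH}{tail}"


def check_ranges(text):
    """Yield (as_written, expected) for ranges whose digits differ from chicago_range()."""
    for match in RANGE.finditer(text):
        a, b = match.group(1), match.group(2)
        if len(b) < len(a):                  # expand an abbreviated second number
            b = a[: len(a) - len(b)] + b
        first, last = int(a), int(b)
        if last <= first:
            yield match.group(0), "not an ascending range"
            continue
        expected = chicago_range(first, last)
        # Compare digits only; the dash style (hyphen, "--", en dash) is ignored.
        written = f"{match.group(1)}{EN_DASH}{match.group(2)}"
        if written != expected:
            yield match.group(0), expected
```

For example, `list(check_ranges("pages = {321--328}"))` flags the full second number and suggests the abbreviated "321" plus en dash plus "28". A range already written as "1496-500" passes.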

this entry posted to technology/python;
comments (0)

2009 Aug 11 | English Wikipedia's Three Millionth Article

There's a tradition at Wikipedia of predicting when a particular milestone will be reached. Earlier expectations about Wikipedia's growth were laughably conservative. While people have been guessing the topic (a rather random exercise), there's sadly no page for the three millionth article. Given that since May the English Wikipedia has been growing at a rate of ~1,300 articles a day, I expect Wikipedia will hit this milestone in one week!
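The prediction is a linear extrapolation. Here is a minimal Python sketch. The post doesn't give the article count on August 11, so the starting figure below is a hypothetical placeholder. Only the ~1,300-a-day rate and the date come from the entry.

```python
import math
from datetime import date, timedelta


def days_to_milestone(current, target, per_day):
    """Days until `target` articles at a steady `per_day` growth rate (linear extrapolation)."""
    if current >= target:
        return 0
    return math.ceil((target - current) / per_day)


# Hypothetical starting count; only the ~1,300/day rate comes from the post.
days = days_to_milestone(current=2_991_000, target=3_000_000, per_day=1_300)
print(days, date(2009, 8, 11) + timedelta(days=days))  # 7 2009-08-18
```

Working backward, one week at ~1,300 a day is about 9,100 articles, so the one-week estimate implies a count within roughly 9,100 of three million.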

this entry posted to social/wikipedia;
comments (1)

2009 Jun 30 | Wikipedia Suppressing News

There's been a lot of coverage of the New York Times story "Keeping News of Kidnapping Off Wikipedia." It's prompted discussion about balancing issues of free speech, safety, and responsibility at the Times and Wikipedia. Within Wikipedia, the discussion has only just begun, but has started off quite constructively as seen in Wikipedian Apoc2400's proposed policy: in the short term, Wikipedia should refrain from spreading information if that information is not widely and reliably sourced, of little public interest, and is "likely to have very severe direct negative consequences."

this entry posted to social/wikipedia;
comments (0)

2009 Jun 25 | Our Work After Us

At the beginning of this year, I was sad to learn of the passing of Peter Kollock. He was one of the first to carefully think about cooperation and online communities. I've been citing his 1996 paper "The Economies of Online Cooperation: Gifts and Public Goods in Cyberspace" for a long time now.

Unfortunately, while checking Web references, I discovered the above link to his paper no longer works (i.e., 404). This is the link that appears on his Wikipedia page and dozens of online bibliographies. It appears UCLA yanked his whole web space. The lack of institutional commitment to preserving work and providing stable URIs has always been a great irritation (e.g., see my entry on digital posterity about the links in my dissertation that were soon broken); at the W3C we would frequently talk about this frustration and how best to maintain our own commitment to preservation. And it's not only in death that our work soon disappears. After my time at the Berkman Center, subsequent to a Web site reorganization, I noticed that all the links to my work there were broken. They were able, and kind enough, to restore the HTML files, though my biographical page looks screwy because of broken CSS and relative links -- so I don't even link to that anymore.

In the case of this particular paper by Kollock, it was fortunately published in a book, and I found a PDF version as well -- though I preferred the HTML.

Kollock, P. (1999a). The economies of online cooperation: Gifts and public goods in cyberspace. In Smith, M. and Kollock, P., editors, Communities in Cyberspace. Routledge Press, London. URL http://dlc.dlib.indiana.edu/archive/00002998/
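Checking references for this kind of link rot can be partly automated. Below is a minimal sketch using only Python's standard library (`urllib.request` and `urllib.error`). It is a hypothetical helper, not a tool mentioned in this entry. Some servers block scripted requests, and others answer a dead link with a normal-looking "not found" page. So a failure here means "look at this by hand," and a success is not proof the content is still there.

```python
import urllib.error
import urllib.request


def link_status(url, timeout=10.0):
    """Return None if `url` can be opened, else a short description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return None
    except urllib.error.HTTPError as err:  # server answered, e.g. 404 Not Found
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:   # DNS failure, refused connection, missing file
        return f"unreachable ({err.reason})"
    except OSError as err:                 # e.g. a timeout while reading the response
        return f"error ({err})"
    except ValueError as err:              # malformed URL
        return f"invalid URL ({err})"


def broken_links(urls):
    """Return (url, problem) pairs for every URL that could not be opened."""
    problems = []
    for url in urls:
        problem = link_status(url)
        if problem is not None:
            problems.append((url, problem))
    return problems
```

Running `broken_links` over the URLs extracted from a bibliography gives a short list to inspect. An archived copy or print publication, as with the Kollock paper, can then be substituted.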

this entry posted to method;
comments (0)

2009 Jun 25 | Anderson and Citing Wikipedia

Chris Anderson's "apparent plagiarism" of Wikipedia has prompted me to post something I was experimenting with last week about citations and URLs. Anderson claims that his text, which is very much like that of some Wikipedia articles, previously quoted and cited Wikipedia as a reference. However, in discussions with his publisher, there was some uncertainty about how to treat URLs (since Web pages might change) and Wikipedia (since it is collaboratively authored). Hence, he attempted a "write-through" for the "case of source material without an individual author to credit (as in the case of Wikipedia)." This is obviously problematic, and Wikipedia gives guidance on every article about how it can be cited, including the use of a permanent link to a specific version.

However, I can sympathize with the ugliness of long URLs and "last accessed" requirements. Since I began work on my Wikipedia manuscript, an aspiration has been to create a work in which the vast majority of historical and ethnographic sources are readily accessible to the reader. This means I have a lot of references. So, as I give thought to the book in print and online form, I wonder how to strike the best balance. I've moved on from the dissertation's APA author-year style towards the Chicago Manual of Style notes format. Yet notes with URLs can get rather ugly, particularly if a note contains more than one citation. (It ends up looking like a law review paper.) My notes-only implementation of Chicago, in which the first reference is a full citation and subsequent references are short but include the oldid (since I make use of different versions of the same article), is below. Imagine pages of this stuff; it's not easy to read:

  1. Wikipedia, "Wikipedia:Neutral Point of View," Wikimedia, September 16, 2004, http://en.wikipedia.org/w/index.php?title=Wikipedia:Neutral_point_of_view&oldid=6042007 (accessed March 5, 2004); Wikipedia, "Wikipedia:Neutral Point of View," Wikimedia, November 3, 2008, http://en.wikipedia.org/w/index.php?title=Wikipedia:Neutral_point_of_view&oldid=249390830 (accessed November 3, 2008).

    ...

  2. Wikipedia, "Wikipedia:Neutral Point of View (oldid=249390830)."
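Generating such notes by script at least keeps the permanent links consistent. Here is a minimal Python sketch, assuming MediaWiki's `index.php?title=...&oldid=...` permanent-link form and the note layout shown in the examples above. The function names are hypothetical. The base URL is kept as in the examples, though Wikipedia now serves pages over HTTPS.

```python
from urllib.parse import urlencode

BASE = "http://en.wikipedia.org/w/index.php"


def permalink(title, oldid, base=BASE):
    """Permanent link to one revision; MediaWiki titles use underscores for spaces."""
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid}, safe=":")
    return f"{base}?{query}"


def long_note(display_title, page_title, oldid, revision_date, accessed):
    """Full Chicago-style note for the first citation of a given revision."""
    return (f'Wikipedia, "{display_title}," Wikimedia, {revision_date}, '
            f"{permalink(page_title, oldid)} (accessed {accessed}).")


def short_note(display_title, oldid):
    """Short note for later citations, keeping the oldid to tell versions apart."""
    return f'Wikipedia, "{display_title} (oldid={oldid})."'


print(long_note("Wikipedia:Neutral Point of View", "Wikipedia:Neutral point of view",
                249390830, "November 3, 2008", "November 3, 2008"))
print(short_note("Wikipedia:Neutral Point of View", 249390830))
```

Because the long and short forms come from the same data, switching between the variants discussed below becomes a formatting change rather than a rewrite.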

In the context of these Chicago notes variants, I've tried the following experiment in my manuscript:

  1. Long (end) notes upon first instance (including URL) followed by short notes (with the version number noted in the title of Wikipedia pages, such as in note 63 above) yields 396 pages.
  2. Exclusively short (end) notes followed by bibliography with full citation (including URL) yields 452 pages.

Option 2 is more readable, but requires another redirection by the reader if they want full bibliographic detail, and adds pages (and weight and cost) to a book. Another option is an adaptation of Option 1: standard long-then-short Chicago notes without URLs in the printed book, with the URLs provided online. This makes a practical sort of sense (and this is what Anderson says he was planning to do), but it is non-standard, and I'm not sure how it would be received.

However, this difficulty doesn't mean that one should simply "write through" one's sources (whatever that means) and remove the attributions altogether.

this entry posted to method;
comments (1)

2009 Jun 11 | The Informed Analysis of New Media

I recently finished two works about the "free culture" movement that are polar opposites -- and in a way that is unsettling. The more recent is Mark Helprin's Digital Barbarism: A Writer's Manifesto. I have long found it ironic that critics of "Web 2.0" -- to use a problematic term for this larger new media phenomenon -- end up adopting the evils they attribute to their subjects: visceral, from the hip, slapdash. Lawrence Lessig excoriates Helprin in a review, so I need not waste any words here; even so, I continue to be surprised at what passes for informed criticism. On the other hand, David Bollier's Viral Spiral: How the Commoners Built a Digital Republic of Their Own is an excellent history of the Creative Commons and Free Culture movement.

However, am I only praising those works that are congruent with my sympathies? While Bollier is not presenting criticism (pro or con), his book is a favorable portrayal. But I don't think I'm being unfair. I consider myself allergic to unalloyed "Net boosterism" and the "Boing Boing" crowd. In my work on Wikipedia, I admit that I am fond of it, but I try to take a "Neutral Point of View," both as a scholar and as an intellectual hobby. By this I mean that beyond academic concerns, I personally enjoy learning about different perspectives and trying to understand how people come to differing opinions. (So I identify as a "skeptic" more so than an academic.) In fact, I was delighted to read Mark Bauerlein's The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don't Trust Anyone Under 30). While it sounds like another rant, it is a well-founded critique of how digital media are damaging literacy and civic preparedness in youth. He argues that while screen-based technology might further spatial cognitive skills, knowledge is being replaced with a narcissistic preoccupation with social peers and popular culture. And he actually makes logical arguments based on citations to research. One doesn't have to agree with his argument, but it deserves one's full consideration.

This is why I was disappointed a few semesters ago when I recommended Bauerlein to an otherwise excellent student who was a Net enthusiast. She treated Bauerlein as if he were a Keen or a Helprin, cursorily brushing him off as someone who didn't "get it." This was counter to the spirit I was trying to inculcate in that class and prompted my musing on whether we have a genuinely informed and vital discourse.

this entry posted to method;
comments (2)

2009 Jun 10 | Wiki-Conference New York, July 25-26

This year's picnic will be better than ever, as we'll have an unconference to get us started:

The 1st Wiki-Conference New York will be held over the weekend of July 25-26 2009 (confirmed!) at New York University, and hosted by Free Culture @ NYU and Wikimedia New York City.

Sign up on the wiki, propose a lightning talk or breakout topic, or round up some Wikimedians for a panel discussion.

this entry posted to social/wikipedia;
comments (0)

2009 Jun 08 | Institutions vs. Norms

In Noam Cohen's recent New York Times article about "The Wars of Words on Wikipedia's Outskirts" (i.e., the recent ArbCom decision about Scientology edit wars), I note that organizations often develop towards bureaucratic forms (citing Max Weber), but that even in their more free-form states communities still have structure, even if informal and implicit (citing Jo Freeman). I believe this means that while we might enjoy the informal and personal touch of working within a small community, if it is successful, that community will likely move towards more bureaucratic forms. This can also have some benefits if the informal/implicit structures were unsavory. (As Mitch Kapor wryly noted, "Inside every working anarchy, there's an Old Boy Network.") As I said to Noam, rather than lament the passing of the good old days, I think it better to ask how to address issues in the present (including the maintenance of earlier values). (And actually, while it has slipped a bit from its original mission/intention, I think the ArbCom is doing a good job.)

Richard James asks if this sentiment is contrary to my focus on informal social norms, particularly in my blog entry about "Morality and the Dilemma" (i.e., Olson, Ostrom, and Hardin). Also, am I not misapplying the notion of "technical solutions" to institutional governance? To be clear, Wikipedia production might be explained by any number of approaches, including technical features, institutional governance, and social norms. In trying to complete my dissertation, I had lengthy, and sometimes stressful, arguments about the extent to which one of these is more important than any other. Granted, all of these are important, and to deny otherwise is silly. However, I found the initial focus upon technical features in accounts of FOSS/Wikipedia to be insufficient, and therefore offered a complementary social/cultural account of Wikipedia in response. But I'm not excused from trying to understand how these things interrelate and affect one another. My argument is that informal "good faith" social norms (supported by wiki features) are good at dealing with good faith participants, but more formal and autocratic forms of authority are often necessary to deal with those of bad faith or to make decisions as a last resort when no community consensus emerges -- hence the existence of Benevolent Dictators in open content communities. If such leadership or institutional governance persistently fails, the community might then fork.

this entry posted to social/wikipedia;
comments (0)

2009 May 21 | Extrapolating to 100,000 Featured Articles

I recently noted there were some new numbers on the 100,000 feature-quality articles page. In May 2008 (based on a January assessment, I believe) there were 2,421 featured articles. Today, based on a February 2009 assessment, there are 2,570. That's a 6% increase -- below the 24% growth in total articles (to 2.7 million). If we assume a similar rate of increase, it would take 62 years to reach the goal of 100,000 articles.

from math import log
initial = 2570; target = 100000; growth = .06
years = (log(target) - log(initial)) / log(1 + growth)  # about 62.8

If we relax the goal to have 100,000 good or better articles, that will require 24 years at a 16% growth starting with 11,024 "good" articles. Of course, I don't know to what extent the rate of growth is increasing or decreasing.

this entry posted to social/wikipedia;
comments (0)

2009 May 18 | Morality and the Dilemma

The challenge at the heart of collective action is how cooperative behavior emerges when there are apparent reasons for it not to. This is famously demonstrated by the Prisoner's Dilemma, in which two co-suspects each have compelling cause to defect -- turn informer -- against the other, but the consequence of both following such a strategy is worse than had they cooperated and remained silent (Axelrod 1984). That is, if your partner remains silent, you will get six months in jail if you are also silent, but you go free by defecting and saddling your partner with a ten-year sentence. If your partner informs on you and you do the same, you each receive five years; if you instead remain silent, you're the sucker and get ten. Defecting is the dominant strategy (and mutual defection the "equilibrium") regardless of your partner's choice: going free is preferable to six months; five years is preferable to ten. So both players defect, get five-year sentences, and wish they had remained silent and gotten off with six months. The dilemma is that the individual's dominant strategy also creates a mutually suboptimal result; in this case, fear of the worst-case scenario inhibits beneficial collective action. Understanding the distance between the lack of cooperation implied by the dominant strategy and the mutual benefits of cooperation has been a central concern of social science since Garrett Hardin's (1968) article "The Tragedy of the Commons." In this scenario, the dominant strategy of a herder is to put as many animals as possible on common land, despite the fact that if everyone were to do the same it would soon be overgrazed. A few years before, in 1965, Mancur Olson (1971) published a book in which he characterized this type of problem as "The Logic of Collective Action."
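The dominance argument can be checked mechanically. Below is a minimal Python sketch using the sentences from the paragraph above (six months written as 0.5 years); the names SENTENCE, best_response, and dominant_strategy are illustrative only.

```python
# Years in jail (lower is better) for (my_move, partner_move), per the text above.
SENTENCE = {
    ("silent", "silent"): 0.5,  # both stay silent: six months each
    ("silent", "defect"): 10,   # I stay silent, partner informs: I'm the sucker
    ("defect", "silent"): 0,    # I inform, partner stays silent: I go free
    ("defect", "defect"): 5,    # both inform: five years each
}
MOVES = ("silent", "defect")


def best_response(partner_move):
    """Return the move that minimizes my sentence, given the partner's move."""
    return min(MOVES, key=lambda mine: SENTENCE[(mine, partner_move)])


def dominant_strategy():
    """Return the move that is my best response to every partner move, or None."""
    responses = {best_response(partner) for partner in MOVES}
    return responses.pop() if len(responses) == 1 else None


if __name__ == "__main__":
    dom = dominant_strategy()
    print("dominant strategy:", dom)                          # defect
    print("mutual defection:", SENTENCE[(dom, dom)])          # 5
    print("mutual silence:", SENTENCE[("silent", "silent")])  # 0.5
```

Each player's best response is to defect whatever the partner does, yet mutual defection (5 years) is worse for both than mutual silence (0.5 years): that gap is the dilemma.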

Olson, considering production rather than consumption, asks who would contribute to a common public good when they might just as easily defect and "free ride"? Yet, again, should everyone follow this reasoning, no public goods will be produced. Olson provides an extensive taxonomy of group characteristics that affect this logic, including their size and interdependence, the market's demand elasticity, the balance of costs and benefits, and the ability for a group to exclude or penalize those who fail to contribute. (Ultimately, "trust" becomes a central element in such group dynamics and might arise in the context of time and reputation, institutional controls, or group norms.)
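Olson's free-rider logic is often formalized in experiments as a linear "public goods game" (this is my choice of illustration, not Olson's own model). In the minimal Python sketch below, the group size, endowment, and multiplier are hypothetical parameters: each player keeps what they don't contribute, and the pooled contributions are multiplied and shared equally among everyone.

```python
def payoffs(contributions, endowment=20, multiplier=1.6):
    """Payoff per player in a linear public goods game (hypothetical parameters).

    Each player keeps whatever they do not contribute; the pooled contributions
    are multiplied and shared equally among all players, contributors or not.
    """
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


if __name__ == "__main__":
    print(payoffs([20, 20, 20, 20]))  # [32.0, 32.0, 32.0, 32.0]: all contribute
    print(payoffs([0, 0, 0, 0]))      # [20.0, 20.0, 20.0, 20.0]: all free ride
    print(payoffs([0, 20, 20, 20]))   # [44.0, 24.0, 24.0, 24.0]: one free rider
```

With four players and a multiplier of 1.6, each unit contributed returns only 0.4 to the contributor, so withholding always pays more individually; yet if everyone reasons this way, all end up with 20 rather than the 32 they would get by cooperating.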

Around the same time, Robert Trivers (1971) characterized a related problem in animal behavior. In his article "The Evolution of Reciprocal Altruism," he defined an "altruistic situation" as one in which "one individual can dispense a benefit to a second greater than the cost of the act to himself" (Trivers 1971) and modeled the conditions under which altruistic behaviors were likely to emerge. (As with Olson, these conditions relate to the character and extent of social interaction.) Of course, as noted by Frans de Waal (2008), "a return-benefits calculation typically remains beyond the animal's cognitive horizon," and altruism itself is likely the result of a more proximate evolved behavior: empathy. (This link between empathy and altruism is hypothesized, outside of the evolutionary context, by Daniel Batson (1991).)

Recently, these two threads of political economy and evolution have been combined in the work of Elinor Ostrom. In "Governing the Commons" she makes a slight digression away from a macro-political perspective to note that "communities of individuals have relied on institutions resembling neither the state nor the market to govern some resource systems with reasonable degrees of success over long periods of time" (Ostrom 1990). By studying such institutions she recommends that the dilemma of "common pool resources" might be addressed by eight institutional design principles: clearly defined boundaries, congruence between appropriation/provision rules and local conditions, collective-choice arrangements, monitoring, graduated sanctions, conflict-resolution mechanisms, state recognition of groups' right to self-organize, and the nesting of enterprises in larger systems.

More recently, Ostrom makes greater use of the evolutionary approach to focus on the emergence of norms (Ostrom 2000). She takes issue with Olson's (1971) earlier claim that unless the group is small, or there is a way to force individuals to act in their common interest, "rational self-interested individuals will not act to achieve their common or group interests." She characterizes this as Olson's "zero contribution thesis" and notes that it contradicts everyday experience; the problem of free riding exists, but community governance regimes do emerge and persist (Ostrom 2000). While it might be "irrational" from the egoist perspective, a significant proportion of people will act cooperatively (e.g., in experiments, people initially contribute 40-60% of their endowments to the public good in a finitely repeated game). This cooperation is affected by factors such as expectations about others, and the framing and number of interactions between peers. And, in keeping with Olson, people will expend resources to punish those who make below-average contributions. Hence Ostrom characterizes norms as those values (e.g., reciprocity, fairness, and trustworthiness) that affect the preference for cooperation. If there is a sufficient proportion of "norm using" players (i.e., conditional cooperators and willing punishers), this "creates an opening for collective action" (Ostrom 2000). This is especially so if there is good information about the trustworthiness of one's peers. If cooperation has been successfully established, new members will likely be appropriately acculturated. Hence, collective action and its supportive social norms can emerge in an evolutionary context: the gap of the cooperative dilemma can be bridged. Indeed, Ostrom recommends her eight institutional mechanisms (or "principles") to further such outcomes.

Recently, a number of scholars have applied this literature on collective action to Wikipedia. Johnson (2007) uses Ostrom to characterize vandalism and point-of-view (POV) pushing as collective action problems. Viegas, Wattenberg, and McKeon (2007) argue that Wikipedia's Featured Article process reflects Ostrom's first four principles of locality, collective choice (participation), monitoring (accountability), and conflict resolution. Andrea Forte and Amy Bruckman (2008) use all eight of Ostrom's design principles to evaluate Wikipedia governance and its Biography of Living Persons policy; they argue that there is decentralized policy creation, interpretation (i.e., its Arbitration Committee), and enforcement (i.e., administrators), but conclude the biggest lack relative to Ostrom is the uneven enforcement of policy.

However, these works tend to remain at an institutional level, focusing on community mechanisms for content and membership policy. (Two exceptions are a quantitative analysis of patterns in Wikipedian references to policies and guidelines from discussion pages (Beschastnikh, Kriplean, and McDonald 2008) and a characterization of the type of "utterances" used on Discussion pages (Goldspink 2009).) If, following Ostrom, we can think of norms as those values (e.g., reciprocity, fairness, and trustworthiness) that affect the preference for cooperation, can we find and characterize such norms in Wikipedia culture? I believe we can, and this is the focus of my work on Wikipedia.

Might we even characterize prosocial norms as a form of morality, in the sense employed by Bowles and Gintis (1998)? Indeed, despite preceding theorists of collective action by almost two centuries, Kant's (2005) categorical imperative is a moral response to the collective action dilemma: "I ought never to act in such a way that I couldn't also will that the maxim on which I act should be a universal law." Coincidentally, the lesser-known subtitle to Hardin's famous "Tragedy of the Commons" article is "the population problem has no technical solution; it requires a fundamental extension in morality." Therefore, I do not think it is a stretch to conclude that Wikipedia collaboration is as much a "moral" problem as a technical one.

this entry posted to social/wikipedia;
comments (3)

2009 May 11 | Making Word Useful

Because I use speech recognition software (SR) I'm forced to tangle with proprietary software and formats; this provides a continuous reminder of the benefits and joys of Free Software. However, I have learned a few things about maintaining a Windows system for SR over the past five years.

In 2004 I began using continuous SR with ViaVoice on a headless Shuttle box accessed over VNC. (This was a big improvement over the discrete speech system I used 10 years before.) Despite the relief provided by imaging the OS partition (PING is great for this), Windows was still a dreadful thing to maintain; the advent of virtualization has been a blessing. And up until the beginning of this year, I relied upon Win2K so as to keep a lean and portable OS. However, security and software support for Win2K are ending, and the excellent VirtualBox 2.* software permits one to emulate a consistent hardware profile (including the BIOS); this allows me to placate XP's annoying validation system.

I presently use NaturallySpeaking 10.1. While the underlying recognition is often remarkable, the user experience and Nuance's support are dreadful. To have useful macro support one must pay hundreds of dollars more for a "professional" version, and this to a company that charges its users for tech support to deal with its own breakage, which is ignored when reported as bugs. Fortunately, there is a friendly FOSS community, and DragonFly is an amazing (Python-based) macro application that helps me get around the worst annoyances in NaturallySpeaking.

Then there is the matter of application support. While coders might be content with Emacs or UltraEdit, I dictate prose and want a visually meaningful word processor: paragraph/heading styles, a spelling and grammar checker, word counter, etc. LyX, Amaya, OpenOffice, and AbiWord are not "Select-and-Say" capable applications (i.e., not useful with NaturallySpeaking). That leaves Microsoft Word and its loathsome ".doc" binary format. These binary files are impervious to the more useful features of versioning systems, or simple scripting. If I need to fix the capitalization of a term in my manuscript, I have to manually open each chapter and do "find/replace" rather than fix it with a simple one-line command (or with KFileReplace). While I had some hope the new ".docx" format would be useful (it is easy enough to unzip and parse), making sense of it is an outrageously difficult task (particularly lists). So, for years now I've been writing pseudo-LaTeX in doc files, converting them to text via antiword, and processing it from there.
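Once the chapters are plain text, the capitalization fix is a few lines of script. A minimal Python sketch, assuming the chapters are UTF-8 .txt files in one directory; the function name fix_term and the example directory and term are hypothetical:

```python
import pathlib
import re


def fix_term(directory, wrong, right, pattern="*.txt"):
    """Replace whole-word occurrences of `wrong` with `right` in matching files.

    Returns a dict mapping each changed file's name to its replacement count.
    """
    regex = re.compile(r"\b" + re.escape(wrong) + r"\b")
    changed = {}
    for path in sorted(pathlib.Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        # A function replacement keeps `right` literal (no backslash escapes).
        new_text, count = regex.subn(lambda match: right, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed[path.name] = count
    return changed
```

For example, fix_term("chapters", "wikipedian", "Wikipedian") would touch every chapter at once. Matching whole words leaves "wikipedians" alone; dropping the \b anchors would change those too.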

However, I recently accumulated enough Microsoft Word hacks to turn it into a decent text editor.

  1. Set the default save format as plain text and its default font to something nice like Andale Mono.
  2. Bind {control-v} to this PasteUnformattedText() macro.
  3. Bind {control-s} to this FileSave() macro to get rid of the annoying "you will lose your formatting saving to text" dialog.
  4. Office XP doesn't use UTF-8 encoding by default and nags you with a dialog every time you open such a file. UTF-8 is the encoding used by every other sensible application of late. Make it the default with this registry edit, but realize it uses the byte-order-mark (BOM) which even otherwise sensible applications get confused by. When processing the text, you can remove it in Python with: line = line.lstrip(unicode(codecs.BOM_UTF8, "utf8")).
  5. You can even "syntax highlight" your text with VBA: this AutoOpen() macro shows how editing Markdown and LaTeX can look visually much like what I had seen before, but in an open-format, UTF-8-encoded text file!
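The lstrip() line in step 4 is Python 2 (it relies on unicode()). A minimal Python 3 sketch of the same BOM handling; the function names read_lines and strip_bom are hypothetical:

```python
import codecs


def read_lines(path):
    """Read a UTF-8 text file, dropping a leading byte-order mark if present.

    The "utf-8-sig" codec strips the BOM that Word writes (and behaves like
    plain "utf-8" when there is none), so no per-line lstrip() is needed.
    """
    with open(path, encoding="utf-8-sig") as f:
        return f.read().splitlines()


def strip_bom(line):
    """Python 3 equivalent of the lstrip() trick, for a line that is already a str."""
    return line.lstrip(codecs.BOM_UTF8.decode("utf-8"))
```

Decoding with "utf-8-sig" at open time is the simpler choice: the BOM never reaches the rest of the processing, so no later step has to remember it.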

this entry posted to technology;
comments (0)

2009 May 05 | A Google Group Gripe

In the past few months I have received invitations to join varied Google Groups. While they are no doubt easy to set up, the (ironic) thing these groups had in common was a focus on free culture (e.g., FOSS and Wikipedia). However, I have not been able to learn how to subscribe to these groups. Instead, I have to log in using a GMail identity. So not only do we have echoes of Microsoft's presumptive ubiquity (if you don't have their software, you are not welcome to participate), Google has access to both your browsing history and your private email?! I am not a Google-hater, but I am concerned about my privacy and proprietary lock-in. The majority of my web browsing is done in Konqueror, and I don't accept any cookies from Google. I also filter email to GMail via a procmail recipe for when I'm out and about, but this is occasional, rare, and on public machines that I don't spend significant time on. (If I need to use a Google service, I pull up Firefox with cookies enabled.) In a literal sense, those people that use Google for searches, email, and calendaring are like Alice in Wonderland, having eaten a magic cookie from Google that reveals all. (I suppose if I must, I will have to create another Gmail identity for subscribing to these lists exclusively; I can then fetch those messages via the POP service that Google kindly provides.)

On another (miscellaneous) privacy related note, this was certainly an odd conflict: Mozilla Ponders Policy Change after Firefox Extension Battle

this entry posted to social;
comments (2)

2009 May 01 | Wikipedia: the happy accident

A brief historical essay is now available at ACM Interactions (or on my site):

"Wikipedia was an accident." I sometimes offer this (admittedly) exaggerated claim in response to those who confuse Wikipedia's current success with its uncertain origins. At the start, it was but the most recent contender in an age-old pursuit of a universal encyclopedia: a dream that the latest technology would provide universal access to world knowledge. Jimmy Wales's and Larry Sanger's first attempt at what would eventually become Wikipedia, the wiki-based encyclopedia that "anyone can edit," was neither of these things. So, by saying that Wikipedia was an accident, I don't mean it was unwelcome—far from it—but that it was a fortuitous turn of events unforeseen by even its founders. Moreover, it was evidence of contingency's role in technological innovation. ...

this entry posted to social/wikipedia;
comments (1)

2009 Mar 24 | Sci-Fi Visions of Technology

The season finale of a television series can leave me in one of three states: (1) what a sad/pitiful ending to something that was once great (I rarely see these because I'll abandon the series once they "jump the shark"); (2) what an awesome conclusion to a beloved show (e.g., Avatar: The Last Airbender); and (3) "meh." The finale of Battlestar Galactica left me in the third category, making me forget why I had cared all this while. Also, the complete repudiation of technology -- sending the whole fleet to burn up in the sun so as to settle on their new planet with little more than the clothes on their backs -- irks me.

To make any sense of the Battlestar Galactica finale I must tell myself that it can only be understood symbolically: that the universe is enmeshed in perplexing veins of mystery, sometimes manifested as "angels," and that by abandoning technology and scuttling the fleet they are attempting to break the historical cycle of man/machine violence. Now, I am not an unabashed supporter of technology. Technology can be used for amplifying good or ill, and I'm not sure where I stand on assessing the balance. And personally I believe I am the most content when living simply and mindfully, a state best achieved when removed from the addictive agitation and anxiety of the wired life. Why, as Emily Gould admits, do we feel compelled to check e-mail late at night, or as the first waking act of the day? Therefore, I try to be mindful of my use of gadgets. Similarly, a friend recently told me how his new Blackberry has massively improved his ability to schedule clients, but he could no longer use his phone as his alarm. He found it too tempting to check e-mail in bed, so he banished the Blackberry from the bedroom and purchased an actual alarm clock.

But on Galactica they were going primitive. It is implied that they will integrate with the pre-linguistic Homo sapiens that already populate the planet. It is remarkable that the whole fleet agreed they would cast off technology. What of medicine and the basics of food production? Perhaps they will face a plague, or, more likely, bring one to the natives of the planet. Do they know how to farm and hunt? The ships could at least have been used to make plowshares and arrowheads. And what of their culture? These people are the last carriers of a galactic civilization: terabytes of art, music, philosophy, and history sent into the sun.

Granted, humans frequently abuse technology to horrible ends, but by abandoning technology altogether I expect that within a few years they'd be living a cave man's life governed by the technology of the club and the logic of brutal survival. And I don't see how they have done anything to prevent the historical progression and periods of slavery, colonialism, despots, and global war. Civil society, the rule of law, equality, and surviving childbirth and childhood should not be cast aside so lightly, because they came with such great costs.

Coincidentally, I characterize my own take on technology by way of a sci-fi philosophy quite different from Ronald Moore's religious and Luddite tendencies. I'm closer to being "Bakuian" in reference to a recent -- and otherwise not very good -- Star Trek movie, one in keeping with the skeptical modernity of Gene Roddenberry. In that universe, the Ba'ku people fled the violence and technology obsessed galaxy for a simple agrarian/artisanal life.

However, it isn't as if they abandoned technology completely, but mastered it, or rather mastered the human impulse to abuse it. This is seen when Capt. Picard meets with Ba'ku leaders about an apparently violent outburst from Data, an otherwise benevolent and wise android:

PICARD: The artificial lifeform is a member of my crew. Apparently, he was taken ill.

TOURNEL: There was a phase variance in his positronic matrix which we were unable to repair.

PICARD: [puzzled]

ANIJ: I think the Captain finds it hard to believe that we'd have any skills for repairing positronic devices.

SOJEF: Our technological abilities are not apparent because we have chosen not to employ them in our daily lives. We believe when you create a machine to do the work of a man, you take something away from the man.

The Ba'ku learned to use technology appropriately. (Not surprisingly, while I sometimes feel alienated from gadgetry, I'm very keen on the "appropriate technology" movement.) Of course, we don't know how they did this. These people are the descendants of defectors from a high-technology society. And one of the plot revelations is that those attempting to oppress them now, the Son'a, are in fact the Ba'ku children who got fed up with simple living, went off to explore the universe, and turned into creepy (skin transplant and tightening) technology fetishists.

Of course, all of this is fiction. But ultimately, the problem with the military, agricultural, or consumerist "industrial complex" is not a technical one but a human one -- just as keeping the Blackberry out of the bedroom is. I don't think this problem is solved by simply regressing and abandoning all that one might have (hopefully) learned.

this entry posted to social;
comments (0)

2009 Mar 13 | BusySponge 0.5

In 2002 I began thinking about how to best capture and share the many web-pages and small tasks of the day. I thought of it as a "busy sponge": logging bookmarks and tasks to my team page with a minimum of typing. Furthermore, I wanted to tag each entry with a keyword which could then be used in queries. I posted an implementation in 2003 which was complemented by the fact that the tasks on my team page were syndicated (via RSS) -- and used to generate my "two minute reports" at the weekly staff meeting at MIT. This was a number of years before the notion of micro-blogging became popular.

Two interesting features have further matured: I wanted it to fetch the title of a URL -- typing HTML was a hassle -- and I wanted to tell it which of my pages to log the entry to: my personal weblog or work team page. For example:

urd:/home/reagle > b http://pesto.redgecko.org/dispatch.html j python Noted ^

is a sponge of a URL to my "j" (work) page where "^" becomes the hypertextual page title, resulting in:

<li class="event" id="e090313-f7fd">090313: python] <a href="http://pesto.redgecko.org/dispatch.html">Noted URL dispatch &mdash; Pesto: a library for WSGI applications</a></li>
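The title lookup and "^" substitution in the example above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not BusySponge's actual code: the names TitleParser, page_title, and sponge are illustrative, the page key ("j") is parsed but not used to choose a log file, and the id attribute is omitted because how BusySponge derives it isn't described.

```python
import datetime
import html
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self._seen_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._seen_title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._seen_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(page_html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(page_html)
    parser.close()
    return " ".join(parser.title.split())


def sponge(args, page_html, date=None):
    """Turn [URL, page_key, tag, comment words...] into an <li> entry.

    A "^" among the comment words is replaced by the page title, and the
    whole comment becomes the link text, mirroring the sample entry above.
    """
    url, _page_key, tag, *words = args  # page_key would pick the target page
    date = date or datetime.date.today().strftime("%y%m%d")
    title = page_title(page_html)
    text = " ".join(title if word == "^" else word for word in words)
    return '<li class="event">%s: %s] <a href="%s">%s</a></li>' % (
        date, html.escape(tag), html.escape(url), html.escape(text))
```

In practice page_html would come from fetching the URL (e.g., with urllib.request); it is passed in here so the sketch runs offline.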

With BusySponge 0.5 (now distributed as part of Thunderdell), this has matured into a set of classes for webpage screen scraping and a set of logger functions. So, for example, I might sponge a comment about a URL and indicate it should log it to my bibliographic mindmap (Thunderdell) and it will do its best to fetch the page author, title, date, publisher, permanent link, excerpt of first substantive paragraph, etc. The default heuristics do a surprisingly decent job -- certainly better than typing it from scratch -- and the specific scrapers (e.g., Wikipedia, MARC email archives) are quite good.

this entry posted to technology/python;
comments (0)

2009 Mar 03 | Magnus and Sanger on Expertise

In my previous entry I commented on one of the articles in the Episteme Wikipedia issue. I thought I would also share my comments on the other two articles that were of interest to me. I read both of these under the influence of Collins and Evans (2008, hereafter "CE:"), which I have also mentioned here.

First, Larry Sanger's piece on The Fate of Expertise after Wikipedia is composed in two parts. In the first, the author responds to various interpretations of what he calls "The Wikipedia Potential Thesis" (WPT), whereby if Wikipedia fulfills its highest potential in terms of measurable quality, "experts would not need to be granted positions of special authority in order for humanity to have a resource that accurately tracks expert opinion." I think this is a bizarre thesis that no one has actually put forward. After some philosophizing, and given that Wikipedia is dependent upon expert (CE:contributory) knowledge, Sanger concludes this thesis is untenable. I agree. While Wikipedia might be sufficient in providing CE:interactional expertise (knowledge of -- not ability to do -- science) and might threaten other interactional experts (e.g., journalists), it would not obviate CE:contributory expertise. In the second part, he argues that Wikipedia is successful not because of anonymity, but because of its freedom -- permitting him to claim Citizendium is just as wiki-like and powerful as Wikipedia, but better in that real-name identities support community, governance, and quality. This is an argument he's made before, and one I largely agree with. Had Wikipedia started with the requirement that people log in with an identity that corresponds to some real-world identity -- and this need only be policed in cases of abuse -- I think it would've done just fine.

Second, I most enjoyed P.D. Magnus' On Trusting Wikipedia. After reviewing literature on the reliability of Wikipedia, and arguing that Wikipedia is not like Britannica, the author posits five means by which reliability might be ascertained. The first three correspond to types of meta-expertise in Collins and Evans: authority (reliable source; CE:local discrimination), plausibility of style (CE:technical connoisseurship), and plausibility of content (CE:ubiquitous discrimination). The last two have no direct counterpart in Collins and Evans: calibration (testing a subset of the author's claims) and sampling (testing a single claim with another expert, i.e., a second opinion). The author concludes that in the case of Wikipedia, none of these indicators is particularly strong. But I find his fault-finding with authority (i.e., the "check your sources" implied by WP:Verifiability) rather weak; he argues that sources, like Wikipedia articles, are unreliable because they are dynamic and can change. That is why one should use the permanent link (dated and versioned) when referring to something on the Web.

this entry posted to social/wikipedia;
comments (0)

2009 Feb 28 | Wray and the Wrong Tree

I have to agree with Sage Ross in his response to Brad Wray's The Epistemic Cultures of Science and Wikipedia: a Comparison. Wray is right to note that there are differences between scientific knowledge production and Wikipedia production in terms of the knowledge produced, who produces it, and the process. However, Wray's article does not show any cognizance of the actual epistemic basis of Wikipedia: not a word about Neutral Point of View, No Original Research, and Verifiability. Instead, he uses Adam Smith's invisible hand metaphor to argue that if local concern about one's scientific reputation and career yields a global value in the production of knowledge, this cannot be claimed for Wikipedia because no one has a scientific reputation at stake. First, the invisible hand argument is not the only theory for understanding peer production. Second, as Ross notes, scientific reputation is not the only motive that might be operational under the invisible hand model -- many Wikipedians are very much concerned about their peers' opinions. Wray writes "We have very little reason to believe that an invisible hand is at work, ensuring that the truth, and only the truth, is made available" (p. 43). But Smith's hand can apply to more than scientific reputation and "truth"! That's simply barking up the wrong tree.

this entry posted to social/wikipedia;
comments (0)

2009 Feb 12 | Wikipedia's Final Days

In the conclusion of my dissertation I note how in 2004 a disaffected Wikipedian told me the project had gone downhill. Ironically, a few years later another Wikipedian who began their career in 2004 looked upon that year as the golden age from which Wikipedia had declined. Nostalgia is a fascinating phenomenon in human memory and history. Therefore, it's not surprising to find news stories year after year, since the Seigenthaler incident of 2005, speaking of Wikipedia's doom. These stories are often prompted by an embarrassing vandalism case or a competitor who claims to have righted all that is wrong in Wikipedia. This is yet another instance in a larger history of failed predictions about technologically related phenomena.

Even so, the past few weeks seem particularly pessimistic.

I am concerned about the brittleness that results from the tension of being open to both newcomers and attack. Yet, it also seems unavoidable as Wikipedia became more prominent; I don't think this issue will sink Wikipedia, and I hope it is amenable to continued good faith discussion and hard work. I don't subscribe to the perpetual growth theory that seems to be the presumption of many of the participants of Wikipedia Weekly -- and of the world markets prior to a year ago. I think Wikipedia will survive even though/if the number of contributors levels off and flagged revisions are enabled. The latter feature might prompt a flurry of stories about how Wikipedia is over, but it might stem the flow of future stories about embarrassing vandalism. Wikipedia won't be the same a couple of years from now as it was a couple of years ago, but nothing ever is.

this entry posted to social/wikipedia;
comments (0)

2009 Jan 23 | Wikipedia Relicensing Transition

A discussion on wikiEN-l about yet another Wikipedia alternative prompted me to wonder about the status of the GNU Free Documentation License (GFDL 1.3) to Creative Commons (CC-BY-SA 3.0) license transition -- its FAQ is handy too. Because voting for the proposal is supposed to happen in two weeks, I thought I might as well make my considered decision now.

In general, I think it is a great idea: GFDL is inappropriate for a number of reasons, and this will further the flow of content between Wikipedia and other projects. I have two hesitations, if I understand the proposal correctly.

First, the dual licensing provision (where all Wikipedia-developed content continues to be available under the GFDL, but imported CC-BY-SA content is not) is complex, as it places an obligation upon the user of content to investigate whether CC-BY-SA-only content was ever used; the FAQ recommends such information be placed in "the article footer or the version history." It would be useful to me to see a couple of "screw cases" and their implications. For example, for a user who makes use of Wikipedia content -- including their own derivations -- what happens if they fail to note that it is CC-BY-SA-only content? (What if it was indicated but they failed to notice, or it was not indicated but is learned of later?) However, evidently this is a compromise necessary for the sensitivities of the parties involved, and I don't think this will massively impact anyone.

Second, one of the benefits of CC-BY-SA is that attribution is specified by the licensor, which can be a URL from which the content was obtained -- instead of a list of dozens of authors. But Wikimedia wants an exception: for articles with fewer than six contributors, those contributors must be listed. This just seems like a hassle. It may be moot if the percentage of Wikipedia content with fewer than six contributors is near zero, but if not, I think this will be a headache for those wanting to make use of Wikipedia content. I presume one would count unique IP addresses (for anonymous contributors) and log-in names without concern for whether these might be the same people. Even so, the proposal calls for attribution by name for fewer than six contributors, and by "reference to an online copy of the history page" for more. It is not as if the authors of content with fewer than six contributors are greater auteurs. Why is this distinction even meaningful? And the history page is easily accessed from the actual content, which would be referenced anyway, so now we have two URLs for no reason. Whereas one might have easily scraped a selection of Wikipedia (e.g., with wget) for printing in developing countries or for inclusion on a mobile gadget, one now needs an application to count contributors and include superfluous URLs.
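To make the extra step concrete, here is a rough sketch of what such an application would have to do, assuming the counting rule presumed above (each distinct IP address or log-in name is one contributor). The `attribution` function and the example URL are hypothetical illustrations, not part of the proposal.

```python
def attribution(authors: list[str], history_url: str, threshold: int = 6) -> str:
    """Build an attribution line under the proposed six-contributor rule.

    `authors` holds one entry per revision: a log-in name or, for an
    anonymous edit, an IP address. Each distinct string counts as a
    separate contributor, even if two might be the same person.
    """
    # Deduplicate while preserving order of first appearance.
    unique = list(dict.fromkeys(authors))
    if len(unique) < threshold:
        # Fewer than six contributors: they must be named.
        return "Authors: " + ", ".join(unique)
    # Six or more: a reference to the history page suffices.
    return "Authors: see " + history_url


print(attribution(["Alice", "198.51.100.4", "Alice"], "https://example.org/history"))
# Authors: Alice, 198.51.100.4
```

Even this toy version needs each article's full revision list, which a simple page scrape does not provide.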

I'm generally happy and excited about the transition, but I wish the attribution were simplified.

this entry posted to social/wikipedia;

2009 Jan 22 | Rethinking Expertise

Given my interest in Wikipedia, pseudo-science, and skepticism, I'm fond of works that at least help us identify the implicit (social) concepts we invoke when we talk about knowledge, authority, and expertise. Collins and Evans's (2007) Rethinking Expertise is an interesting treatment of the topic: well-written (though more explicit definitions of its terms would be useful), engaging (via examples from the literature on the sociology of science), and satisfying (solidifying some of the things I've been thinking myself).

To better understand the terms of their "Periodic Table of Expertises," I reproduced it in my (mind map) notes, with hypertext links where appropriate, and I thought I would share it here too. For those interested but not yet convinced, there are a number of reviews [1,2]; the one by Michael Lynch, and the authors' response to it, show some of the theoretical differences within the sociology of science (i.e., the distance between the "ground" of actual practice and analytic categories).

this entry posted to method;

2009 Jan 22 | Thunderdell 1.0 (was: 'FreeMind Extract')

I'm releasing the latest set of FreeMind Bibliographic Extraction scripts. I'm calling it a "1.0" release because:

  1. I decided to give it a funny name.
  2. I now address an unlikely but long-time screw case.
  3. This and other cases are now tested by doctests.
  4. I updated the generation of bibliographic keys to remove 'and' from the author portion and always include a title suffix -- instead of just when there is a collision. Keys are now a bit terser and more stable.
  5. I now emit biblatex, a much more complete and powerful bibliographic format.
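The key change in item 4, the doctests in item 3, and the biblatex output in item 5 can be illustrated together. This is a hypothetical sketch, not Thunderdell's actual code: the `make_key` and `to_biblatex` names are invented here, and the suffix (first letter of the first title word) is an assumed scheme chosen only to show why always appending a suffix keeps keys stable.

```python
import doctest
import re


def make_key(authors: str, year: str, title: str) -> str:
    """Build a terse bibliographic key (illustrative scheme only).

    The author portion drops the word 'and', and a title suffix is always
    appended rather than only on collision, so an existing key does not
    change when another entry with the same authors and year is added.

    >>> make_key("Collins and Evans", "2007", "Rethinking Expertise")
    'CollinsEvans2007r'
    >>> make_key("Doe", "2009", "Grade Trends")
    'Doe2009g'
    """
    names = [word for word in authors.split() if word.lower() != "and"]
    title_words = re.findall(r"[A-Za-z]+", title)
    # Suffix: first letter of the first title word (empty if none).
    suffix = title_words[0][0].lower() if title_words else ""
    return "".join(names) + year + suffix


def to_biblatex(key: str, fields: dict[str, str], entry_type: str = "book") -> str:
    """Render one entry in biblatex's .bib syntax."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


if __name__ == "__main__":
    doctest.testmod()  # runs the examples in make_key's docstring
    key = make_key("Collins and Evans", "2007", "Rethinking Expertise")
    print(to_biblatex(key, {"author": "Collins and Evans",
                            "title": "Rethinking Expertise",
                            "date": "2007"}))
```

Run as a script, this prints an `@book{CollinsEvans2007r, ...}` entry with `author`, `title`, and `date` fields; `date` is biblatex's field for publication dates.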

It also now has its own webpage, from which you can download it.

this entry posted to technology/python;

Open Communities, Media, Source, and Standards

by Joseph Reagle



reagle.org
