Open Codex

2005 Dec 15 | Nature's Wikipedia and Encyclopedia Britannica Analysis

Those interested in Wikipedia are discussing the comparison of errors appearing in a sample of 42 articles, reported by Nature. While I agree with Jakob Voss's comments on the limitations of the study, for this sample the number of errors in Britannica does seem roughly comparable with Wikipedia -- hopefully that Wikipedia outlier for Dmitri Mendeleyev will be fixed soon. I was further intrigued to note that the errors per topic correlate between the two:

WP v EB

This is a moderately strong correlation (r=0.574), implying perhaps a similarity in the difficulty of writing on that topic, or perhaps a difference in scrutiny by the experts (e.g., the person reviewing the Cambrian explosion is picky!).
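For readers who want to reproduce this kind of figure, here is a minimal sketch of how a Pearson correlation coefficient is computed. The function name pearson_r and the two short lists are my own placeholders, invented purely for illustration; they are not the Nature error counts, so the printed value will not be 0.574.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder error counts per topic, invented for illustration; NOT the Nature data.
wp_errors = [2, 5, 1, 7, 3]
eb_errors = [1, 4, 2, 5, 3]
print(round(pearson_r(wp_errors, eb_errors), 3))
```

With the real per-topic counts substituted for the placeholder lists, the same function should give the correlation between the two encyclopedias.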

this entry posted to culture/wikipedia;
comments (1)

2005 Dec 13 | Wikipedia History Scraping

To confirm the power law in Wikipedia edits (many doing a little, a few doing much) this regular expression and Python code parses a Wikipedia history fairly well:

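import operator
import re
import sys
import urllib.request

# Captures: revision id, timestamp, author, and (optionally) the edit comment.
# The lazy .*? before the optional group lets the comment be captured when present.
history_regex = r""".*?oldid=(\d+).*(\d\d:\d\d.*?\d\d\d\d)</a>.*<span class='history-user'>.*?>(.*?)</a>.*?(?:<span class='comment'>(.*?)</span>)?</li>"""
regex_obj = re.compile(history_regex)

def getHTML(url):
    """Fetch a page and return its text (a minimal stand-in for this helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', 'replace')

url = sys.argv[1]
html = getHTML(url)
counter = 0   # number of history lines seen
edits = {}    # author -> list of (oldid, date, author, comment)
for line in html.split('\n'):
    if line.startswith("<li>(<a"):
        counter += 1
        match_obj = regex_obj.search(line)
        if match_obj:
            oldid, date, author, comment = match_obj.groups()
            edits.setdefault(author, []).append((oldid, date, author, comment))
# Rank authors by number of edits, most active first.
counts = [(author, len(author_edits)) for author, author_edits in edits.items()]
counts_s = sorted(counts, reverse=True, key=operator.itemgetter(1))
print(counter)
for author, number in counts_s:
    print(author, ";", number)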
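Once the per-author counts are in hand, a rough check of the "many doing a little, a few doing much" pattern might look like the sketch below. It fits a straight line to log(count) against log(rank) and reports the share of edits made by the most active authors. The function names and the sample counts are my own assumptions for illustration (the counts are not real Wikipedia data), and a straight-line fit on a log-log plot is only a quick sanity check, not a rigorous test for a power law.

```python
import math

def loglog_slope(edit_counts):
    """Least-squares slope of log(count) against log(rank), largest count first."""
    ranked = sorted((c for c in edit_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def top_share(edit_counts, fraction=0.1):
    """Share of all edits made by the most active `fraction` of authors."""
    ranked = sorted(edit_counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Illustrative counts only (NOT real data): a few heavy editors, many light ones.
sample_counts = [120, 45, 20, 9, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print("log-log slope:", round(loglog_slope(sample_counts), 2))
print("share of edits by top 10% of authors:", round(top_share(sample_counts), 2))
```

A steeply negative slope together with a large top-10% share is consistent with a long-tailed distribution; feeding in the numbers from the history scrape above would replace the sample list.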

this entry posted to method;
comments (2)

2005 Nov 14 | Godwin's Law

On October 4, 2005, I had the good fortune to meet Mike Godwin at the ITS Colloquium. Godwin is famous for his adage that as the duration of a USENET discussion grows, so does the probability of a comparison with Hitler or Nazis. During the seminar the topic of Wikipedia arose and Mike, sitting two seats away, nudged me and said he had an interesting story to tell.

Godwin's Law is now quite old, and few use USENET for discussion, but the observation remains potent because Godwin spoke to a feature of human discourse that, though exaggerated on discussion groups, transcends any particular medium. Indeed, Senator Rick Santorum started a controversy with just such a comparison this summer.

Godwin notes that his observation was penned specifically as a memetic experiment: to pose an idea and see how it perpetuates and mutates in the field of popular discourse. The law has been fecund, leading to variants and malapropisms. Ironically, when someone unknowingly uses one of these variants she might be attacked by a dogmatic defender of the orthodoxy, provoking allusions to fascist language Nazis, thus proving the adage.

In any case, the Wikipedia experience that Godwin wished to share was about the article on Godwin's Law. When he modified the article to more accurately reflect the history of the meme, some other editors objected. The trinity of Wikipedia policies is that editors should be neutral in their presentation of claims, not include original -- and potentially crackpot -- research, and provide citations such that any such claim can be verified by others. So, this story brings us to the interesting question of how a primary source, such as Godwin, should edit a related article. While recognizing Godwin's authority, one might also then challenge his neutrality and reporting of primary claims. It is not uncommon for contributors to create "vanity" edits (pages or links) that are rebuffed with these policies when the edit is not of encyclopedic merit. But what of when the edit is of merit? Are the most qualified primary sources disqualified from editing the Wikipedia article? Need a primary source publish her first-person claim elsewhere before it can bear upon the Wikipedia article?

this entry posted to culture/wikipedia;
comments (1)

2005 Nov 02 | Can you trust the Wikipedia?

In the past week the perennial question of "Can you trust the Wikipedia?" arose while I was working on the tedious -- though oddly compelling for an obsessive like myself -- task of reviewing the early period of Wikipedia history. I slowly worked through the Wikipedia timeline ensuring each event was dated and sourced. I realize that if I'm ever to trust this timeline, I need more than a bald claim. And, my appreciation is so much greater when I can peruse the primary source. For some sources, such as the Nupedia list archives, I was able to find copies of messages on the Internet Archive. Another source, Jimbo's explanation about Stallman's proposal for a competing project, is seemingly lost forever. Fortunately, Stallman was kind enough to tell me of his recollection of the incident and allow me to publish it. Most frustratingly, I encountered a tantalizing mention of Internet encyclopedia proposals from the UN's Millennium Project but failed to find any source or corroboration; that information is stricken from the article. Which brings me back to the question of trusting the Wikipedia. I have addressed the broader question of epistemological authority before, but now I want to focus on the role of sources.

Simply, Wikipedia is only as trustworthy as its links. Actual scholarly authority is similar. A critical part of scholarly training is learning why and how to cite (link to) others. Expert authority is also generated from experience in the field, and theoretical and methodological training. Yet, as I've noted many times "'We can never know everything.' We all can't be experts on everything, so we often need to rely upon credible authority while remaining critical and skeptical, but never dismissive." Consequently, the tokens "Ph.D." and "professor" become proxies for an assessment of trust that very few people are able to substantively test, but, to which many are willing to defer. Because Wikipedia lacks such reputation mechanisms Wikipedia is, again, only as trustworthy as its links. For educational purposes, the implication of this is profound. Should we teach students to trust a claim because it was simply uttered by a credentialed person? Or, should we encourage them to click a link and teach them how to investigate for themselves?

The consequence of this for Wikipedia culture is that it doesn't link enough. Perhaps my experience with Wikipedia history is exceptional since Wikipedians take the sources for granted. But, as I found, that's a poor historical assumption. I also share the concern that articles might become overly busy or dense with citations. There is a tension here, but one I think the technology can handle. It's why I believe the trustworthiness of Wikipedia is in part dependent upon the citation project and furthering a culture of "if you claim, you cite" as implied by the Verifiability policy.

this entry posted to culture/wikipedia;
comments (2)

2005 Oct 27 | Zimmerman's theory of history

My understanding is that Zimmerman thinks that good history makes a compelling argument about humans in time. It might be compelling in that it is a story told well, and, most importantly, it casts light upon bigger historical themes. There is a relationship between the specificity of the project and the generality of its historical context. For example, to describe the Wikipedia's coverage of 9/11 is just that: descriptive. But a historical argument should also say something more. (Otherwise, the description might only be of interest to the narrowest archivist.) What does the coverage of 9/11 tell us about the event, about new media, or even the development of the Wikipedia? The specific historical research and argument needs to relate to the general -- and often taken for granted -- themes: to augment, support, or counter. However, one must also guard against presentism in drawing connections between the past and present. Doing so will often be compelling, and sometimes useful, but it can also turn into a fishing expedition in order to justify how the author feels about the present, instead of a deepening of our understanding of the past.

[Later] In my own draft paper, I did not address the question of what is different about the Wikipedia vision, and that of, say, H. G. Wells. Or, what does H. G. Wells tell us about the Wikipedia? What is new and novel with the Wiki, and why did Wikis happen when they did? What are the new assumptions brought by use of the Wiki? None of the projects I looked at were used to describe the cultural space from which they came. It can be useful to pose this question to oneself: what does the Wikipedia tell a future historian about our present time?

this entry posted to method;
comments (0)

2005 Oct 17 | Ethnography and History

What is the difference between (sociological) ethnography and history? In taking a methodological course in each of these disciplines this semester I've been attempting to find an answer to the question; I offer my current, imperfect, understanding.

Simply, the ethnographer is present to the social phenomenon of interest whereas the historian has some remove in time and place. Each then has a different predominant focus on the question of subjectivity. Ethnographers tend to think about their own position and biases relative to their environment, and historians are concerned about their relationship to their sources. However, a reflective practitioner of each method appreciates the subjectivity of herself and the object of study. Whether it is a discussion in the present (predominantly ethnography), a recollection of the past (oral history and ethnography), or records of the past (predominantly history), each is shaped by the social environs.

Another possible difference is that while history is often content with the particular, sociology reaches for a transcendent theory. This is not to say sociology has no concern with "thick description," nor that history has no thesis -- it is an argument about humans in time -- but that their primary aspiration and style differ. Whereas sociological theory creates, or is the result of, a distance by the researcher, time often does the same for the historian, permitting a triangulation (of many sources), whereas the ethnographer often looks for contemporary comparison. (Comparison with the past is often called the "ethnographic revisit.")

In addition, one might then ask how journalism and anthropology fit into this mix!

this entry posted to method;
comments (5)

2005 Oct 08 | Plagiarism and primary sources

Week eight: what accounts for the recent spate of scandal surrounding "facts, fictions, and fraud" in American historical scholarship? How might knowledge of these episodes affect or alter the way you pursue your own scholarship?

In "Past Imperfect" Hoffer (2004) offers a number of arguments as to why there have been so many incidents of alleged fraud in the historical profession: the aspirations of authors to write to the popular market, the almost industrial system -- employing many assistants -- with which books are researched and authored, the demands by publishers for more books, the popular audiences' demand for entertainment rather than scholarship, the eagerness of the ideological opponents to take these authors down a notch, and an inability for the profession to police itself.

While authors such as Michael Bellesiles, who falsified data in order to argue that gun ownership was not common in early American history, deserve rebuke and sanction, I felt sympathetic to some of the authors (e.g., Stephen Ambrose) who got into trouble for borrowing primary source quotations, tweaking the surrounding secondary material, and presenting it as their own with a citation to the secondary source. Was the problem:

However, regardless of how one might answer any of these questions, my concern is with the veil that shrouds practical discussion about these issues. For example, I feel quite vulnerable in discussing my usage of Google and Amazon above. I feel as if in raising these questions I'm making myself suspect. It is as if everyone assumes we know exactly what is right, when in fact there are many gray areas. (And the topic of plagiarism is not the only one shrouded in this mist; there are gray areas around how we make use of copies of copyrighted works in the academy, as well as how to finesse "human subjects" review board bureaucracy.) Typically, these issues, when abuse is finally caught, are addressed by overly strident, sometimes impractical, norms and bureaucratic procedures. Instead, they should be addressed at the outset as challenges that all researchers must grapple with, and with which we can feel free to share our strategies and concerns.

this entry posted to method;
comments (0)

2005 Sep 30 | Results as of Fall 2005

I'm now in my third year at NYU; the first and second year exams are done, and after this semester I will have satisfied my course requirements. (This semester I'm taking methodological courses including ethnography, history, and statistics.) The outstanding item, then, will be the completion and approval of my proposal -- which will also include finding a third member for my committee.

The majority of my efforts are focused on the Wikipedia; some recent drafts that may be of interest on that note include:

this entry posted to career/phd;
comments (2)

2005 Sep 28 | Writing Ethnography

The Professional-Lurker mentions Tales of the field: on writing ethnography (Van Maanen 1998), which I highly recommend for "practicum" reading. I'm presently reading another good book in the same series, Writing ethnographic fieldnotes (Emerson, Fretz and Shaw 1995).

this entry posted to method;
comments (1)

2005 Sep 12 | Cranking out the words

In addition to the books I read when I first arrived at graduate school I've encountered a few other highly useful guides that encourage the words to hit the page:

  1. Tara Gray's 12 Steps to becoming a more prolific scholar
  2. Michael Froomkin's legal writing tips
  3. Blog: Academic coach

this entry posted to career;
comments (0)

2005 Sep 09 | Coleman, History v. Ethnography

Yesterday I read a chapter from Gabriella Coleman's dissertation on the Debian community. (She co-authored a piece with Mako in the M/C Journal issue in which I wrote about open content communities.) I was quite excited to read it because it reminded me of another paper I had just read a couple of days ago by Tom Chance -- sharing themes of hacker culture -- but more importantly because it includes many of the same questions I have about my own community, the Wikipedia. My inspiration and template of sorts has been Michael Sheeran's study of the Quaker community. However, that dissertation is nearly 30 years old and has no theoretical or methodological text. (And while I hope to not linger on those things in my own dissertation, I have to have a sense of them in order to write it!) In particular, both Sheeran and Coleman conduct a combination of history and ethnography.

In my case, I've been wondering if my project is:

It appears that Coleman did a mix of both. When referring to the "public history," she names names where appropriate; but developer interviews are anonymous. However, there are some oddities which I find confusing. For example, when she speaks of the "Vancouver incident" she quotes an e-mail without citing its source. It's a well known email, and not citing it directly strikes me as odd. I'd love to see her methodology section and IRB proposal. In fact, I hope to chat with her soon. (Additionally, it appears she's studied religion and I wonder if that would be relevant to my own interests in sectarian decision making.)

While reading the dissertation, I made this little table comparing Debian and Wikipedia:


|  | Debian (Coleman 2005) | Wikipedia (Reagle 2005) |
| charter | social charter | "an Encyclopedia" |
| policy | Constitution | NPOV? |
| final arbiter | technical committee | arbitration committee/Jimbo |
| leadership | Democratic/meritocratic | Jimbo and meritocratic lieutenants |
| socialization | new maintainer sponsor mentorship | radically open |
| "witnessing" | one's biography and explanation of the policies in one's own words | user pages (or people describe how they came to and what Wikipedia means) |
| decision making | contributed, discursive, and voting | discursive, persistent, and occasional voting (deletion) |

this entry posted to method;
comments (1)

2005 Sep 09 | Online resources in BibTeX

In BibTeX, how does one represent the paragraph number for a citation in online -- non-paginated -- articles? Some journals request that citations indicate the paragraph number of the online resource. I do not know how to represent that in BibTeX: I know of no tags for paragraph numbers in the BibTeX format, nor do I think most LaTeX tools/styles know how to represent them in a citation...?

this entry posted to method;
comments (0)

2005 Jun 27 | Mailman, Message-id, and Persistent URIs

As someone who is interested in studying and citing email conversations, the opacity of the mailman interface -- or the lack of my understanding -- is a pain. I was spoiled by the W3C's system where each email had a header with a URL to its place in the archive, which corresponded in some way to the msg-id! When processing comments on a spec, or citing conversations, it's very handy to be able to link to a persistent Web representation of an email.

In writing about Wikipedia discourse I'm stuck with using the message-id if I happen to have that email in an mbox, or a URL if I happen to have a Web page, but from one I cannot easily get the other, and I'm not confident that the URL will be stable in any case. (For example, will this always correspond to the message with the message-id "42BEC0EF.6070906@web.de"?)

Without a guarantee of stability, I suppose it's best to use the msg-id in citing WP discourse, but that makes finding that message problematic for the reader. I'd provide a hint if I could somehow obtain it myself, but the HTML page for a message in the archive has no indication of the msg-id. And even if I have the msg-id, I can't easily find the corresponding archive URL. Before sending this message, I thought there would be a search interface and I could write a script, but there doesn't appear to be one, and it doesn't work in Google (e.g., this query returns nothing).

What to do?? Fortunately, http://marc.theaimsgroup.com/ provides a Web archive to these lists with the ability to query based on message-id. The following procmail script will add a header and signature containing a URL of the message:

###########################################################################
# insert X-Archived-At header into messages from Wikimedia lists
# :0fwh will append the "Archived at" at the beginning of the message
:0
* ^List-Id: .*Wiki[mp]edia.org
{
  MID=`formail -xMessage-Id | tr -d '<' | tr -d '>' | tr -d ' '`
  URL="http://marc.theaimsgroup.com/?i=${MID}"

  :0fw
  | (formail -I"X-Archived-At: ${URL}"; echo "Archived at: ${URL}")
}

(Thanks to Hank Leininger for responding to my concern that marc URLs contained the excluded character delimiters '<' and '>' meaning they were not a valid URL and weren't automatically clickable in some applications such as KMail. The next day it was fixed and I am happy!)
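For messages already sitting in an mbox, the same message-id to URL mapping can be sketched in Python. This is only an illustration of the idea in the procmail recipe: the function name archive_url and the sample message are my own, and the URL prefix is simply the one used in the recipe, with no guarantee about how long the archive keeps honoring it.

```python
from email import message_from_string

# URL pattern taken from the procmail recipe; its long-term stability is assumed, not known.
MARC_BASE = "http://marc.theaimsgroup.com/?i="

def archive_url(raw_message):
    """Return the archive URL for a raw email, or None if it has no Message-Id."""
    msg = message_from_string(raw_message)
    mid = msg.get("Message-Id", "")
    # Mirror the tr -d calls in the recipe: drop '<', '>' and spaces.
    mid = mid.translate(str.maketrans("", "", "<> "))
    return MARC_BASE + mid if mid else None

# Hypothetical message using the message-id mentioned earlier in this entry.
sample = (
    "From: someone@example.org\n"
    "Message-Id: <42BEC0EF.6070906@web.de>\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)
print(archive_url(sample))
```

Looping over an mbox with the standard mailbox module and calling this function on each message would give a citation-ready URL for every email in it.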

this entry posted to method;
comments (0)

2005 Jun 10 | Mindmapping Bibliographies

I am releasing a new zipfile of the fe mindmapping bibliographic tools. As explained in Extracting Bibliographies from Freemind, these are python scripts that are able to convert between Freemind mindmaps (using a few simple conventions) and bibliographic formats (i.e., OO.org CSV and bibtex). This approach is preferable to other bibliographic tools with limited/constrained forms for text entry. With fe one has a complete outline/map of texts, with figures, images, tables, links to sites, etc.; one can easily organize texts by topic or in separate mindmap files; and one can generate queries where each matching line has its appropriate citation with year and page number (e.g., "Giddens"). Unlike many bibliographic tools, it does not query on-line databases, but one can use such tools (e.g., tellico or refworks) to query and generate bibtex bibliographies and then use be.py to convert them to a mindmap.

this entry posted to technology/python;
comments (0)

2005 Jun 10 | Wikipedia and Astroturfing

Clay Shirky notes a cycle of references created by a few people with the effect of promotion: a Wikipedia article for Symphony OS is referenced in a Slashdot article, which is then noted in the Wikipedia article:

This is an interesting kind of spam, or maybe we could call it a reputation hack.... They create a Wikipedia page, point to it as if to demonstrate independent interest for the project in their potential slashdot post, then point to the slashdot effect on the Wikipedia page as proof of said independent interest. Voila, an instant trend.

The Symphony Talk page reminds me very much of one of the Lamest Edit Wars Ever over very similar issues with SkyOS: "Fast & furious kindergarten catfight with accusations of GPL violations, advertising, lying and fanboyism." One difference is that Symphony is actually Free Software, so while there is an argument about advertising, implied dishonesty and fanboyism, the GPL hasn't been an issue -- yet! (Accusation of GPL violation sometimes strikes me as similar in some sense to Godwin's Law; while license violations may be a substantive accusation, the discourse has no doubt gotten heated by then.)

Shirky also thinks that referencing the consequent Slashdot Effect on the Symphony OS site doesn't merit inclusion. (Personally, I don't mind and I don't read the Slashdot Effect as a reciprocative authority.) After Shirky removed it, EliasAlucard reverted the removal, commenting "Why is trivia being removed by that anon user 'Clay Shirky?' As far as I'm concerned, he has nothing but distaste for this article, and his edits shouldn't be reckoned with." Unfortunately, here and on the Symphony Talk page EliasAlucard is not representing himself -- nor the article -- well and is failing to uphold numerous Wikipedia norms of good faith and writing for the enemy. (In Wikipedia, we encourage folks to try to see the perspective of the other, not write them off.) Also, the Slashdot effect claim is without attribution and citation of evidence. So I've included that link at least.


this entry posted to culture/wikipedia;
comments (0)

2005 May 30 | Pruning

The Professional Lurker notes an article by the FreeRange Librarian which identifies the important role of deleting/removing material of dubious quality. This function, too, exists in the Wikipedia: Votes for Deletion. Otherwise, this argument simply reduces to the one of authority which the Librarian has raised in the past. Authoring, editing, deleting, and moving -- and soon article validation by user feedback -- all exist in the Wikipedia. Some, such as the Librarian and Sanger simply want to highlight or make use of experts' abilities. This is difficult when all users -- including the jerky clueless -- have the same standing and victory is more often achieved by dogged verbosity. So, there is inevitably frustration on the part of some. Such as it is. The Wikipedia is a working experiment on this note. (And if you want to look into the petri dish, see the discussion of Wales' credentials idea.)

this entry posted to culture/wikipedia;
comments (0)

2005 May 16 | NYU's Information Services Suck

As an example of how frustrating reliance on bad technology can be, consider the following. NYU publishes a course catalog for the subsequent semester. However, NYU often does not print enough copies for me to get one. The standard rejoinder is to use the web site. While I would object in principle (I think NYU should have a paper copy available for me), I certainly object on the basis of utility: NYU's information technology resources are extremely difficult to use. Trying to find and register for courses is a nightmare.

To get a listing of courses, and register, one has to select various options in a JavaScript pop-up menu, which is very sensitive to losing focus. So, for example, I want to figure out what courses are available. Under the "registration" option there are, among other options: registration status, course status, register, registration schedule. The appropriate selection is "course status," though it is always the last one I think of. Having selected that, I'm presented with a number of form fields whereby I can "search" for a course. A course listing typically looks like:

E38.2008 (12345) - Sem Media Criticism II SEM 4.0 .

In order to perform a search, I must know the course subject (of which there are 300 entries in a pulldown menu!) and the course number (2008) or course level (graduate). The course subjects are listed alphabetically, so if I only know the token "E38" -- which is how we refer to our Media Ecology classes -- I have to scroll through the menu of 300 entries or, as I "prefer," open an HTML source view of the document and search for the "subject" corresponding to E38. The sad irony here is that many of the 300 entries are actually duplicated and only differ with respect to whether they are graduate or undergraduate subjects. If this is the case, then why would I need to specify the "course level"? In any case, what is unfortunate is that:

1. It is not possible to search for courses based on words that appear in their title or description. So, for example there's no chance of me being able to search for courses with "social" or "technology" appearing in their description or title.

2. Should I actually find a course, and want to register for it, I cannot click on it to do so. I have to write down the call number, go back up into the confusing JavaScript pulldown menus, and fill in the registration form.

Now, the really annoying thing about all of this is that you're not allowed to use the back button. If you do so, at some point you will likely encounter the error: "Your access to the system is denied because of improper authorization. Return to the login page to re-enter the required user authentication data." This means I lose my browsing context and have to go to the login page and log in again! This happens on Windows and Linux, in the Mozilla, Konqueror, and Internet Explorer browsers. ("Not using the back button" seems a wholly inappropriate recommendation.) And in general, the site violates numerous accessibility and usability guidelines that I won't even bother to detail here.

If one was lucky enough to get a paper course listing, one should just punch those course call numbers into the registration form. Otherwise, don't expect to find any courses for which you don't already know the details. Information technology, in this case, is no improvement upon, or even replacement for, the printed catalog.

On the good news front, I may finally be able to use SSH. By policy NYU blocks all ports except Web, and some FTP access. So, I cannot access my home or web host from school. After negotiating the appropriate levels of bureaucracy, including having a form signed by my advisor and the financial dean of my school, I now have a shell account, which I think will permit me to SSH out of NYU!

this entry posted to career;
comments (0)

2005 Apr 30 | Epistemic stances

When it comes to making claims, about reality or anything else really, a number of different stances might be adopted by the speaker:

this entry posted to culture;
comments (0)

2005 Apr 16 | Encrypted File Systems

In moving to Kubuntu 5.04 from a Knoppix install, my loop-aes partitions are no longer readable. Since crypto-loop is being deprecated anyway, I thought I would try dm-crypt. However, because I would have to employ that on top of a file loop, it's a hassle. Fortunately, I bumped into EncFS. Generally, I like it a lot, and it is comparable to crypto-loop except when it comes to a USB drive. A copy of a 2GB file to a

Ouch!

Interestingly, an ext3 formatted loop device (no encryption) on the external drive is ~17MB/s and with crypto-loop it is ~10MB/s. Now, here's the real kicker: an ext3 loop partition sitting on the vfat external drive, hosting an encfs directory, is ~14MB/s! So, vfat sucks -- though encfs on vfat probably doesn't have to do quite so poorly. Or to put it another way, it's faster to put a 3GB file on the external vfat drive (vfat is very compatible with many computers), mount it as an ext3 device loop, and run encfs on top of that than it is to access the plain old vfat file system. (This is even slightly faster than running encfs on the local IDE drive!)

this entry posted to technology;
comments (0)

2005 Apr 04 | Lost Minorities

In a discussion on the possibility of limiting anonymous edits to the Wikipedia -- resulting from the mad dash of unwelcome activity during April Fool's day -- a participant replied "This one comes up again and again, and the consensus has always been 'nope, anon edits will continue thanks.'" I am always cautious before making such a declaration of consensus because of what I call the "lost minority." When a particular policy of the community is likely to drive away some participants one must be careful. I explain this by way of a group within a room, some of whom have bad gas. Those with the gastrointestinal distress, of course, don't complain. There are some who aren't bothered by the smell, or think that the smell is a worthwhile trade-off for the presence of the farters. And those with sensitive noses simply can't take it. When the latter group proposes that something be done about this issue, such as limiting the amount of broccoli that is served at lunch, or developing a nose filter, people discuss the issue, and perhaps it gets added to an agenda. In time, the group is polled, and surprisingly the consensus is to do nothing. But what has happened is that a significant portion of those with the sensitive noses have already left.

I don't know if this is the case in this Wikipedia example, but this concern is relevant to issues that are likely to drive participants away. And in time, that lost minority might become an alienated majority.

this entry posted to culture;
comments (1)

2005 Mar 31 | Early cooperative norms

Just ten years after the first electronic computer was built, and two years after the release of IBM's first mainframe, an association called Share was formed through which IBM customers could collaborate. The following is reminiscent of the many Wikipedia dictates on keeping a humble and open mind:

They asked the originator of each program to assume responsibility for making and distributing prompt corrections. In a statement titled "Share Membership and What It Entails," adopted in February 1956 at the fourth general meeting (table 2), all representatives were reminded: "The principal obligation of a member is to have a cooperative spirit. It is expected that each member approach each discussion with an open mind, and, having respect for the competence of other members, be willing to accept the opinions of others more frequently than he insists on his own." (Akera 2001:719)

this entry posted to culture;
comments (0)

2005 Mar 11 | Mythbusters and Buttered Toast

A segment on tonight's MythBusters addressed the question of "whether buttered toast falls buttered side up or down more often?" This is one of my favorite daily puzzles that can be addressed by a basic understanding of experimentation and statistics. My own curiosity on this question was satisfied by a segment of Newton's Apple -- if my memory is correct -- which found that it is the typical height of the table surface that causes the originally upward-facing side to land on the floor. Pushing the toast from a ladder completely reversed this trend as the toast could tumble a full 360° and land in its original orientation: buttered side up.

Yet, notice the MythBusters question: it asks if toast being buttered affects how it ends up -- regardless of its original orientation, even if that is buttered side up in almost all daily cases. So first they had to find a way to drop toast in an unbiased way, independent of the original orientation. Not surprisingly, Adam found that pushing it from the table was not satisfactory on this count. Eventually they developed a machine that dropped unbuttered toast, which landed up 11 times and down 13 times -- orientation was determined by a magic marker X, which we must assume is unbiasing. It is reasonable to conclude that 11 up and 13 down is indicative of a "fair" mechanism. Now when they buttered a side of 24 slices of toast they found 12 up and 12 down. These sample sizes are too small, but roughly, it does not appear that the butter had any effect!

However, when they drop the toast from a two-story building (27'5") and find that the dry toast side X lands up 26 out of 48 drops (54%) and the buttered side X lands up 29 out of 48 drops (60%), Jamie posits that the 6% discrepancy is because he could see that the buttered side had a concave impression, and like a leaf, the convex non-buttered side tended to fall face down. Adam concludes, "if you really want to ensure, in general, your toast landing buttered side up or down, we can tell you, you should butter with a good vigor and that the resultant bowl will make your toast generally fall butter side up." However, though he qualified his statement with "generally," strictly speaking it is not statistically supported, and when Jamie offers a mechanism for a perceived statistical finding, he is premature. (However, if he is offering a simple observation, that's all it is.)

In this case, the null hypothesis is that the difference between the dry 54% and the buttered 60% is just due to chance. (That is, if we were to repeat the experiment, a similar skew could easily happen again with no cause behind it.) The alternative hypothesis is that there is some causal mechanism (i.e., the bowl-shaped impression) that affects the outcome. If we can show that chance alone has a low probability of producing a difference as large as the one observed (6%), that implies support for the alternative hypothesis. Unfortunately, neither test alone is statistically significant. For example, the probability of getting at least 29 out of 48 drops buttered side up with a fair coin is about 7.5%, using a one-tailed normal approximation.

z = (observed - expected) / StandardError
z = (29 - 24) / (Sqrt(48) * Sqrt(.5*.5)) = 1.443
=> P = 7.5%

The chance of the dry toast landing X side up at least 26 times is about 28%.

The probability that chance alone would produce the difference between 26 in the "dry" control case and 29 in the buttered case is about 27%, so that difference is not significant either.

z = (observed - expected) / StandardErrorofDifference
z= ((60%-54%) - 0%) / Sqrt((SEdry)^2 + (SEbuttered)^2)
z=6% / Sqrt(7.19^2 + 7.07^2)% = .5950
=> P = 27%
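
The arithmetic above can be re-run in a few lines of Python. This is only a sketch under stated assumptions: it uses a one-tailed normal approximation with no continuity correction, the function names (upper_tail, z_one_sample, z_two_sample) are made up for this illustration, and it works from the unrounded proportions (26/48 and 29/48), so its figures may differ a little from hand-rounded ones. Whatever the rounding, none of the three probabilities comes in under the usual 5% threshold.

```python
import math


def upper_tail(z):
    """One-tailed probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def z_one_sample(successes, n, p0=0.5):
    """z-score of an observed count against a fair (p0 = 0.5) mechanism."""
    expected = n * p0
    standard_error = math.sqrt(n * p0 * (1 - p0))
    return (successes - expected) / standard_error


def z_two_sample(successes_a, n_a, successes_b, n_b):
    """z-score for the difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    se_a = math.sqrt(p_a * (1 - p_a) / n_a)
    se_b = math.sqrt(p_b * (1 - p_b) / n_b)
    return (p_b - p_a) / math.sqrt(se_a ** 2 + se_b ** 2)


# Counts reported for the two-story drop: side X up, out of 48 drops each.
DROPS = 48
DRY_UP = 26
BUTTERED_UP = 29

z_buttered = z_one_sample(BUTTERED_UP, DROPS)
z_dry = z_one_sample(DRY_UP, DROPS)
z_diff = z_two_sample(DRY_UP, DROPS, BUTTERED_UP, DROPS)

for label, z in [
    ("buttered vs. fair", z_buttered),
    ("dry vs. fair", z_dry),
    ("buttered vs. dry", z_diff),
]:
    print(f"{label}: z = {z:.3f}, one-tailed P = {upper_tail(z):.1%}")
```

An exact binomial test would be the more careful choice with only 48 drops per condition; the normal approximation is used here because it mirrors the hand calculation.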

this entry posted to technology;
comments (0)

2005 Mar 03 | Relational artifacts

In a small seminar Sherry Turkle spoke of her work on relational artifacts: devices that present as having a mind of their own so as to inspire an emotional and nurturing response in humans. What are the societal implications when children become dependent on them, or we use them to keep the folks in nursing homes happy? Is machine presence preferable to loneliness? Yes, it might be better if we talked to children about the bumpiness of human friendships (e.g., learning to share) and the cycle of life (e.g., a biological pet's death), or provided real companionship, support, and community to elders, but absent that, are robots okay?

Turkle refuses to answer this question as stated and challenges the assumptions as a matter of principle -- she also recommends that machines act like machines, but what this means is also problematic.

Yet, if we do attempt to answer the question directly, it reduces to the problem of the "happy box." If Tyrell Corporation could manufacture a box into which one could plug and become content, should we permit it? This is a massively complex and interesting question, the subject of many science fiction and anime stories, and even encountered in the question of whether Buddha should have taken Prozac.

In the end, as in anything, I suppose it comes down to the personal choice of consenting individuals, and society should maintain a commitment to make the real world a worthy alternative to simulated bliss.

this entry posted to culture;
comments (0)

2005 Mar 01 | Usage and citation

There are two oddities associated with concerns about the Wikipedia in the school setting, when teachers do not permit the students to use Wikipedia as a source.

The first difficulty is that the Wikipedia is often compared to the Encyclopedia Britannica, a resource I was not allowed to cite in high school, though it is now presented as the gold standard of authority relative to the messiness of the Wikipedia. The second is that students will use the Wikipedia -- just as I used the Encyclopedia Britannica in my youth and continue to use reference works today. (In fact, I think highly of professors who can provide a good reference work or textbook for their domain, and poorly of those who can't or refuse to because they feel their topic cannot be "reduced.")

Even in high school, when confronted with a rule prohibiting the citation of a reference work I felt as if I was being encouraged towards plagiarism, or at least unfairness. If a reference work points me to a more authoritative source, should I not at least acknowledge this bit of help? Particularly if I'm more likely to be influenced by the summary provided by the reference? Additionally, why would any book among the thousands published a year be any more authoritative than a general reference work on the sole basis of its form? I could compile a multipage bibliography of books denying the Holocaust, but find few -- if any -- general-purpose reference works that did the same. The generality of the reference work insulates it from partisan pressures because it must appeal to a wide audience over many topics. It is unlikely that neo-Nazis would publish a useful general reference work for the sole purpose of shifting articles on Jews towards their perspective. However, this is not to say that reference works have no bias. Only that if we look solely at the formal genre of a text -- which is what this rule does -- any given reference work is less likely to be "eccentric" than any book taken at random.

Finally, when one considers in what direction the authority flows, books are often demonstrated as authorities by being cited by the encyclopedia! Or, at least, encyclopedias impart as much authority to the books they cite as they obtain in citing them -- this transfer is mediated by the reputation of the publisher, editor, and contributors.

this entry posted to culture/wikipedia;
comments (0)

2005 Feb 14 | XML ElementTree Data Model

I've been playing with Fredrik Lundh's ElementTree as an intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it is presently unsupported; ElementTree is fast and has XPath support.)

ElementTree Conventions
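
To give a flavor of why ElementTree feels pythonic, here is a minimal sketch using the version that now ships in Python's standard library as xml.etree.ElementTree. The sample document and its element and attribute names (blog, entry, category, title) are invented for this illustration; only the ElementTree calls themselves are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
SAMPLE = """
<blog>
  <entry category="culture"><title>Godwin's Law</title></entry>
  <entry category="technology"><title>Mythbusters and Buttered Toast</title></entry>
  <entry category="culture"><title>Usage and citation</title></entry>
</blog>
"""

root = ET.fromstring(SAMPLE)

# An element iterates over its children like a list.
all_titles = [entry.findtext("title") for entry in root]

# findall() accepts a limited XPath subset, including attribute predicates.
culture_titles = [
    entry.findtext("title")
    for entry in root.findall("entry[@category='culture']")
]

print(root.tag, len(root))
print(all_titles)
print(culture_titles)
```

The appeal is that elements act like ordinary Python containers: children are reached by iteration or indexing, attributes through a dictionary-like interface, and simple path expressions cover most lookups.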

this entry posted to technology/python;
comments (1)

2005 Feb 11 | Abstractions: the benefits and dangers

Why is abstraction such a powerful technique in computer science, but a potentially befuddling one in the social sciences? In computer science, abstractions permit one to:

  1. increase the scope of the construct being abstracted
  2. hide unnecessary details
  3. cleanly define objects/functions and their interfaces

Yet, in social science it:

  1. increases the scope of the construct being abstracted
  2. hides details -- but how do we know whether those details were important?
  3. often has muddled definitions with few cleanly defined interfaces

The potential danger of abstraction in the social sciences -- often spoken of as "theory," though I prefer to think of theory in this context as a metaphysic or framework -- is that it increases the scope of action while possibly becoming muddled and detached from reality. With greater scope comes a greater responsibility to clarity.

Consider that one rarely hears of a computer abstraction being overly broad, because it is necessarily grounded in the reflexive practice of usage. (If it's broken, it'll get fixed.) For example, the Python sort function takes an object to be sorted and returns a similar, sorted, object. We need not be overly concerned with the algorithm or its implementation. Furthermore, as new data structures are developed, one can pass it a specific compare function relative to that data structure. We get to re-use the concept of "sort" with as much specificity as needed -- but no more.
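
The sort example can be made concrete with a short sketch. The data (author names and edit counts) and the compare_edits function are hypothetical, made up for this illustration. Note one assumption about versions: older Pythons accepted a compare function directly, while current Python expects a key function, with functools.cmp_to_key as the adapter for compare-style functions.

```python
from functools import cmp_to_key

# Hypothetical data: (author, number of edits) pairs.
edit_counts = [("carol", 3), ("alice", 12), ("bob", 3), ("dave", 7)]

# Plain use: sorted() returns a new list and leaves the original alone.
by_name = sorted(edit_counts)

# A key function says which aspect of each item matters for ordering.
by_count = sorted(edit_counts, key=lambda pair: pair[1], reverse=True)


def compare_edits(a, b):
    """Compare-style ordering: most edits first, ties broken by author name."""
    if a[1] != b[1]:
        return b[1] - a[1]
    return (a[0] > b[0]) - (a[0] < b[0])


# cmp_to_key adapts a compare function to the key-based interface.
ranked = sorted(edit_counts, key=cmp_to_key(compare_edits))

print(by_name)
print(by_count)
print(ranked)
```

In all three calls the caller supplies only what is specific to the data, and the sorting algorithm itself stays hidden behind the same interface.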

Social science "theorists" tend to resist attempts to cleanly define words, to explicitly model a construct, or be "reduced" by attempts at clarity and coherence. When I asked a student about the meaning of a word he used because I could not find it in the dictionary he replied, "that's your mistake, don't look these words up." Oddly enough, Bourdieu's most useful work is in an extended interview -- as is McLuhan's Playboy interview -- though they both feared being overly reduced by this form of discourse. Consider how this differs from the following well-known anecdote:

Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi Dirac statistics. Rising to the challenge, he said, "I'll prepare a freshman lecture on it." But a few days later he told the faculty member, "You know, I couldn't do it. I couldn't reduce it to the freshman level. That means we really don't understand it."

If only we were so humble in the social sciences — and Feynman was not known as a humble man; perhaps he figured that if he couldn't do it, then no one could.

this entry posted to culture;
comments (0)

2005 Feb 11 | A new model of design

First there was designers' design. Then there was user-centered design. But even that is now behind the times. Given open development paradigms and "extreme programming" practices, there is another approach to design, which is no "design." Instead one has frequent, incremental releases on which the user environment can immediately give feedback. This takes place in a context, not of the firm, but of a larger ecology where each design is subject to evolutionary pressures.

For example, at the W3C we would sometimes despair that a project might take years, because of all the folks involved, and not satisfy requirements in the end anyway. In fact, sometimes the requirements would be better satisfied by lightweight/hacker constituencies external to the W3C. Consequently, upon my noting this, Tim Berners-Lee suggested that we develop projects by a "red team/blue team" model. Instead of having one group of 40 engineers working for a couple of years on a design that may not be well-liked at the end, one creates two smaller teams to specify and prototype proposals within a matter of months. And then someone, either a meritocratic figure or the community at large, can identify which one is superior. The problem is that within the firm it is very difficult to do this for political reasons: no one wants to be on a losing team. (In the case of a large project, everyone is on the same mediocre team at least.) But in the context of open development, there are often many competing projects. Some continue on in seeming redundancy, and others die quickly -- and I expect the developers are even happy about that so they can move on to something more fruitful.

this entry posted to culture;
comments (0)

2005 Jan 06 | Epistemological Authority

Two recent discussions have prompted me to return to the question of epistemological authority. In the case of the online collaborative Wikipedia, Larry Sanger, a founding participant, lamented the inability of the community to accept and retain contributions from "experts." Also, creationists have re-factored their doctrine into a pseudoscientific "theory" of intelligent design and advocate that it be taught alongside, or instead of, evolution. I believe both of these cases share the conditions that there is such a thing as expertise, but that all views are potentially ideologically biased. (And intelligent design on the Wikipedia is an example as well.) Can the community at large distinguish authoritative arguments, or must we be cynical and believe that all arguments are biased but some are only more eruditely presented? (In fact, I've realized that the bulk of continental social "theory" is about identifying such biases: Bourdieu's doxa and symbolic violence, Hall's naturalization, Gramsci's hegemony, Marcuse's and Adorno's technological veil, Weber's symbolic violence, Foucault's episteme, Barthes' exnomination ('unnaming') etc.).

In An Invitation to Reflexive Sociology Pierre Bourdieu (1992) discusses a couple of his conceptual contributions which may be of use in understanding these debates. A field is a cultural domain in which participants have a stake and compete with each other for the accumulation of some sort of capital (e.g., social capital).

Like any social universe, the academic world is the site of a struggle over the truth of the academic world and of the social world in general. Very rapidly, we may say that the social world is the site of continual struggles to define what the social world is; but the academic world has this peculiarity today that its verdicts and pronouncements are among the most powerful socially. In academia, people fight constantly over the question of who, in this universe, is socially mandated, authorized, to tell the truth of the social world (1992:70).

One of Bourdieu's preferences is that fields be true to themselves and operate autonomously and in a "scientific" manner.

A scientific field is a universe in which researchers are autonomous and where, to confront one another, they have to drop all nonscientific weapons -- beginning with the weapons of academic authority. In a genuine scientific field, one can freely enter free discussions and violently oppose any contradictor with the arms of science because your position does not depend on him or because you can get another position elsewhere. (1992:177)

My own understanding is that scientific does not equal academic: academic authority is based on a hierarchical application of judgment to those who allegedly know less; while closely associated with the academic, scientific assessments should be discernible to those who know the same or even less. Above, Bourdieu introduces the notion of "scientific arms": legitimate means of dispute. In Jonathan Sarfati's creationist response to the National Academy of Sciences' book Teaching About Evolution, he notes the creationist claim that the National Academy of Sciences "resorts to arbitrary, self-serving 'rules' to determine what qualifies as 'science' and what doesn't." Of course, presently in America we have the confounding situation that a great majority of the members of the National Academy of Sciences accept evolution, but a frightening proportion of Americans don't.

A field is all the more scientific the more it is capable of channeling, of converting unavowable motives into scientifically proper behavior. In a loosely structured field characterized by a low level of autonomy, illegitimate motives produce illegitimate strategies and, furthermore, strategies that are scientifically worthless. In an autonomous field such as the mathematical field today, by contrast, a top mathematician who wants to triumph over his opponents is compelled by the force of the field to produce mathematics to do so, on pain of excluding himself from the field. Being aware of this, we must work to constitute a Scientific City in which the most unavowable intentions have to sublimate themselves into scientific expression. This vision is not utopian at all, and I could propose a number of very concrete measures designed to make it come true. For instance, where we have one national referee or evaluator, we can institute an international panel of three foreign judges (of course, we must then control for the effects of international networks of mutual knowledge and alliances). When a research center or a journal enjoys a situation of monopoly, we can work to create a rival one. We can raise the level of scientific censorship by a series of actions designed to upgrade the level of training, the minimal amount of specific competency required to enter the field, etc.
In short, we must create conditions such that the worst, the meanest, and the most mediocre participant is compelled to behave in accordance with the norms of scientificity in currency at the time (1992:177).

Interestingly, Bourdieu is advocating censoring "nonscientific" claims, which, while not very democratic, can be meritocratic -- though I think his proposals for committees implausible. Yet, while Bourdieu is sympathetic to the autonomous operation of a field, he does not want to focus on a particular methodology or bureaucracy, but on the almost anarchistic competition under an already agreed-upon metaphysical system.

There is in history what we may call, after Elias, a process of scientific civilization, whose historical conditions are given with the constitution of relatively autonomous fields within which all moves are not allowed, in which there are immanent regularities, implicit principles and explicit rules of inclusion and exclusion, and admission rights which are being continually raised. Scientific reason realizes itself when it becomes inscribed not in the ethical norms of a practical reason or in the technical rules of scientific methodology, but in the apparently anarchical social mechanisms of competition between strategies armed with instruments of action and of thought capable of regulating their own uses, and in the durable dispositions that the functioning of this field produces and presupposes. (1992:180)

But, in the case of the creation/evolution debates, what is at stake is the metaphysical system of judging what is and is not science; in Wikipedia, what is and is not good, neutral, and authoritative content? Creationists object to natural science as a biased metaphysical system, or even a religion, like their own supernatural literalism. It is at this point that I find their position simply incoherent and can no longer sympathetically engage in the debate. The divine, supernatural, and the ineffable may exist and be revealed to some, but these are not legitimate discourses in a public sphere in which others do not have access to the inspired source. The alternative that I can understand is as Robert Pennock (2001:84) wrote in Intelligent Design Creationism and Its Critics, "The methodological naturalist does not make a commitment directly to a picture of what exists in the world, but rather to a set of methods as a reliable way to find out about the world -- typically the methods of the natural sciences, and perhaps extensions that are continuous with them -- and indirectly to what those methods discover."

As I discussed in Scandal and The Politics of Science and Vice Versa, "We can never know everything. We haven't the capacity nor time to give informed consideration to every important issue. So we rely upon labels and personalities to set the default values of our opinion." A claim of authority is a claim of being worthy of deference. In the case of Wikipedia, if people are to accept it as an encyclopedia, it seemingly must prove itself as an authority worthy of deference. Such worthiness is often judged by proxies: the judgments of peers, the judgments of superiors, method, majorities, personal experience, and results. And the difficulty with both the Wikipedia and the debate on evolution is that the best proxy, results, is not immediately apparent. If we stop teaching evolution now, the effects would be long-term and confounded with many other social variables. And how does one "objectively" judge the quality of Wikipedia?

Two of the key differences between Wikipedia and open source software development concern results and openness. With questions of protocol and code one can easily make authoritative claims based on the results, and consequently such communities tend to be meritocratic. As I wrote in Why the Internet is Good, "With the cacophony of ideas, proposals, and debates, and a lack of a central authority to cleave the good from the bad, how does one sort it all out? It sorts itself out. ... The success of any policy is based simply on its adoption by the community." Encyclopedia making is not so fortunate, and Wikipedia strives to be more open, even accepting anonymous contributions, than almost all open source projects. Nor can we simply rely upon the naked authority of expertise and academia: expertise should be supported, but to be accepted the results of expertise must also be widely perceptible to the larger public.

this entry posted to culture;
comments (0)

Open Communities, Media, Source, and Standards XML

by Joseph Reagle
