2005 Dec 15 | Nature's Wikipedia and Encyclopedia Britannica Analysis
Those interested in Wikipedia are discussing the comparison of errors
appearing in a sample of 42 articles, as reported by Nature. While I agree
with Jakob Voss's comments on the
limitations of the study, for this
sample the number of errors in Britannica does seem roughly comparable with Wikipedia
-- hopefully that Wikipedia outlier for Dmitri Mendeleyev will be
fixed soon. I was further intrigued to note that the errors per topic
correlate between the two:

This is a strong correlation (r=0.574) implying perhaps a similarity in
the difficulty of writing on that topic, or perhaps a difference in scrutiny
by the experts (e.g., the person reviewing the Cambrian explosion is
picky!).
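For readers who want to check a figure like r=0.574 themselves, here is a minimal sketch of the Pearson correlation coefficient in Python. The function name and the sample numbers are my own illustrations, not the Nature data; to reproduce the figure one would substitute the per-article error counts from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up error counts per article (NOT the Nature figures).
hypothetical_wikipedia = [4, 7, 2, 5, 9]
hypothetical_britannica = [3, 5, 1, 6, 4]
print(round(pearson_r(hypothetical_wikipedia, hypothetical_britannica), 3))
```

A value near 1 means topics that are error-prone in one encyclopedia tend to be error-prone in the other; it says nothing about which explanation (topic difficulty or reviewer scrutiny) is the right one.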
this entry posted to
culture/wikipedia;
comments (1)
2005 Dec 13 | Wikipedia History Scraping
To confirm the power law in Wikipedia edits (many doing a little, a few
doing much) this regular expression and Python code parses a Wikipedia
history fairly well:
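import operator
import re
import sys
from urllib.request import urlopen

history_regex = r""".*?oldid=(\d+).*(\d\d:\d\d.*?\d\d\d\d)</a>.*<span class='history-user'>.*?>(.*?)</a>.*(?:<span class='comment'>(.*?)</span>)?</li>"""
regex_obj = re.compile(history_regex)

def getHTML(url):
    # getHTML was not shown in the original; this is a minimal stand-in
    # that fetches the page and decodes it as text.
    return urlopen(url).read().decode('utf-8', 'replace')

url = sys.argv[1]
html = getHTML(url)
counter = 0  # number of history lines seen
edits = {}   # author -> list of (oldid, date, author, comment)
for line in html.split('\n'):
    if line.startswith("<li>(<a"):
        counter = counter + 1
        match_obj = regex_obj.search(line)
        if match_obj:
            oldid, date, author, comment = match_obj.groups()
            edits.setdefault(author, []).append((oldid, date, author, comment))
# Sort authors by number of edits, most prolific first.
counts = [(author, len(edits[author])) for author in edits]
counts_s = sorted(counts, reverse=True, key=operator.itemgetter(1))
print(counter)
for author, number in counts_s:
    print(author, ";", number)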
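One rough way to see the "many doing a little, a few doing much" pattern in the resulting counts is to ask what share of all edits comes from the most active authors. The sketch below is only an illustration under my own assumptions: the function name and the sample counts are hypothetical, and a high share is suggestive of a skewed distribution, not a proof of a power law.

```python
def top_share(edit_counts, fraction=0.1):
    """Share of all edits made by the most active `fraction` of authors."""
    ordered = sorted(edit_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no edits to summarize")
    # Always include at least one author in the "top" group.
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / total

# Hypothetical per-author edit counts (not taken from any real history page).
sample_counts = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
print(top_share(sample_counts, 0.1))
print(top_share(sample_counts, 0.2))
```

With the script above, the input would be the second element of each pair in counts_s.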
this entry posted to
method;
comments (2)
2005 Nov 14 | Godwin's Law
On October 4, 2005, I had the good fortune to meet Mike Godwin at the ITS Colloquium.
Godwin is famous for his adage that as the duration of a USENET discussion
grows, so does the probability of a comparison with Hitler or Nazis. During
the seminar the topic of Wikipedia arose and Mike, sitting two seats away,
nudged me and said he had an interesting story to tell.
Godwin's Law is
now quite old, and few use USENET for discussion, but the observation remains
potent because Godwin spoke to a feature of human discourse that,
though exaggerated on discussion groups, transcends any particular medium.
Indeed, Senator Rick Santorum started a controversy with just such a
comparison this summer.
Godwin notes that his observation was penned specifically as a memetic
experiment: to pose an idea and see how it perpetuates and mutates in the
field of popular discourse. The law has been fecund, leading to variants and
malapropisms. Ironically, when someone unknowingly uses one of these variants
she might be attacked by a dogmatic defender of the orthodoxy, provoking
allusions to fascist language Nazis, thus proving the adage.
In any case, the Wikipedia experience that Godwin wished to share was
about the article on Godwin's Law. While modifying the article to more
accurately reflect the history of the meme, some other editors objected. The
trinity of Wikipedia policies is that editors should be neutral in their
presentation of claims, not include original -- and potentially crackpot --
research, and provide citations such that any such claim can be verified by
others. So, this story brings us to the interesting question of how a
primary source, such as Godwin, should edit a related article. While recognizing
Godwin's authority, one might also then challenge his neutrality and
reporting of primary claims. It is not uncommon for contributors to create
"vanity" edits (pages or links) that are rebuffed with these policies when
the edit is not of encyclopedic merit. But what of when the edit is of merit?
Are the most qualified primary sources disqualified from editing the
Wikipedia article? Need a primary source publish her first-person claim
elsewhere before it can bear upon the Wikipedia article?
this entry posted to
culture/wikipedia;
comments (1)
2005 Nov 02 | Can you trust the Wikipedia?
In the past week the perennial question of "Can you
trust the Wikipedia?" arose while I was working on the tedious -- though
oddly compelling for an obsessive like myself -- task of reviewing the early
period of Wikipedia history. I slowly worked through the Wikipedia
timeline ensuring each event was dated and sourced. I realize that if I'm
ever to trust this timeline, I need more than a bald claim. And, my
appreciation is so much greater when I can peruse the primary source. For
some sources, such as the Nupedia list archives, I was able to find copies of
messages on the Internet Archive. Another source, Jimbo's explanation about
Stallman's proposal for a competing project, is seemingly lost forever.
Fortunately, Stallman was kind enough to tell me of his recollection of the
incident and allow me to publish it. Most frustratingly, I encountered a
tantalizing mention of Internet encyclopedia proposals from the UN's
Millennium Project but failed to find any source or corroboration; that
information is stricken from the article. Which brings me back to the
question of trusting the Wikipedia. I have addressed the broader question
of epistemological authority before, but now I want to focus on the role
of sources.
Simply, Wikipedia is only as trustworthy as its links. Actual scholarly
authority is similar. A critical part of scholarly training is learning why
and how to cite (link to) others. Expert authority is also generated from
experience in the field, and theoretical and methodological training. Yet, as
I've noted many times "'We
can never know everything.' We all can't be experts on everything, so we
often need to rely upon credible authority while remaining critical and
skeptical, but never dismissive." Consequently, the tokens "Ph.D." and
"professor" become proxies for an assessment of trust that very few people
are able to substantively test, but, to which many are willing to defer.
Because Wikipedia lacks such reputation mechanisms Wikipedia is, again, only
as trustworthy as its links. For educational purposes, the implication of
this is profound. Should we teach students to trust a claim because it was
simply uttered by a credentialed person? Or, should we encourage them to
click a link and teach them how to investigate for themselves?
The consequence of this for Wikipedia culture is that it doesn't link
enough. Perhaps my experience with Wikipedia history is exceptional since
Wikipedians take the sources for granted. But, as I found, that's a poor
historical assumption. I also share the concern that articles might become
overly busy or dense with citations. There is a tension here, but one I think
the technology can handle. It's why I believe the trustworthiness of
Wikipedia is in part dependent upon the citation
project and furthering a culture of "if you claim, you cite" as implied
by the Verifiability
policy.
this entry posted to
culture/wikipedia;
comments (2)
2005 Oct 27 | Zimmerman's theory of history
My understanding is that Zimmerman thinks that good history makes a
compelling argument about humans in time. It might be compelling in that it
is a story told well, and, most importantly, it casts light upon bigger
historical themes. There is a relationship between the specificity of the
project and the generality of its historical context. For example, to
describe the Wikipedia's coverage of 9/11 is just that: descriptive. But a
historical argument should also say something more. (Otherwise, the
description might only be of interest to the narrowest archivist.) What does
the coverage of 9/11 tell us about the event, about new media, or even the
development of the Wikipedia? The specific historical research and argument
needs to relate to the general -- and often taken for granted -- themes: to
augment, support, or counter. However, one must also guard against
presentism: the urge to draw connections between the past and present. This often will
be compelling, and sometimes useful, but it can also turn into a fishing
expedition in order to justify how the author feels about the present,
instead of a deepening of our understanding of the past.
[Later] In my own draft paper, I did not address the question of what is
different about the Wikipedia vision, and that of, say, H. G. Wells. Or, what
does H. G. Wells tell us about the Wikipedia? What is new and novel with the Wiki,
why did Wikis happen when they did? What are the new assumptions brought by
use of the Wiki? None of the projects I looked at were used to describe the
cultural space from which they came. It can be useful to pose this question
to oneself: what does the Wikipedia tell a future historian about our present
time?
this entry posted to
method;
comments (0)
2005 Oct 17 | Ethnography and History
What is the difference between (sociological) ethnography and history? In
taking a methodological course in each of these disciplines this semester
I've been attempting to find an answer to the question; I offer my current,
imperfect, understanding.
Simply, the ethnographer is present to the social phenomenon of interest
whereas the historian has some remove in time and place. Each then has a
different predominant focus on the question of subjectivity. Ethnographers
tend to think about their own position and biases relative to their
environment, and historians are concerned about their relationship to their
sources. However, a reflective practitioner of each method appreciates the
subjectivity of herself and the object of study. Whether it is a discussion
in the present (predominantly ethnography), a recollection of the past (oral
history and ethnography), or records of the past (predominantly history),
each is shaped by the social environs.
Another possible difference is that while history is often content with
the particular, sociology reaches for a transcendent theory. This is not to
say sociology has no concern with "thick description," nor that history has
no thesis -- it is an argument about humans in time -- but that their primary
aspiration and style differ. Whereas sociological theory creates, or is the
result of, a distance on the part of the researcher, time often does the same for the
historian, permitting a triangulation (of many sources); the
ethnographer, by contrast, often looks for contemporary comparison. (Comparison with the
past is often called the "ethnographic revisit.")
In addition, one might then ask how journalism and anthropology fit into
this mix!
this entry posted to
method;
comments (5)
2005 Oct 08 | Plagiarism and primary sources
Week eight: what accounts for the recent spate of scandal surrounding
"facts, fictions, and fraud" in American historical scholarship? How might
knowledge of these episodes affect or alter the way you pursue your own
scholarship?
In "Past Imperfect" Hoffer
(2004) offers a number of arguments as to why there have been so many
incidents of alleged fraud in the historical profession: the aspirations of
authors to write to the popular market, the almost industrial system --
employing many assistants -- with which books are researched and authored,
the demands by publishers for more books, the popular audiences' demand for
entertainment rather than scholarship, the eagerness of the ideological
opponents to take these authors down a notch, and an inability for the
profession to police itself.
While authors such as Michael Bellesiles, who falsified data in order to
argue that gun ownership was not common in early American history, deserve
rebuke and sanction, I felt sympathetic to some of the authors (e.g., Stephen
Ambrose) who got into trouble for borrowing primary source quotations,
tweaking the surrounding secondary material, and presenting it as their own with
a citation to the secondary source. Was the problem:
- Ambrose was not specific enough in a citation? Hoffer implies that by
using a range of pages in a citation such plagiarizing authors make it
more difficult for readers to identify that the secondary source text has been
lifted and slightly altered.
- Ambrose did not alter the text sufficiently so as to no longer be
considered plagiarism? What then is the right threshold? I ask this
myself sometimes when I find I can easily make an improvement by
reworking a sentence from a secondary author but some sequences of words
from the secondary source author are sufficiently generic and appropriate
such that they can't be written in a different way without being
awkward.
- Ambrose appeared to have access to the primary source, when in fact he
did not. This is the complaint I am most sympathetic to because one of the
safeguards built into scholarship is we do not necessarily take
everything others have said for granted. However, this issue, too, has
presented some difficulties to me. While citation guides in the various
publishing styles out there are always quite explicit in how to cite
primary sources, they often lack information on citing primary sources
found in secondary sources. With the availability of sources on Google
Print and Amazon, one of my practices is to find the primary source on
the Web -- often another secondary author. So for example, if I find an
interesting excerpt cited in a secondary source, I can then investigate
the primary source, what was actually written, and whether I want to read
that book. However, in some cases I don't feel the need to read the book
but have read the appropriate chapter, section, or series of pages in
which the excerpt occurs. Can I then use that excerpt as a primary
source? I sometimes do. However, by making a primary source citation and
including it in my bibliography, do I give the appearance of having read
that whole work? (Indeed, in many of my graduate courses the professors
supply two or three chapters out of the book for the students to read. I
frequently cite such materials.) In fact, can readers assume that an
author has thoroughly read every work appearing in a bibliography? I think
not. (One way to address this problem is to note and include in the
bibliography specifically which chapters or pages one has read.)
However, regardless of how one might answer any of these questions, my
concern is with the veil that shrouds practical discussion about these
issues. For example, I feel quite vulnerable in discussing my usage of Google
and Amazon above. I feel as if in raising these questions I'm making myself
suspect. It is as if everyone assumes we know exactly what is right, when in
fact there are many gray areas. (And the topic of plagiarism is not the only
one shrouded in this mist; there are gray areas around how we make use of
copies of copyrighted works in the academy, as well as how to finesse "human
subjects" review board bureaucracy.) Typically, these issues, when abuse is
finally caught, are addressed by overly strident, sometimes impractical,
norms and bureaucratic procedures. Instead, they should be addressed at the
outset as challenges that all researchers must grapple with, and with which
we can feel free to share our strategies and concerns.
this entry posted to
method;
comments (0)
2005 Sep 30 | Results as of Fall 2005
I'm now in my third year at NYU; the first and second year exams are done,
and after this semester I will have satisfied my course requirements. (This
semester I'm taking methodological courses including ethnography, history,
and statistics.) The outstanding item, then, will be the completion and
approval of my proposal -- which will also include finding a third member for
my committee.
The majority of my efforts are focused on the Wikipedia; some recent
drafts that may be of interest on that note include:
- Arguments Among Friends:
the Wikipedia - a snapshot of the sharp point (sans literature
review) of my proposal:
The Wikipedia is not merely an online encyclopedia; while the Web
site is useful, popular, and permits anyone to contribute, the site is
only the most visible artifact of an active community. Unlike previous
reference works which stand on library shelves distanced from the
institutions, people, and discussions from which they arose, the
Wikipedia is a community, and the encyclopedia is a snapshot of its
continuing conversation.
- Four Short Stories about the
Reference Work - an encounter with four themes in the history of
reference work production that I think are also relevant to the
Wikipedia; I hope to complement this, this semester, by situating the
Wikipedia in the other realm of online knowledge production:
Many histories can be written of the reference work. There is the
chronicle of technical and institutional forces intertwined in the
production of the book: of conquest, co-option, trade wars, empire and
religion. Also, there's the drama of clashing conservative and
progressive impulses: the expectation for the humble reference work to
fixate the social order, or to shatter it and form a new realization of
social possibility. There are tales of great and eccentric
personalities: the perseverance of men who dedicate their lives to the
tasks of organizing everything known about the universe. Finally, there
is the story of collaboration: of people standing on the shoulders of
giants and of plagiarism.
Of course, these do not exhaust the potential perspectives with
which to view the development of the reference work but these are the
ones presented in this essay. My goal is to consider the history of
reference works, specifically the dictionary and encyclopedia, from
these perspectives in order to contextualize a more focused history and
ethnography of the Wikipedia, an on-line collaborative encyclopedia; I
hope to encounter salient issues of the past that might be relevant to
the present day.
- Is the Wikipedia Neutral?
- an (early draft) extension of A Case of Mutual Aid:
Wikipedia, Politeness, and Perspective Taking to tease apart what is
meant by something being neutral, and whether it is the right term to describe
Wikipedia efforts:
Claims of neutrality and accusations of bias are common themes of
contemporary discourse about the media, government, education, and
technology. In this essay I extend earlier work on the collaborative
culture of Wikipedia (an on-line and free encyclopedia) to specifically
focus on the fundamental but often misunderstood notion of
neutrality.... This essay is inspired by earlier debates on neutrality
of technical standards, literature on bias in technical systems, my
present fascination with this Wikipedia norm and a change in my belief
that while an important concept, the label of neutrality was an
unfortunate coinage in the Wikipedia context.
this entry posted to
career/phd;
comments (2)
2005 Sep 28 | Writing Ethnography
The Professional-Lurker mentions
Tales of the field: on writing ethnography (Van
Maanen 1998), which I highly recommend for "practicum" reading. I'm
presently reading another good book in the same series, Writing
ethnographic fieldnotes (Emerson,
Fretz and Shaw 1995).
this entry posted to
method;
comments (1)
2005 Sep 12 | Cranking out the words
In addition to the books I read when I
first arrived at graduate school I've encountered a few other highly useful
guides that encourage the words to hit the page:
- Tara Gray's 12 Steps to
becoming a more prolific scholar
- Michael Froomkin's legal
writing tips
- Blog: Academic
coach
this entry posted to
career;
comments (0)
2005 Sep 09 | Coleman, History v. Ethnography
Yesterday I read a chapter from Gabriella Coleman's dissertation on the Debian community. (She co-authored a piece with Mako in the M/C Journal issue in which I wrote about open content communities.) I was quite excited to read it because it reminded me of another paper
I had just read a couple of days ago by Tom Chance -- sharing themes of
hacker culture -- but more importantly because it includes many of the
same questions I have about my own community, the Wikipedia. My
inspiration and template of sorts has been Michael Sheeran's study of the Quaker community.
However, that dissertation is nearly 30 years old and has no
theoretical or methodological text. (And while I hope to not linger on
those things in my own dissertation, I have to have a sense of them in
order to write it!) In particular, both Sheeran and Coleman conduct a
combination of history and ethnography.
In my case, I've been wondering if my project is:
- an ethnography, but of notable and public persons and events -- so purposefully anonymizing is inappropriate.
- a history, but of a community and work that is ~5 years old.
- an oral history, but the majority of the material is not oral but textual (e.g., email, Web pages, IRC, etc.).
It appears that Coleman did a mix of both. When referring to the "public
history," she names names where appropriate; but, developer interviews
are anonymous. However, there are some oddities which I find
confusing. For example when she speaks of the "Vancouver incident" she
quotes an e-mail, without citing its source. It's a well known email,
and not citing it directly strikes me as odd. I'd love to see
her methodology section and IRB proposal. In fact, I hope to chat with
her soon. (Additionally, it appears she's studied religion and I wonder
if that would be relevant to my own interests in sectarian
decision making.)
While reading the dissertation, I made this little table comparing Debian and Wikipedia:
| | Debian (Coleman 2005) | Wikipedia (Reagle 2005) |
| charter | social charter | "an Encyclopedia" |
| policy | Constitution | NPOV? |
| final arbiter | technical committee | arbitration committee/Jimbo |
| leadership | Democratic/meritocratic | Jimbo and meritocratic lieutenants |
| socialization | new maintainer sponsor mentorship | radically open |
| "witnessing" | one's biography and explanation of the policies in one's own words | user pages (or people describe how they came to and what Wikipedia means) |
| decision making | contributed, discursive, and voting | discursive, persistent, and occasional voting (deletion) |
this entry posted to
method;
comments (1)
2005 Sep 09 | Online resources in BibTeX
In BibTeX, how does one represent the paragraph number for a citation
in online -- non-paginated -- articles? Some journals request that
citations indicate the paragraph number of the online resource. I do not know
how to represent that in BibTeX: I know of no tags for paragraph numbers in
the BibTeX format, nor do I think most LaTeX tools/styles know how to
represent them in a citation...?
this entry posted to
method;
comments (0)
2005 Jun 27 | Mailman, Message-id, and Persistent URIs
As someone who is interested in studying and citing email conversations,
the opacity of the mailman interface -- or the lack of my understanding -- is
a pain. I was spoiled by the W3C's system where each email
had a header with a URL to its place in the archive, which corresponded in
some way to the msg-id! When processing comments on a spec, or citing
conversations, it's very handy to be able to link to a persistent Web
representation of an email.
In writing about Wikipedia discourse I'm stuck with using the message-id
if I happen to have that email in a mbox, or a URL if I happen to have a Web
page, but from one I can not easily get the other, and I'm not confident that
the URL will be stable in any case. (For example, will this
always correspond to the message with the message-id
"42BEC0EF.6070906@web.de"?)
Without a guarantee of stability, I suppose it's best to use msg-id in
citing WP discourse, but that makes finding that message problematic for the
reader. I'd provide a hint if I could somehow obtain it myself, but the HTML
page for a message in the archive has no indication of the msg-id. And even
if I have the msg-id, I can't easily find the corresponding archive URL.
Before sending this message, I thought there would be a search interface and
I could write a script, but there doesn't appear to be one, and it doesn't
work in Google (e.g., this
query returns nothing).
What to do?? Fortunately, http://marc.theaimsgroup.com/
provides a Web archive to these lists with the ability to query based on
message-id. The following procmail script will add a header and signature
containing a URL of the message:
###########################################################################
# insert X-Archived-At header into messages from Wikimedia lists
# :0fwh will append the "Archived at" at the beginning of the message
:0
* ^List-Id: .*Wiki[mp]edia.org
{
  MID=`formail -xMessage-Id | tr -d '<' | tr -d '>' | tr -d '\n'`
  URL="http://marc.theaimsgroup.com/?i=${MID}"
  :0fw
  | (formail -I"X-Archived-At: ${URL}"; echo "Archived at: ${URL}")
}
(Thanks to Hank Leininger for responding to my concern that marc URLs
contained the excluded
character delimiters '<' and '>' meaning they were not a valid URL
and weren't automatically clickable in some applications such as KMail. The
next day it was fixed and I am happy!)
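For messages already sitting in an mbox, the same message-id-to-URL mapping can be sketched in Python. This is only an illustration under my own assumptions: the function and constant names are hypothetical, the query URL is the one used in the recipe above, and no URL escaping is attempted.

```python
from email import message_from_string

# Query prefix taken from the procmail recipe above.
ARCHIVE_QUERY = "http://marc.theaimsgroup.com/?i="

def archive_url(raw_message):
    """Build an archive lookup URL from a raw email's Message-ID header."""
    msg = message_from_string(raw_message)
    message_id = msg["Message-ID"]  # header lookup is case-insensitive
    if message_id is None:
        return None
    # Drop surrounding whitespace and the '<' '>' delimiters.
    return ARCHIVE_QUERY + message_id.strip().strip("<>")

sample = "Message-Id: <42BEC0EF.6070906@web.de>\nSubject: example\n\nbody text\n"
print(archive_url(sample))
```

Whether the archive actually holds a given message still has to be checked by following the URL.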
this entry posted to
method;
comments (0)
I am releasing a new zipfile of
the fe mindmapping bibliographic tools. As
explained in Extracting
Bibliographies from Freemind, these are python scripts that are able to
convert between Freemind
mindmaps (using a few simple conventions) and bibliographic formats (i.e.,
OO.org CSV and bibtex). This approach is preferable to other bibliographic
tools with limited/constrained forms for text entry. With
fe one has a complete outline/map of texts, with
figures, images, tables, links to sites, etc.; one can easily organize texts
by topic or in separate mindmap files; and one can generate queries where
each matching line has its appropriate citation with year and page number
(e.g., "Giddens").
Unlike many bibliographic tools, it does not query on-line databases, but one
can use such tools (e.g., tellico or refworks) to query and generate bibtex
bibliographies and then use be.py to convert them to a mindmap.
- fe.py: extract bibliographic data from
bibliographic MM (dependent on XML ElementTree and
optionally bibtex2html)
- this version is faster since it uses XML ElementTree
instead of XML
Tramp.
- given a list of authors cited (*.rl, such as that generated by
pe.py or pyblink) bibtex2html will
generate a bibliography of only those authors.
- bibliographic maps are searchable from the command-line or via the
Web (e.g., search
results for "Giddens" in my mindmap [java|flash]).
- a Web of mindmaps can be searched for essential entries
(the title is bold) and placed in a new mindmap for studying.
fe.py -h (help)
      -v (output csv)
      -c (chase links between MMs)
      -w (output bibtex & html file)
      -a (include abstracts)
      -s (use bibtex style)
      -q (query)
      -e (create new MM of essential works)
- be.py: extract a MM from a bibtex file (dependent on bibstuff)
- de.py: extract a MM from a dictated text file
- ff.py: fix the case of titles of a bibliographic MM
- pe.py: extract the bibliographic keys of the form 'Snide and Smith
(2003)' or '(Snide, Smith and Smittie 2004)' from natural language
text
- te.py: parse inconsistently formatted textual bibliographies into
bibliographic MM (e.g., from syllabi, cb2Bib is cool too)
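To give a flavor of what extracting keys like 'Snide and Smith (2003)' involves, here is a minimal regular-expression sketch. It is not the code in pe.py; the pattern and function names are hypothetical, and it only handles the two forms mentioned above (it will also produce false positives on capitalized phrases followed by a year).

```python
import re

# A capitalized surname, allowing apostrophes and hyphens.
NAME = r"[A-Z][A-Za-z'-]+"
# One or more names: "Snide", "Snide and Smith", "Snide, Smith and Smittie".
AUTHORS = rf"{NAME}(?:, {NAME})*(?: and {NAME})?"
# Narrative form: Snide and Smith (2003)
NARRATIVE = re.compile(rf"({AUTHORS}) \((\d{{4}})\)")
# Parenthetical form: (Snide, Smith and Smittie 2004)
PARENTHETICAL = re.compile(rf"\(({AUTHORS}) (\d{{4}})\)")

def extract_keys(text):
    """Return (authors, year) pairs in order of appearance in the text."""
    found = []
    for pattern in (NARRATIVE, PARENTHETICAL):
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(1), match.group(2)))
    # Sort by position so keys come out in reading order.
    return [(authors, year) for _, authors, year in sorted(found)]

text = ("As Snide and Smith (2003) argue, collaboration matters "
        "(Snide, Smith and Smittie 2004).")
print(extract_keys(text))
```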
this entry posted to
technology/python;
comments (0)
2005 Jun 10 | Wikipedia and Astroturfing
Clay Shirky notes
a cycle of references created by a few people with the effect of promotion: a
Wikipedia article for Symphony OS is referenced in a Slashdot article, which
is then noted in the Wikipedia article:
This is an interesting kind of spam, or maybe we could call it a
reputation hack.... They create a Wikipedia page, point to it as if to
demonstrate independent interest for the project in their potential
slashdot post, then point to the slashdot effect on the Wikipedia page as
proof of said independent interest. Voila, an instant trend.
The Symphony Talk
page reminds me very much of one of the Lamest
Edit Wars Ever over very similar issues with SkyOS: "Fast & furious
kindergarten catfight with accusations of GPL violations, advertising, lying
and fanboyism." One difference is that Symphony is actually Free Software, so
while there is an argument about advertising, implied dishonesty and
fanboyism, the GPL hasn't been an issue -- yet! (Accusation of GPL violation
sometimes strikes me as similar in some sense to Godwin's Law; while
license violations may be a substantive accusation, the discourse has no
doubt gotten heated by then.)
Shirky also thinks that referencing the consequent Slashdot Effect on the
Symphony OS site doesn't merit inclusion. (Personally, I don't mind and I
don't read the Slashdot Effect as a reciprocative authority.) After Shirky
removed it, EliasAlucard reverted the removal, commenting "Why is trivia being
removed by that anon user 'Clay Shirky?' As far as I'm concerned, he has
nothing but distaste for this article, and his edits shouldn't be reckoned
with." Unfortunately, here and on the Symphony Talk page
EliasAlucard is not representing himself -- nor the article -- well and is
failing to uphold numerous Wikipedia norms of good faith and writing for the
enemy. (In Wikipedia, we encourage folks to try to see the perspective of the other,
not write them off.) Also, the Slashdot effect claim is without attribution
and citation of evidence.
So I've included that link at least.
this entry posted to
culture/wikipedia;
comments (0)
2005 May 30 | Pruning
The Professional Lurker notes
an article
by the FreeRange Librarian which identifies the important role of
deleting/removing material of dubious quality. This function, too, exists in
the Wikipedia: Votes for
Deletion. Otherwise, this argument simply reduces to the one of authority
which the Librarian has raised in the past. Authoring, editing, deleting, and
moving -- and soon article
validation by user feedback -- all exist in the Wikipedia. Some, such as
the Librarian and Sanger simply
want to highlight or make use of experts' abilities. This is difficult when
all users -- including the jerky clueless -- have the same standing and
victory is more often achieved by dogged verbosity. So, there is inevitably
frustration on the part of some. Such as it is. The Wikipedia is a working
experiment on this note. (And if you want to look into the petri dish, see
the discussion of Wales'
credentials idea.)
this entry posted to
culture/wikipedia;
comments (0)
2005 May 16 | NYU's Information Services Suck
For an example of how frustrating reliance on bad technology can be, consider
the following. NYU publishes a course catalog for the subsequent semester.
However, NYU often does not print enough copies for me to get one. The
standard rejoinder is to use the web site. While I would object in principle
(I think NYU should have a paper copy available for me), I certainly object on
the basis of utility: NYU's information technology resources are extremely
difficult to use. Trying to find and register for courses is a nightmare.
To get a listing of courses, and register, one has to select various
options in a JavaScript pop-up menu, which is very sensitive to losing focus.
So for example I want to figure out what courses are available. Under the
"registration" option there are among other options: registration status,
course status, register, registration schedule. The appropriate selection is
"course status," though it is always the last one I think of. Having selected
that, I'm presented with a number of form fields whereby I can "search" for
a course. A course listing typically looks like:
E38.2008 (12345) - Sem Media Criticism II SEM 4.0 .
- E38 = Course Subject
- 2008 = Course Number
- 12345 = Call Number
- Sem Media Criticism II = Course Title
- SEM = Course Type
- 4.0 = Course Credits
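The listing format is regular enough that a few lines of code can take it apart, which is what makes the missing keyword search so galling. The sketch below is my own illustration, not anything NYU provides: the pattern is inferred from the single example listing above, the function names are hypothetical, and real listings may vary in ways it does not handle.

```python
import re

# Pattern inferred from one example listing; an assumption, not a spec.
LISTING = re.compile(
    r"(?P<subject>[A-Z]\d+)\.(?P<number>\d+)\s+"
    r"\((?P<call_number>\d+)\)\s+-\s+"
    r"(?P<title>.+)\s+(?P<type>[A-Z]+)\s+(?P<credits>\d+\.\d+)"
)

def parse_listing(line):
    """Split a course listing into named fields, or return None."""
    match = LISTING.match(line.strip())
    return match.groupdict() if match else None

def search_titles(listings, word):
    """Return parsed listings whose title contains `word`, ignoring case."""
    parsed = (parse_listing(line) for line in listings)
    return [c for c in parsed if c and word.lower() in c["title"].lower()]

example = "E38.2008 (12345) - Sem Media Criticism II SEM 4.0 ."
print(parse_listing(example))
print(search_titles([example], "media"))
```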
In order to perform a search, I must know the course subject (of which
there are 300 entries in a pulldown menu!) and the course number (2008) or
course level (graduate). The course subjects are listed alphabetically, so if
I only know the token "E38" -- which is how we refer to our Media Ecology
classes -- I have to scroll through the menu of 300 entries or, as I
"prefer," open an HTML source view of the document and search for the "subject"
corresponding to E38. The sad irony here is that many of the 300 entries are
actually duplicated and only differ with respect to whether they are graduate
or undergraduate subjects. If this is the case, then why would I need to
specify the "course level"? In any case, what is unfortunate is that:
1. It is not possible to search for courses based on words that appear in
their title or description. So, for example there's no chance of me being
able to search for courses with "social" or "technology" appearing in their
description or title.
2. Should I actually find a course and want to register for it, I cannot
click on it to do so. I have to write down the call number, go back up into
the confusing JavaScript pulldown menus, and fill in the registration
form.
Now, the really annoying thing about all of this is that you're not
allowed to use the back button. If you do so, at some point you will likely
encounter the error: "Your access to the system is denied because of improper
authorization. Return to the login page to re-enter the required user
authentication data." This means I lose my browsing context and have to go to
the login page and log in again! This happens on Windows and Linux, in the Mozilla,
Konqueror, and Internet Explorer browsers. ("Not using the back button" seems
a wholly inappropriate recommendation.) And in general, the site violates
numerous accessibility and usability guidelines that I won't even bother to
detail here.
If one was lucky enough to get a paper course listing, one should just
punch those course call numbers into the registration form. Otherwise, don't
expect to find any courses for which you don't already know the details.
Information technology, in this case, is no improvement upon, or even
replacement for, the printed catalog.
On the good news front, I may finally be able to use SSH. By policy NYU
blocks all ports except Web, and some FTP access. So, I cannot access my home
or web host from school. After negotiating the appropriate levels of
bureaucracy, including having a form signed by my advisor and the financial
dean of my school, I now have a shell account, which I think will permit me
to SSH out of NYU!
this entry posted to
career;
comments (0)
2005 Apr 30 | Epistemic stances
When it comes to making claims, about reality or anything else really, a
number of different stances might be adopted by the speaker:
- Objectivity: the claims have a correspondence to reality; they are
typically embedded in a framework by which their validity is affirmed.
For example, the scientific method posits mechanisms that mitigate errors
common to human perception and therefore affirms its aspiration towards
objectivity. Problems of this stance include the fact that the sort of
claims one makes, and the questions one asks, are personally or socially
influenced without such methodological bracing. Also, the appearance of
objective methodologies can be easily mimicked.
- Neutral: the claims are satisfactory, or at least mutually
unsatisfactory, to the claims' constituencies. For example, when the
press represents an issue by first finding the two extremes of the
argument, they have at least not favored one of those extremes. Problems
of this stance include that the constituencies may not have been accurately
represented, both with respect to their positions and relative
numbers.
- Transparent bias: no pretense to objectivity, nor in accommodating
various constituencies, but plainly representing the speaker's bias. For
example, the blogger who simply writes what they think. Problems of this
stance include that this stance is often misperceived as one of the other
two, and that it includes no inclination towards finding common ground
with others.
this entry posted to
culture;
comments (0)
2005 Apr 16 | Encrypted File Systems
In moving to Kubuntu 5.04 from a
Knoppix install, my loop-aes partitions are no longer readable. Since crypto-loop is
being deprecated anyway, I thought I would try dm-crypt. However,
because I would have to employ that on top of a file loop, it's a hassle.
Fortunately, I bumped into EncFS. Generally, I like
it a lot, and it is comparable to crypto-loop except when it comes to a USB
drive. Copying a 2GB file to:
- a normal IDE partition: ~17MB/s,
- an external vfat USB2 drive (ehci_hcd): ~11MB/s,
- an encfs directory on the IDE drive: ~7MB/s,
- an encrypted directory on the external drive: 64KB/s.
Ouch!
Interestingly an ext3 formatted loop device (no encryption) on the
external drive is ~17MB/s and with crypto-loop it is ~10MB/s. Now, here's
the real kicker: an ext3 loop partition sitting on the vfat external drive,
hosting an encfs directory is ~14MB/s! So, vfat sucks -- though encfs on vfat
probably doesn't have to do quite so poorly. Or to put it another way, it's
faster to put a 3GB file on the external vfat drive (vfat is very compatible
with many computers), mount it as an ext3 loop device, and run encfs on top
of that than it is to access the plain old vfat file system. (This is even
slightly faster than running encfs on the local IDE drive!)
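For anyone wanting to repeat this sort of comparison, here is a minimal sketch of how a copy rate might be measured. It is not the procedure I used for the figures above; the function name copy_throughput and the 1 MB demo file are assumptions for illustration, and a realistic test would use a file large enough to defeat caching and a destination directory on the device under test.

```python
import os
import shutil
import tempfile
import time


def copy_throughput(source_path, dest_dir):
    """Copy source_path into dest_dir and return the observed rate in MB/s."""
    dest_path = os.path.join(dest_dir, os.path.basename(source_path))
    size_mb = os.path.getsize(source_path) / (1024 * 1024)
    start = time.perf_counter()
    shutil.copyfile(source_path, dest_path)
    # Ask the OS to flush the copy to the device, so that the page cache
    # does not make a slow file system look fast.
    with open(dest_path, "rb+") as handle:
        os.fsync(handle.fileno())
    elapsed = time.perf_counter() - start
    return size_mb / max(elapsed, 1e-9)


# Demo on a small temporary file; point dest_dir at a mounted vfat, ext3,
# or encfs directory to compare them.
with tempfile.TemporaryDirectory() as work_dir:
    source = os.path.join(work_dir, "sample.bin")
    with open(source, "wb") as handle:
        handle.write(os.urandom(1024 * 1024))
    target_dir = os.path.join(work_dir, "target")
    os.mkdir(target_dir)
    rate = copy_throughput(source, target_dir)
    print("%.1f MB/s" % rate)
```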
this entry posted to
technology;
comments (0)
2005 Apr 04 | Lost Minorities
In a discussion on the possibility of limiting anonymous edits to the
Wikipedia -- resulting from the mad dash of unwelcome activity on April
Fools' Day -- a participant replied, "This one comes up again and again, and
the consensus has always been 'nope, anon edits will continue thanks.'" I am
always cautious before making such a declaration of consensus because of what
I call the "lost minority." When a particular policy of the community is
likely to drive away some participants one must be careful. I explain this by
way of a group within a room of which some folks have bad gas. Those with the
gastronomical distress, of course, don't complain. There are some, who aren't
bothered by the smell, or think that the smell is a worthwhile trade-off for
the presence of the farters. And those with sensitive noses simply can't take
it. When the latter group propose that something be done about this issue,
such as limiting the amount of broccoli that is served at lunch, or
developing a nose filter, people discuss the issue, and perhaps it gets added
to an agenda. In time, the group is polled, and surprisingly the consensus is
to do nothing. But what has happened is that a significant portion of those
with the sensitive noses have already left.
I don't know if this is the case in this Wikipedia example, but this
concern is relevant to issues that are likely to drive participants away. And
in time, that lost minority might become an alienated majority.
this entry posted to
culture;
comments (1)
2005 Mar 31 | Early cooperative norms
Just ten years after the first electronic computer was built, and two
years after the release of IBM's first mainframe, an association called Share
was formed through which IBM customers could collaborate. The
following is reminiscent of the many Wikipedia dictates on keeping a
humble and open mind:
They asked the originator of each program to assume responsibility for
making and distributing prompt corrections. In a statement titled "Share
Membership and What It Entails," adopted in February 1956 at the fourth
general meeting (table 2), all representatives were reminded: "The
principal obligation of a member is to have a cooperative spirit. It is
expected that each member approach each discussion with an open mind, and,
having respect for the competence of other members, be willing to accept
the opinions of others more frequently than he insists on his own." (Akera
2001:719)
this entry posted to
culture;
comments (0)
2005 Mar 11 | Mythbusters and Buttered Toast
A segment on tonight's MythBusters
addressed the question of "whether buttered toast falls buttered side up or
down more often?" This is one of my favorite daily puzzles that can be
addressed by a basic understanding of experimentation and statistics. My own
curiosity on this question was satisfied by a segment of Newton's Apple -- if
my memory is correct -- which found that it is the typical height of the
table surface that causes the originally upward-facing side to land on
the floor. Pushing the toast from a ladder completely reversed this trend as
the toast could tumble a full 360° and land in its original orientation:
buttered side up.
Yet, notice the MythBusters question: it asks if toast being buttered
affects how it ends up -- regardless of its original orientation, even if
that is buttered up in most all daily cases. So first they had to find a way
to drop toast in an unbiased way independent of the original orientation. Not
surprisingly, Adam found that pushing it from the table was not satisfactory
on this note. Eventually they developed a machine that dropped unbuttered
toast landing up 11 times, and down 13 times -- orientation was determined by
a magic marker X which we must assume is unbiasing. It is reasonable to
conclude that 11 up and 13 down is indicative of a "fair" mechanism. Now when
they buttered one side of 24 slices of toast, they found 12 up and 12 down.
These sample sizes are too small, but roughly, it does not appear that the
butter had any effect!
However, when they drop the toast from a two-story building (27'5") and
find that the dry toast side X lands up 26 out of 48 drops (54%) and the
buttered side X lands up 29 out of 48 drops (60%), Jamie posits that the 6%
discrepancy is because he could see that the buttered side had a concave
impression, and like a leaf, the convex non-buttered side tended to fall face
down. Adam concludes, "if you really want to ensure, in general, your toast
landing buttered side up or down, we can tell you, you should butter with a
good vigor and that the resultant bowl will make your toast generally fall
butter side up." Though Adam qualified his statement with "generally,"
strictly speaking it is not statistically supported, and when Jamie
offers a mechanism for a perceived statistical finding, he is premature.
(However, if he is offering a simple observation, that's all it is.)
In this case, the null hypothesis is that the difference between the dry
54% and the buttered 60% is just due to chance. (That is, if we were to repeat
the experiment, a skew of similar size could easily appear.) The alternate
theory is that there is some causal mechanism (i.e. the bowl shaped
impression) that affects the outcome. If we can show that a difference this
large (6%) would rarely arise by chance alone, that implies support for the
alternative hypothesis.
Unfortunately, neither test alone is statistically significant. For example,
the probability of getting at least 29 out of 48 drops buttered side up even
on a fair coin is about 7.4%.
z = (observed - expected) / StandardError
z = (29 - 24) / (Sqrt(48)*Sqrt(.5*.5)) = 1.443
=> P = 7.4%
The random chance of getting at least 26 out of 48 side X up in the dry case
is about 28%.
The probability of a chance difference as large as that between 26 in the
"dry" control case and 29 in the buttered case is about 27%, and so is also
not significant.
z = (observed - expected) / StandardErrorofDifference
z = ((60%-54%) - 0%) / Sqrt((SEdry)^2 + (SEbuttered)^2)
z = 6% / Sqrt(7.19^2 + 7.07^2)% = .5950
=> P = 27%
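The z-tests above can be checked with a few lines of Python. This is only a sketch using the normal approximation and one-sided tails; the function names are my own, and because it works from the unrounded proportions (26/48 and 29/48) rather than 54% and 60%, its results may differ slightly from figures read off a printed table.

```python
import math


def upper_tail(z):
    """One-sided probability that a standard normal value exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def z_one_sample(successes, n, p=0.5):
    """z-score of an observed count against a fair-coin expectation."""
    expected = n * p
    standard_error = math.sqrt(n) * math.sqrt(p * (1 - p))
    return (successes - expected) / standard_error


def z_two_sample(successes_a, successes_b, n):
    """z-score for the difference of two proportions from samples of size n."""
    p_a = successes_a / n
    p_b = successes_b / n
    se_a = math.sqrt(p_a * (1 - p_a) / n)
    se_b = math.sqrt(p_b * (1 - p_b) / n)
    return (p_b - p_a) / math.sqrt(se_a ** 2 + se_b ** 2)


drops = 48
z_buttered = z_one_sample(29, drops)    # buttered toast, side X up
z_dry = z_one_sample(26, drops)         # dry toast, side X up
z_diff = z_two_sample(26, 29, drops)    # buttered versus dry

for label, z in [("buttered", z_buttered), ("dry", z_dry), ("difference", z_diff)]:
    print("%-10s z = %.3f  P = %.1f%%" % (label, z, 100 * upper_tail(z)))
```

None of the three probabilities comes close to the conventional 5% threshold, which is the point: the samples are too small to distinguish a bowl-shaped slice from a fair coin.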
this entry posted to
technology;
comments (0)
2005 Mar 03 | Relational artifacts
In a small seminar Sherry Turkle spoke of her work on relational
artifacts: devices that present as having a mind of their own so as
to inspire an emotional and nurturing response in humans. What are the
societal implications when children become dependent on them, or we use them
to keep the folks in nursing homes happy? Is machine presence preferable to
loneliness? Yes, it might be better if we talked to children about the
bumpiness of human friendships (e.g. learning to share) and the cycle of life
(e.g. a biological pet's death), or provide real companionship, support, and
community to elders, but absent that, are robots okay?
Turkle refuses to answer this question as stated and challenges the
assumptions as a matter of principle -- she also recommends that machines
act like machines, but what this means is also problematic.
Yet, if we do attempt to answer the question directly, it reduces to the
problem of the "happy box." If Tyrell Corporation could manufacture a box to
which one could plug in and become content, should we permit it? This is a
massively complex and interesting question, the subject of much science
fiction and anime stories, and even encountered in the question of should Buddha have taken
Prozac?
In the end, as in anything, I suppose it comes down to the personal choice
of consenting individuals, and society should maintain a commitment to make
the real world a worthy alternative to simulated bliss.
this entry posted to
culture;
comments (0)
2005 Mar 01 | Usage and citation
There are two oddities associated with concerns about the Wikipedia in the
school setting, when teachers do not permit the students to use Wikipedia as
a source.
The first difficulty is that the Wikipedia is often compared to the
Encyclopedia Britannica, a resource I was not allowed to cite in high school,
though it is now presented as the gold standard of authority relative to the
messiness of the Wikipedia. The second is that students will use the
Wikipedia -- just as I used the Encyclopedia Britannica in my youth and
continue to use reference works today. (In fact, I think highly of professors
who can provide a good reference work or textbook for their domain, and
poorly of those who can't or refuse to because they feel their topic cannot
be "reduced.")
Even in high school, when confronted with a rule prohibiting the citation
of a reference work I felt as if I was being encouraged towards plagiarism,
or at least unfairness. If a reference work points me to a more authoritative
source, should I not at least acknowledge this bit of help? Particularly if
I'm more likely to be influenced by the summary provided by the reference?
Additionally why would any book among the thousands published a year be any
more authoritative than a general reference work on the sole basis of its
form? I could compile a multipage bibliography of books denying the
Holocaust, but find few -- if any -- general-purpose reference works that did
the same. The generality of the reference work insulates it from partisan
pressures because it must appeal to a wide audience over many topics. It is
unlikely that neo-Nazis would publish a useful general reference work for the
sole purpose of shifting articles on Jews towards their perspective. However,
this is not to say that reference works have no bias. Only, that if we look
at the formal genre of a text only -- which is what this rule does -- any
given reference work is less likely to be "eccentric" than any book taken at
random.
Finally, when one considers in what direction the authority flows, books
are often demonstrated as authorities by being cited by the encyclopedia! Or,
at least, encyclopedias impart as much authority to the books they cite, as
they obtain in citing them -- this transfer is mediated by the reputation of
the publisher, editor, and contributors.
this entry posted to
culture/wikipedia;
comments (0)
2005 Feb 14 | XML ElementTree Data Model
I've been playing with Fredrik Lundh's ElementTree as an
intuitive/pythonic way of processing XML. (While I like Aaron Swartz's XML Tramp, it
is presently unsupported; ElementTree is fast and has XPath
support.)
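To give a flavor of why it feels pythonic, here is a minimal sketch. It assumes the ElementTree API as it now ships in Python's standard library (xml.etree.ElementTree) rather than the standalone package, and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small, made-up document for illustration.
SAMPLE = """<blog>
  <entry date="2005-02-14" category="technology/python">
    <title>XML ElementTree Data Model</title>
  </entry>
  <entry date="2005-02-11" category="culture">
    <title>A new model of design</title>
  </entry>
</blog>"""

root = ET.fromstring(SAMPLE)

# Elements behave like lists of children; attributes behave like a dict.
titles = [entry.findtext("title") for entry in root.findall("entry")]
dates = [entry.get("date") for entry in root]

# The limited XPath support allows simple attribute predicates.
culture_entries = root.findall("./entry[@category='culture']")

print(titles)
print(dates)
print([entry.findtext("title") for entry in culture_entries])
```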
ElementTree Conventions
this entry posted to
technology/python;
comments (1)
2005 Feb 11 | Abstractions: the benefits and dangers
Why is abstraction such a powerful technique in computer science, but a
potentially befuddling one in the social sciences? In computer science,
abstractions permit one to:
- increase the scope of the construct being abstracted
- hide unnecessary details
- cleanly define objects/functions and their interfaces
Yet, in social science it:
- increases the scope of the construct being abstracted
- hides details — but how do we know whether those details were
important?
- often has muddled definitions with few cleanly defined interfaces
The potential danger of abstraction in the social sciences -- often
spoken of as "theory," though I prefer to think of theory in this
context as a metaphysic or framework -- is that it increases the scope
of action while possibly becoming muddled and detached from reality. With
greater scope comes a greater responsibility for clarity.
Consider, that one rarely hears of a computer abstraction being overly
broad because it is necessarily grounded in the reflexive practice of usage.
(If it's broken, it'll get fixed.) For example, Python's sorted function
takes a sequence to be sorted and returns a new, sorted list. We need
not be overly concerned with the algorithm or its implementation.
Furthermore, as new data structures are developed, one can pass it a specific
compare function relative to that data structure. We get to re-use the
concept of "sort" with enough specificity as needed — but no more.
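A small sketch of the point, assuming a current Python in which sorted takes a key function and an old-style compare function is adapted with functools.cmp_to_key (the word list and compare_by_length are just illustrations):

```python
from functools import cmp_to_key

words = ["theory", "Field", "capital", "Doxa"]

# The abstraction as given: no knowledge of the algorithm required.
default_order = sorted(words)

# The same abstraction, specialized only as much as needed.
case_insensitive = sorted(words, key=str.lower)


def compare_by_length(a, b):
    """Old-style compare function: negative, zero, or positive."""
    return (len(a) > len(b)) - (len(a) < len(b))


by_length = sorted(words, key=cmp_to_key(compare_by_length))

print(default_order)
print(case_insensitive)
print(by_length)
```

The caller supplies only the notion of order that is specific to the data; everything else about sorting stays hidden behind the interface.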
Social science "theorists" tend to resist attempts to cleanly define
words, to explicitly model a construct, or be "reduced" by attempts at
clarity and coherence. When I asked a student about the meaning of a word he
used because I could not find it in the dictionary he replied, "that's your
mistake, don't look these words up." Oddly enough, Bourdieu's most useful
work is in an extended interview — as is McLuhan's Playboy
interview — though they both fear being overly reduced by this form of
discourse. Consider how this differs from the following well-known
anecdote:
Richard Feynman, the late Nobel Laureate in physics, was once asked by a
Caltech faculty member to explain why spin one-half particles obey Fermi
Dirac statistics. Rising to the challenge, he said, "I'll prepare a
freshman lecture on it." But a few days later he told the faculty member,
"You know, I couldn't do it. I couldn't reduce it to the freshman level.
That means we really don't understand it."
If only we were so humble in the social sciences — and Feynman was
not known as a humble man; perhaps he figured that if he couldn't do it, then
no one could.
this entry posted to
culture;
comments (0)
2005 Feb 11 | A new model of design
First there was designers' design. Then there was user-centered design. But
even that is now behind the times. Given open development paradigms
and "extreme programming" practices, there is another approach to design,
which is no "design." Instead one has frequent, incremental releases that
the user environment can immediately provide feedback on. This takes place in
a context, not of the firm, but of a larger ecology where each design is
subject to evolutionary pressures.
For example, at the W3C we would sometimes despair that a project might
take years, because of all the folks involved, and not satisfy requirements
in the end anyway. In fact, sometimes the requirements would be better
satisfied by lightweight/hacker constituencies external to the W3C.
Consequently, upon my noting this, Tim Berners-Lee suggested that we develop
projects by a "red team/blue team" model. Instead of having one group of 40
engineers working for a couple of years on a design that may not be
well-liked at the end, one creates two smaller teams to specify and prototype
proposals within a matter of months. And then someone, either a meritocratic
figure or the community at large, can identify which one is superior. The
problem is, within the firm, it is very difficult to do this because of
political reasons: no one wants to be on a losing team. (In the case of a
large project, everyone is on the same mediocre team at least.) But in the
context of open development, there are often many competing projects. Some
continue on in seeming redundancy, and others die quickly -- and I expect the
developers are even happy about that so they can move on to something more
fruitful.
this entry posted to
culture;
comments (0)
2005 Jan 06 | Epistemological Authority
Two recent discussions have prompted me to return to the question of epistemological
authority. In the case of the online collaborative Wikipedia, Larry Sanger, a
founding participant, lamented the
inability of the community to accept and retain contributions from "experts."
Also, creationists have re-factored their doctrine into a pseudoscientific
"theory" of intelligent design
and advocate that it be taught alongside, or instead of, evolution. I
believe both of these cases share the conditions that there is such a thing
as expertise, but that all views are potentially ideologically biased. (And
intelligent design
on the Wikipedia is an example as well.) Can the community at
large distinguish authoritative arguments, or must we be cynical and believe
that all arguments are biased but some are only more eruditely presented? (In
fact, I've realized that the bulk of continental social "theory" is about
identifying such biases: Bourdieu's doxa and symbolic violence, Hall's
naturalization, Gramsci's hegemony, Marcuse's and Adorno's technological
veil, Weber's symbolic violence, Foucault's episteme, Barthes' exnomination
('unnaming') etc.).
In An Invitation to Reflexive Sociology Pierre Bourdieu (1992)
discusses a couple of his conceptual contributions which may be of use in
understanding these debates. A field is a cultural domain in which
participants have a stake and compete with each other for the accumulation
of some sort of capital (e.g., social capital).
Like any social universe, the academic world is the site of a struggle over
the truth of the academic world and of the social world in general. Very
rapidly, we may say that the social world is the site of continual
struggles to define what the social world is; but the academic world has the
peculiarity today that its verdicts and pronouncements are among the most
powerful socially. In academia, people fight constantly over the question
of who, in this universe, is socially mandated, authorized, to tell the
truth of the social world (1992:70).
One of Bourdieu's preferences is that fields be true to themselves and
operate autonomously and in a "scientific" manner.
A scientific field is a universe in which researchers are autonomous and
where, to confront one another, they have to drop all nonscientific weapons
-- beginning with the weapons of academic authority. In a genuine
scientific field, one can freely enter free discussions and violently
oppose any contradictor with the arms of science because your position does
not depend on him or because you can get another position elsewhere.
(1992:177)
My own understanding is that scientific does not equal academic: academic
authority is based on a hierarchical application of judgment to those who
allegedly know less; while closely associated with the academic, scientific
assessments should be discernible to those who know the same or even less.
Above, Bourdieu introduces the notion of "scientific arms": legitimate means
of dispute. In Jonathan Sarfati's response to the
creationist book Teaching About Evolution, he notes that the
creationists claim that the National Academy of Sciences "resorts to
arbitrary, self-serving 'rules' to determine what qualifies as 'science' and
what doesn't." Of course, presently in America we have the confounding
situation that a great majority of the members of the National Academy of
Sciences accept evolution, but a frightening proportion of Americans
don't.
A field is all the more scientific the more it is capable of channeling, of
converting unavowable motives into scientifically proper behavior. In a
loosely structured field characterized by a low level of autonomy,
illegitimate motives produce illegitimate strategies and, furthermore,
strategies that are scientifically worthless. In an autonomous field such
as the mathematical field today, by contrast, a top mathematician who wants
to triumph over his opponents is compelled by the force of the field to
produce mathematics to do so, on pain of excluding himself from the field.
Being aware of this, we must work to constitute a Scientific City in which
the most unavowable intentions have to sublimate themselves into scientific
expression. This vision is not utopian at all, and I could propose a number
of very concrete measures designed to make it come true. For instance,
where we have one national referee or evaluator, we can institute an
international panel of three foreign judges (of course, we must then
control for the effects of international networks of mutual knowledge and
alliances). When a research center or a journal enjoys a situation of
monopoly, work to create a rival one. We can raise the level of
scientific censorship by a series of actions designed to upgrade the level
of training, the minimal amount of specific competency required to enter
the field, etc.
In short, we must create conditions such that the worst, the meanest, and
the most mediocre participant is compelled to behave in accordance with the
norms of scientificity in currency at the time (1992:177).
Interestingly, Bourdieu is advocating censoring "nonscientific" claims,
which, while not very democratic, can be meritocratic -- though I think
his proposals for committees implausible. Yet, while Bourdieu is sympathetic
to the autonomous operation of a field, he does not want to focus on a
particular methodology or bureaucracy, but on the almost anarchistic competition
under an already agreed-to metaphysical system.
There is in history what we may call, after Elias, a process of scientific
civilization, whose historical conditions are given with the constitution
of relatively autonomous fields within which all moves are not allowed, in
which there are immanent regularities, implicit principles and explicit
rules of inclusion and exclusion, and admission rights which are being
continually raised. Scientific reason realizes itself when it becomes
inscribed not in the ethical norms of a practical reason or in the
technical rules of scientific methodology, but in the apparently anarchical
social mechanisms of competition between strategies armed with instruments
of action and of thought capable of regulating their own uses, and in the
durable dispositions that the functioning of this field produces and
presupposes. (1992:180)
But, in the case of the creation/evolution debates, what is at stake is
the metaphysical system of judging what is and is not science; in Wikipedia,
it is what is and is not good, neutral, and authoritative content. Creationists
object to natural science as a biased metaphysical system, or even a
religion, like their own supernatural literalism. It is at this point that I
find their position simply incoherent and can no longer sympathetically
engage in the debate. The divine, supernatural, and the ineffable may
exist and be revealed to some, but these are not legitimate discourses in
a public sphere in which others do not have access to the inspired source.
The alternative that I can understand is as Robert Pennock (2001:84) wrote in
Intelligent Design Creationism and Its Critics, "The methodological
naturalist does not make a commitment directly to a picture of what exists in
the world, but rather to a set of methods as a reliable way to find out about
the world — typically the methods of the natural sciences, and perhaps
extensions that are continuous with them — and indirectly to what
those methods discover."
As I discussed in Scandal and The Politics of Science and Vice
Versa, "We can never know everything. We haven't the capacity nor time to
give informed consideration to every important issue. So we rely upon labels
and personalities to set the default values of our opinion." A claim of
authority is a claim of being worthy of being deferred to. In the case of
Wikipedia, if people are to accept it as an encyclopedia, it seemingly must
prove itself as an authority worthy of being deferred to. Such proxies
are often determined by the judgments of peers, judgments of superiors,
method, majorities, personal experience, and results. And the difficulty with
both Wikipedia and the debate on evolution is that the best proxy, results,
is not immediately apparent. If we stop teaching evolution now, the effects
would be long-term and confounded with many other social variables. And how
does one "objectively" judge the quality of Wikipedia?
One of the key differences between Wikipedia and open source software
development is that with questions of protocol and code one can easily make
authoritative claims based on the results, and consequently such communities
tend to be meritocratic. As I wrote in Why
the Internet is Good, "With the cacophony of ideas, proposals, and
debates, and a lack of a central authority to cleave the good from the bad,
how does one sort it all out? It sorts itself out. ... The success of any
policy is based simply on its adoption by the community." Encyclopedia making
is not so fortunate, and Wikipedia strives to be more open (even accepting
anonymous contributions) than almost all open source projects. Nor can we
simply rely upon the naked authority of expertise and academia: expertise
should be supported, but to be accepted the results of expertise must also be
widely perceptible to the larger public.
this entry posted to
culture;
comments (0)