2008 Feb 06 | Digital Posterity
I have over 1000 primary sources in my Wikipedia research mindmaps. In
accumulating some of those sources, I have already been confronted with their
ephemerality. (And these are public sources only; I know of lots of e-mails by
the likes of Wales, Sanger, and Stallman that I would have liked to have access
to but that apparently no longer exist.) So, doing a quick check-link analysis
of the largest mindmap I find the following: 941 of those resources are "OK";
21 are "404" (no longer there); and 10 are "Timeout". So, just within a few
years ~2% are gone outright, or ~3% if the timeouts are counted as well. For
example, the link to Sanger's 2005 information about his (then) new Digital
Universe project is already broken; but I must say news sites are the worst.
Then, there are the URLs that don't have what they used to, those that are now
password protected, and those that have new URLs because of a site
reorganization -- blogs seem to be the worst on this front. Of course, I don't
know if this rate is a linear trend and I would be interested in any research
that shows longitudinal decrepitude rates of an existing corpus of links.
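A check-link tally of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool that produced the numbers above: the function names `check_link` and `unavailable_share` are made up for the example, and the labels simply mirror the "OK", "404", and "Timeout" categories reported in this entry. With those reported counts, the 404s alone come to roughly 2.2% of the 972 links checked, and 404s plus timeouts to roughly 3.2%.

```python
from collections import Counter
import socket
import urllib.error
import urllib.request


def check_link(url, timeout=10, opener=urllib.request.urlopen):
    """Return a coarse label for one URL: "OK", an HTTP status code, "Timeout", or "Error".

    `opener` is a hypothetical hook so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return "OK" if response.status == 200 else str(response.status)
    except urllib.error.HTTPError as err:
        return str(err.code)  # e.g. "404"
    except (socket.timeout, TimeoutError):
        return "Timeout"  # timeout while reading the response
    except urllib.error.URLError as err:
        # Connection-phase timeouts arrive wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "Timeout"
        return "Error"


def unavailable_share(counts):
    """Fraction of checked links whose label is anything other than "OK"."""
    total = sum(counts.values())
    return (total - counts.get("OK", 0)) / total if total else 0.0


# The counts reported in this entry, rather than a live crawl.
reported = Counter({"OK": 941, "404": 21, "Timeout": 10})
print(f"404 only: {reported['404'] / sum(reported.values()):.1%}")
print(f"404 or timeout: {unavailable_share(reported):.1%}")
```

Re-running such a tally over the same list of URLs every year or so would give the longitudinal rate the entry asks about; a live run would build the counter with `Counter(check_link(u) for u in urls)`.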
In any case, I expect my own modest historical inquiries are only the
beginning; I think people will be writing histories of Wikipedia and the
larger free culture movement decades in the future, though I am not sure how
much of what we have today will still be there for them. I was surprised, and
happy, to find that someone else is already making use of my Nupedia-l
archive, so I thought I would do something similar for my other sources.
I don't think this would be of much use to anyone today, and it is somewhat
"tainted" in that it is my own analytical take and selection of sources --
absent summaries, annotations and excerpts -- but it might be of use in the
future.
This archive includes the HTML versions of two mindmaps and a copy of the
online resources to which they link. If you do make use of it, you can
continue to refer to it as part of the "Reagle Wikipedia Archive."
This collection wp-sources.tar.bz2 was made by placing the HTML version of the
mindmaps (wikip-primary.html and field-notes-cat.html) on a Web server and
then issuing:
wget --restrict-file-names=windows -c --recursive --level=1 --span-hosts \
  --convert-links --execute robots=off -t 4 \
  http://reagle.org/joseph/2008/02/wp-srcs/field-note-cats.html
this entry posted to
method;
comments (2)
2008 Jan 24 | Too magnificent
I recently read Andrew
Ross' "No Collar: The Humane Workplace and Its Hidden Costs: Behind the
Myth of the New Office Utopia" in remembrance of my own brief time as a
consultant in New York's "Silicon Alley" during the booming 90s. I even had a
few meetings at Ross' study site: the ever-cool design/strategic/Web firm RazorFish. I like
Ross' portrayals of American culture, including the "Celebration Chronicles:
Life, Liberty, and the Pursuit of Property Value in Disney's New Town," but I
encountered him first in the Sokal Hoax affair -- and
was disappointed with his defense of accepting a Po-Mo gobbledygook hoax
submission to the prestigious Social Text journal. I expect I am sympathetic
to Alan Sokal in this affair because as a former computer scientist I've been
acculturated with the maxim of K.I.S.S.: Keep It
Simple Stupid. This sensibility persists into my engagement with
humanities and social sciences -- though it sometimes causes me to feel
alienated and distressed. Similarly, I've always been fond of physicist Richard Feynman's
freshman principle regardless of the discipline: if something can't be
explained in a freshman lecture, it is not yet well understood. (He was quite
a character, and is also alleged to have said that the philosophy of science
is as useful to scientists as ornithology is to birds.)
So reading Ross again prompted me to peruse his Wikipedia article, which
then led to the Sokal hoax article, and then to a wonderful
list of similar hoaxes in other disciplines, including the fascinating
tale of the Bogdanov
brothers:
The Bogdanov Affair is an academic dispute regarding the legitimacy of a
series of theoretical physics papers written by French twin brothers Igor
and Grichka Bogdanov (alternately spelt Bogdanoff). These papers were
published in reputable scientific journals, and were alleged by their
authors to culminate in a proposed theory for describing what occurred at
the Big Bang. The controversy started in 2002 when rumors spread on Usenet
newsgroups that the work was a deliberate hoax intended to target
weaknesses in the peer review system employed by the physics community to
select papers for publication in academic journals. While the Bogdanov
brothers continue to defend the veracity of their work, many physicists
have alleged that the papers are nonsense, considering this evidence of the
fallibility inherent within the peer review system. The debate over whether
the work represented a contribution to physics, or instead was meaningless,
spread from Usenet to many other Internet forums, including the blogs of
notable physicists and both the French and English Wikipedia encyclopedia
projects.
While perhaps not as common, the natural sciences too can suffer from
incomprehensibility masquerading as erudition. In fact, some of the worst
excesses in the humanities go hand in hand with speculative takes on
cosmology and quantum physics. And, I am putting aside the interesting issues
of the efficacy of peer-review and the extent to which a discipline can trust
its members not to flat out lie -- such as the case of disgraced Korean stem
cell researcher Hwang
Woo-suk. My main point here concerns the extent to which we should strive for,
and hold others accountable to, a standard of simplicity, or as Einstein said,
"as simple as possible, but not simpler."
For my own purposes I've come to view that which is incomprehensible to me
as perhaps like medieval Scholasticism --
famously parodied with the question of "how
many angels can dance on the head of a pin?". Thomas Aquinas and Peter
Lombard were no doubt far smarter than me, but if one starts with a particular
set of assumptions (e.g., textual inerrancy), fetishizes a logic over a
broader rationality (e.g., dialectics, be
they Christian or Marxist), and lacks an understanding of how we fool
ourselves (e.g., confirmation bias),
I feel one can end up with brilliant nonsense. (Frederick Crews is famous for
his criticism of Freudianism along these lines and his latest book is aptly
titled "Follies
of the Wise.")
And I now have a new term for describing those works that are manifestly
learned but for which I'm confused as to whether I'm too dumb to understand
or they are simply incomprehensible. Herman Kogan (1958), in "The Great
EB: The Story of the Encyclopaedia Britannica," writes of the Britannica
editors' difficulty with the "Algebraic Forms" article, which was so complex
that it was referred to different experts to assess whether it was sensible.
In the final review, Simon Newcomb of Johns Hopkins University wrote, "It's
magnificent, although I am not sure it is all clear to me but it's really
magnificent." Consequently, the editor rejected the article as being "too
magnificent" (p. 90).
2007 Aug 13 | Source-as-primary-character
I recently finished Peter Heather's (2006) The
Fall of the Roman Empire. This popular, though no less rigorous, history
is widely praised. The narrative is engaging and I appreciate the glossary,
dramatis personae, and timeline; these help given that the book spans
150 years, dozens of emperors (East and West), generals, and barbarian kings.
What most impressed me was Heather's treatment of sources. Many histories,
particularly of ancient societies, are written in the third person objective.
Yet, as I learned in my historical methods course, the practice of history is
not simply a recounting of events but a substantiated argument about people
and events in time. Heather presents his arguments as such: identifying when
he agrees or disagrees with others or scholarly consensus, and addressing the
circumstances of his sources. Rather than being simply a footnote, sources
come to the foreground and become part of the story. The histories of sources,
such as Pullodius' commentary on Ambrose written in the margins of De Fide,
or the listing of fourth century military and civilian offices, the Notitia Dignitatum,
are interesting in themselves and contribute to a much deeper understanding
of the ground on which Heather's arguments rest. While a popular history
might present a more accessible or exciting version of an old tale, it is
rare for it to communicate the challenges and excitement within the
discipline -- because popular history often obscures its scholarship. But
Heather brings it forward and what I thought might be a rather staid field --
don't we already know all we can about the ancients? -- is shown as alive
with new archaeological finds, textual fragments, analysis, and argument.
I know this will influence the next revision of one of my historical
chapters with respect to how I speak about some of the primary sources I
found.
2007 Mar 15 | Reuse vs. self plagiarism
Yesterday's New York Times reported on another case of high profile plagiarism: a relatively young professor who had copied parts of her dissertation from another. Even though she had previously acknowledged as much in private and has now resigned -- so there's no question of ambiguous boundaries -- a few things struck me as salient:
- A colleague, fed up with low-quality peers, started the investigation and even hired a private investigator to bust her.
- Before the story broke she contacted her former adviser about revising copied material, but when the reporter questioned the adviser, the article says, "He said that he barely recalled her, 'I only remember one thing, that she was in a hurry?'" That seems odd.
- The article's evidence of plagiarism consists of two fragments: a short description of a reference, and a one sentence description of her project. I find both types of statements lend themselves to a remarkable degree of homogeneity, and wouldn't find them convincing on their own among hundreds of pages of text.
Interestingly, this article came along just as I have been following a discussion of Turnitin, a student plagiarism detection service, and struggling with issues of "reusing" my own work.
Unlike my previous issue of how to deal with priority in relation to self-published grey literature, my present concern arises out of published work. I am presently working on Chapter 4 of my dissertation, which specifies criteria for an "open content community" as well as some interesting boundary cases on openness. A dissertation is, understandably, supposed to be an original work; I read this as "new work since matriculation," but I have heard it said this could mean only unpublished work: this would be horrible. Perhaps my strong view is partly because I'm a "mid-career" Ph.D. student who has already presented papers, and it strikes me as contrary to stop a professional activity that is essential for getting feedback. I also appreciate that new Ph.D.s reuse their dissertations in subsequent articles and/or books, which I also plan to do. But to sit on all that material and labor on in solitude -- aside from one's dissertation committee members -- until the dissertation is complete seems counterproductive. Consider the genealogy of parts of the present chapter:
- the notion of an open content community was a fragment from a sociology term paper from four years ago that I then developed into a short published paper, which I then expanded into a more extensive published paper that I planned to make use of in this chapter.
- the boundary case of open community and closed law was a blog entry I planned to use in this chapter. I received a request for its use in a book and it is now "published."
- the case of WikiChix was another blog entry I planned to use and was happy to extend and submit it in response to a request since I knew I'd eventually want to work on it more; it will soon be published.
- the case of a Wikipedia blocking proposal, now implemented, is written and no one else has ever seen the text. Consequently, I expect it stinks and I thought of posting it here.
In no case did I assign any copyright -- though they of course are published under various copyright licenses -- and so I am not legally precluded from using them in compilations or derivative works. Making them available has provided me with feedback and opportunities for publication which yields more feedback and builds relationships within my scholarly community. This is great! But what of "self-plagiarism"? (So, perhaps this is like my earlier post but questions of priority and public but "unpublished" work are exchanged for questions of "published" works and self plagiarism.)
I've been reading up on the topic and found Green (2005) interesting, and Hexham (1999) useful:
Self-plagiarism must be distinguished from the recycling of one's work that to a greater or lesser extent everyone does legitimately. Although self-plagiarism in academic publications is a gray area many universities implicitly recognize the practice as fraudulent. Thus most universities have rules preventing students from submitting essentially the same essay for credit in different courses. There are also rules against someone submitting the same thesis to different universities. Among established academics self-plagiarism is a problem when essentially the same article or book is submitted on more than one occasion to gain additional salary increments or for purpose of promotion.
Like all plagiarism, self-plagiarism occurs when the author attempts to deceive the reader. This happens when no indication is given that the work is being recycled or when an effort is made to disguise the original text. The issue once again is one of deception. Disguising a text occurs when an author makes cosmetic changes that make the same book or paper look different when it actually remains unchanged in its central argument. Changing such things as paragraph breaks, capitalization, or the substitution of technical terms in different languages, causes readers to believe they are reading something completely new. If these are the only changes an author has made then they may be legitimately described as self-plagiarism and fraudulent.
The extent of re-cycling is also an indication of self-plagiarism. Academics are expected to republish revised versions of their Ph.D. thesis. They also often develop different aspects of an argument in several papers that require the repetition of certain key passages. This is not self-plagiarism if the complete work develops new insights. It is self-plagiarism if the argument, examples, evidence, and conclusion remain the same in two works that only differ in their appearance.
Which brings me, finally, to my simple and mundane question for my dissertation. Is a citation to my own published works sufficient if I am reusing text -- though continuing to rework and integrate it -- or should I also give an acknowledgment, often seen in scholarly books, that "portions of this text are republished from or based on..."?
(BTW: a possible irony is I expect this and earlier entries could be turned into a decent paper on "scholarship in the open" should the opportunity ever present itself!)
2007 Feb 13 | Grey literature, stigmergy and priority
Last week I read a provocative paper by Helen Nissenbaum (2002) where she
considers the norms, values, and ends previously served by the
convention of scholarly priority, and, now that the contextual
landscape is changing because of electronic media, whether intellectual
property (patents) can serve just as well in their stead. Helen
recommended it to me while we were discussing my dissertation chapter
on encyclopedic production, including questions of copyrights and
plagiarism. This chapter is partly based on a draft I wrote in 2005 in
which I argued the concept of stigmergy is helpful in understanding the
sort of sociality involved in the cumulative production of knowledge in
reference works.
An irony is that Nissenbaum's paper speaks to the question of
scholarly priority in the age of the Internet, which bears on my
adoption of the term stigmergy. (She doesn't mention blogs or wikis,
but instead refers to "wildcat publishers," "grey literature," and
whether there is any scholarly obligation to search these realms for
the purposes of citation.)
I think I first wrote of stigmergy in the spring of 2005, in a
draft I made available on this blog on September
30. Roughly a year later, I read Mark Elliott's piece Stigmergic
Collaboration: The Evolution of Group Work in the May issue
of the online M/C Journal. Elliott explores the idea much more
thoroughly than I did or will, and that is good. But how do I deal with
the question of priority and citation? I definitely want to -- and do
-- cite Elliott in my present version of the chapter, but what to do
with my earlier version? I don't know Elliott and assume he knows
nothing of me. And I don't feel that proprietary about saying Wikipedia
might be stigmergic. And for all I know we read the same thing about
wasps -- though I was also inspired by early reference work compilers
likening their copying of others' work to a useful "busy bee." But I
don't want it to appear I am simply borrowing the idea from elsewhere
and I prefer not to cite earlier "unpublished" drafts. This concern
with priority is in the face of the biggest irony of all: an argument
of this chapter is that knowledge is inherently interdependent and
cumulative!
Presently, the text in question reads:
Stigmergy is a term coined by Pierre-Paul
Grassé to describe how wasps and termites collectively build complex
structures; as Karsai (2004:101) writes, it "describes the situation in
which the product of previous work, rather than direct communication
among builders, induces [and directs how] the wasps perform additional
labor." In addition to my proposal that this notion might be helpful in
understanding Wikipedia collaboration (Reagle 2005fss), Mark Elliott
(2006) has also, more thoroughly, argued the same: "As stigmergy is a
method of communication in which individuals communicate with one
another by modifying their local environment… [t]he concept
of stigmergy therefore provides an intuitive and easy-to-grasp theory
for helping understand how disparate, distributed, ad hoc contributions
could lead to the emergence of the largest collaborative enterprises
the world has seen" (Elliott 2006:4). However, we need not apply this
notion only to new media. For example, stigmergy might also be
applicable to Newton’s seemingly generous sentiment of
acknowledging the contributions of his predecessors: "If I have seen
further [than you and Descartes] it is by standing upon ye shoulders of
giants." (From a 1676 letter from Newton to Hooke, as cited by Merton
(1993), who details a long history of this aphorism and Newton's
probably less than magnanimous intention (Hawking 2002) of insulting
Robert Hooke, his short and hunchbacked rival.)
Is this appropriate?
2007 Jan 23 | Broken lists
I'm presently cursing whoever changed the configuration/names
of Wikipedia lists. Identifying emails in archives is sadly a difficult
problem (it really need not be), but fortunately the good folks at the
aimsgroup MARC also archive the lists and associate the unique identifier of every
message with a persistent and unique URL, as I wrote about previously.
But when Wikipedia moved its lists from "foo@wikimedia.org" to
"foo@lists.wikimedia.org" it not only broke email filters across
the land, it evidently broke the MARC archives. No message is
available in the MARC archive since the change, on January 6. Now, Wikipedians are realizing
that many of the links from the Wikis to email messages (e.g.,
referencing a message on the Wikimedia Foundation list) are broken.
My backlog of email messages to scrutinize is growing as I hope Hank
Leininger and the other volunteers at MARC find the time and means to
address the problem. What would be great is if Wikipedia and other
users of archive software (e.g., Mailman) pressed for stable references
to messages as a priority feature!
2006 Oct 20 | A note on bibliography
I'm sharing this note from the beginning of my dissertation so others
working with online resources might comment.
The type and number of bibliographic sources of this work merit a couple of
comments.
First, most of the primary sources are online, and have only been
online. Quotations from e-mail and most exclusively online resources have
no page numbers associated with them.
Second, many of the printed sources (primary and secondary) are now
online. This is common in recent works where authors place versions of a
print publication online, or where older works are now in the public domain
and have been republished online. In such cases I use the publication date
of the version I used. If necessary, I include the original publication
date in prose adjacent to the reference, and I include it in the title of
the work in the bibliography. For example the bibliographic entry for
Project Gutenberg's 2004 republication of H. G. Wells' "A Modern Utopia"
would be:
Wells, H. (2004). A modern utopia (1905). (6424). Retrieved on
September 20, 2006 from <http://www.gutenberg.org/dirs/etext04/mdntp10h.htm>.
The page numbers associated with print-only sources obviously correspond
to the printed page. For those sources that are also online, the page
number might be associated with the pagination of the printed online
resource from which I first took my notes, or the printed material, for
which I later found an online copy. I believe it will be clear to the
reader which is the case.
Third, for some recent sources, there are many publications by the same
author in the same year. After a couple of years of experimentation with
the software
I use to manage this material I have settled upon the convention of
identifying such a source by appending a token to the publication year that
is composed of the initial letters of the first three substantive words of
the title. So, instead
of using the letters [a-z], which some bibliographic systems use, my
reference for Wikipedia's "Neutral Point of View" article is: (Wikipedia
2006npv). This provides stability across additions/subtractions to the
bibliography and across chapters, and is comprehensible to the author and
hopefully the reader.
Finally, Web sources do change, particularly Wiki pages! Wherever
possible I include the date of the version of the resource to which I am
referring. Wikimedia resources are also identified by their versioned,
"stable" or "permanent," URL. It is possible that I will reference
different versions of the same Wiki page.
All of this may sound confusing, and it was no easy task coming to this
understanding, but in the end I hope it is useful. If the intention of
bibliography is to permit the reader to follow the author's journey through
the sources, the ready accessibility of online resources is a boon to
all.
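The third convention in the note, a year followed by a short title token, is mechanical enough to sketch in Python. This is an illustration only, not the bibliographic software mentioned above: the function names and the stop-word list are assumptions, since the note does not say which words count as "substantive." The sketch takes the token to be the initial letters of the first three substantive words, which is what the "Neutral Point of View" example (2006npv) suggests.

```python
import re

# Assumed stop-word list; the note does not define "substantive" words.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "for", "is", "as"}


def title_token(title, length=3):
    """Initial letters of the first `length` substantive words of a title."""
    words = re.findall(r"[A-Za-z0-9]+", title.lower())
    substantive = [word for word in words if word not in STOP_WORDS]
    return "".join(word[0] for word in substantive[:length])


def citation_key(author, year, title):
    """Build an in-text key such as "Wikipedia 2006npv"."""
    return f"{author} {year}{title_token(title)}"


print(citation_key("Wikipedia", 2006, "Neutral Point of View"))
```

Because the token depends only on the title, the key stays the same when other works by the same author and year are added to or removed from the bibliography, which is the stability the note is after; sequential letters [a-z] would shift instead.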
2006 Sep 04 | Outsider Contributions
When I make a substantive contribution to Wikipedia, I tend to edit
"off-line" until I'm satisfied with the text, and then post it in a single
chunk. While I am only a WikiGnome in any case, the typical Wikipedia metric
of "edit counts" would underestimate the contribution made by people who edit
in a similar fashion. My own simple
Python script exhibits this problem. To get some sense of the substance
of any given edit, one would have to go beyond screen-scraping and perform
analysis on the Wikipedia database -- something beyond my desktop computer.
Fortunately, Aaron Swartz purchased "some time on a computer cluster" and
came up with the following novel result:
When you put it all together, the story become clear: an outsider makes
one edit to add a chunk of information, then insiders make several edits
tweaking and reformatting it. In addition, insiders rack up thousands of
edits doing things like changing the name of a category across the entire
site -- the kind of thing only insiders deeply care about. As a result,
insiders account for the vast majority of the edits. But it's the outsiders
who provide nearly all of the content.
I'm looking forward to seeing these findings replicated.
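The gap between edit counts and substance can be illustrated with a toy calculation. The sketch below is not Swartz's method and not the script linked above; it assumes a page history is already available as a chronological list of (user, full text) pairs, and it uses Python's `difflib` to count how many characters each revision adds. The names `chars_added` and `tally` and the sample history are made up for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


def chars_added(old, new):
    """Count characters in `new` that a character-level diff marks as inserted or replaced."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        j2 - j1
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    )


def tally(revisions):
    """Given (user, full_text) pairs in chronological order, return
    (edit counts per user, characters added per user)."""
    edits, added = Counter(), Counter()
    previous = ""
    for user, text in revisions:
        edits[user] += 1
        added[user] += chars_added(previous, text)
        previous = text
    return edits, added


# Made-up history: one large off-line contribution, then small tweaks.
draft = "Wasps build nests by reacting to the structure already in place"
history = [
    ("outsider", draft),
    ("insider", draft + "."),
    ("insider", draft + ".\n"),
    ("insider", draft + ".\n[[Category:Insects]]"),
]
edits, added = tally(history)
print(edits)
print(added)
```

In this toy history the "insider" has three times as many edits while the "outsider" supplied most of the characters. A real replication would also need to track whether added text survives into later versions, which this sketch does not attempt.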
2006 Jun 12 | Nupedia-l Archives
I recently completed my review and analysis of the Nupedia e-mail list
archives. Since they are no longer easily accessible, I thought I would share
the raw archives: nupedia-l.tar.bz2.
This HTML version of the e-mail archives was extracted from the Internet Archive via the following
command:
wget --exclude-domains gizmology.net -e robots=off -nH --cut-dirs=3 \
  --base=http://web.archive.org/web/20030822044803/http://www.nupedia.com/pipermail/nupedia-l/ \
  -r -l 4 -N -k -p -R js -Gbase \
  http://web.archive.org/web/20030822044803/http://www.nupedia.com/pipermail/nupedia-l/
I believe I did some additional textual processing on this archive after the
`wget` to make it more useful to me.
If you wish to access the messages from this archive, turn off your
JavaScript. Otherwise, you will be taken to the online version when you click
on a link, which can be slow. However, accessing the online Web version can
be useful if I failed to gather a copy from the "20030822044803" snapshot of
the archive and you want to try other periods. (One can also find an
mbox-like file of the messages though it would require a lot of work to make
it compliant with the mbox format.) This is a tar archive compressed with bzip2.
If you are inclined to cite this collection you can note it is part of the
"Reagle Wikipedia Archive."
2006 Jun 09 | The method of haiku
A Zen-inspired aesthetic of haiku is sabi: an insightful
appreciation of the "suchness" of ordinary objects and daily events. Hass
(1994:xiv) writes of
this as a "quality of actuality, of the moment seized on and rendered
purely." This pureness of vision led Barthes (1983:60) to claim that haiku's
"brevity would guarantee their perfection," their "simplicity would attest to
their profundity."
I am foolish enough to aspire towards this quality in my own work. Of
course, in my dissertation
proposal I cloak my poetic inspiration with sympathetic methodological
scholarship:
Yet, there is a goal that I aspire to, my research "should be empirical
enough to be credible and analytical enough to be interesting" (van
Maanen 1988:29). I hope to make a convincing contribution (Golden-Biddle and
Locke 1993) by providing an account that has authenticity, "the ability of
the text to convey the vitality of everyday life encountered by the
researcher in the field setting" (p. 599), plausibility, "the ability of
the text to connect two worlds [of the writer and reader] that are put in
play in the reading of the written account" (p. 600), and criticality, "the
ability of the text to actively probe readers to reconsider their
taken-for-granted ideas and beliefs" (p. 600).
I recognize this aspiration is foolish because it is not the norm, as I
understand academia. I have long characterized my own stance as a "reflective
practitioner," a seemingly rare and unsupported breed. I do not claim a
perfectly impartial, objective, outsider perspective; I reach for
analytical, reflective, distance while appreciating that those most familiar
with a phenomenon also understand its faults the best, however much they are
attached to it. This posture opens me up to criticisms of losing
impartiality, of having "gone native." (But, of course, I was already
partially native and "critical"
should not always mean pejorative.) Or, some will ask "what is the
contribution to theory?" This question is important but incomplete to my
mind, its companion should be: "and what is the contribution to practice?"
For what is the point of a field that follows the world so as to only argue
about how we should argue about it? In his study of Quaker decision-making
Sheeran (1996:xiv) wrote in his preface:
Social scientists and political philosophers are invited to discover in
Quakers what may be the only modern Western community in which
decision-making achieved the group-centered decisions of traditional
societies. In the Conclusion, the author discusses Friends as a possible
answer to the common contemporary wish for enhancement beyond the
fragmented individuation of "liberal" man.
Finally, the author hopes Quakers themselves will find in these pages a
helpful mirroring of Friends decision-making. Newcomers to Quakerism and
those who find themselves in roles of leadership within the community may
find in this study an outsider's understanding of the possibilities and
pitfalls of the Quaker method of going beyond majority rule.
This strikes me as a worthwhile balance, one I hope to achieve as
well.