Wikipedia as an Open Content Community

Joseph Reagle <joseph.2003@nyu.edu>

What patterns of communicative behavior might one find in the collaborative development of Wikipedia, a free on-line encyclopedia? How do human action, culture, and technological capability recursively interact to yield this form of open content? Specifically, how does seemingly anarchic participation yield a surprisingly coherent resource? This proposal situates these questions within the communications and organizational literature, and poses specific questions and a methodology for understanding Wikipedia's contributors, contributions, and domains of production.

What’s so special?

Free and Open Source (F/OS) software has caught the attention of market pundits and academics alike. Seemingly, there is something novel about these technical communities that permits them to produce high quality software without the usual incentives and organizational structures (Moon and Sproull 2002). A possible cause for this difference might be that these are "technical" communities: what is being produced is of immediate utility to its author and permits her to learn and explore. These motivations are captured by Eric Raymond's (1997) well known aphorism that "Every good work of software starts by scratching a developer's personal itch." Researchers (Tzouris 2002; Hertel, Niedner and Herrmann 2003; Lakhani and Wolf 2003; von Hippel and von Krogh 2003) have confirmed that satisfying a local need in a challenging context is a significant motivation for those who develop software.

Yet, while prominent, the software community is not the only one producing and sharing content; this model of openness has extended to other forms of cultural production. The Wikipedia is a collaborative encyclopedia containing an immense amount of information from thousands of contributors; the Creative Commons provides licenses and community for the sharing of texts, photos, and music. However, the assertion that software production set the example for non-technical content is contestable; it is likely a more complex relationship. The Internet and the Web, originally developed for the communication of prose, enabled the F/OS movement. And this in turn led to the widespread deployment of collaborative tools and norms that are now used to cooperatively produce prose.

This paper does not propose to identify the differences between open technical and non-technical content communities with respect to social characteristics such as motivation, structure, joining, goals, and identity. (Wagner (2004:283) identifies many of the similarities.) I simply wish to note that the open content production phenomenon is not limited to software, and that not all communities within this phenomenon are analytically identical. For example, the Wikipedia is an amazing collection of information created by non-technical users: a relative term, by which I mean only that they need not be software developers. Their contributions probably do not solve a local problem (“an itch”) to the same degree that a hacker’s contribution does, and contributors might accrue less reputation given the high degree of interaction on a given topic, some of which is even anonymous.

Instead, this proposal attempts to introduce Wikipedia to, and place it within, a literature in which such phenomena are understood as the result of an interplay between human agency and larger structures, specifically genre systems (Yates and Orlikowski 2002); this notion is used to understand communicative structures as they are enacted by members of the community.

The Wikipedia

The Wikipedia is a “Wiki” based encyclopedia. "Wiki wiki" means "super fast" in the Hawaiian language, and Ward Cunningham chose the name for his project in 1995 to indicate the ease with which one could edit Web pages. In a sense, a Wiki captures original features of the World Wide Web as conceived by its creator Tim Berners-Lee, who envisioned the Web as a medium for both browsing and editing. However, when the Web began its precipitous growth, the most popular clients did not allow users to edit Web pages. (Berners-Lee’s original implementation had this ability, as does the Amaya Web client maintained by the W3C.) Consequently, the majority of Web users knew the medium only as browsers: as consumers of content.

The Wiki changed this asymmetry by placing the editing functionality on the server: if you can read a page, you can also edit it. With a Wiki, the user enters a simplified markup into a form on a Web page. To add a numbered list item with a link to the Wikipedia, one simply types: “# this provides a link to [[Wikipedia]]”. The server-side Wikipedia software translates this into the appropriate HTML and hypertext links. To create a new page, one simply creates a link to it! Furthermore, each page includes links through which one can sign in (if desired), view a log of recent changes to the page (including the author, change, and time), or participate in a discussion about how the page is being edited; that discussion is itself a Wiki page. These powerful features are representative of Cunningham’s (2004) original design principles for Wiki: that it be open, incremental, organic, mundane (simple), universal, overt (there is a correspondence between the edited and presented form), unified, precise, tolerant, observable, and convergent (non-redundant content). The universal application of a general tool facilitates a surprisingly sophisticated creation!
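To make the translation step concrete, here is a minimal sketch of how a server might turn markup like the example above into HTML. It handles only numbered list items (`#`) and `[[...]]` links. The `/wiki/` URL prefix, the `existing` set of page titles, and the `class="new"` marker for links to pages that do not yet exist are illustrative assumptions, not a description of the actual Wikipedia software.

```python
import html
import re
from typing import AbstractSet

# [[Target]] or [[Target|label]]: a simplified subset of wiki link syntax.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def render_inline(text: str, existing: AbstractSet[str]) -> str:
    """Escape plain text and turn each [[link]] into an HTML anchor."""
    out, pos = [], 0
    for m in LINK.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        target = m.group(1).strip()
        label = (m.group(2) or target).strip()
        # Hypothetical URL scheme: spaces in titles become underscores.
        href = html.escape("/wiki/" + target.replace(" ", "_"))
        # Assumed convention: a link to a missing page is flagged, and following
        # it is how a reader would create that page.
        cls = "" if target in existing else ' class="new"'
        out.append(f'<a href="{href}"{cls}>{html.escape(label)}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


def render(markup: str, existing: AbstractSet[str] = frozenset()) -> str:
    """Render '#' lines as an ordered list and other non-blank lines as paragraphs."""
    lines, in_list = [], False
    for line in markup.splitlines():
        if line.startswith("#"):
            if not in_list:
                lines.append("<ol>")
                in_list = True
            lines.append(f"<li>{render_inline(line[1:].strip(), existing)}</li>")
            continue
        if in_list:  # a non-list line closes any open list
            lines.append("</ol>")
            in_list = False
        if line.strip():
            lines.append(f"<p>{render_inline(line.strip(), existing)}</p>")
    if in_list:
        lines.append("</ol>")
    return "\n".join(lines)


print(render("# this provides a link to [[Wikipedia]]", existing={"Wikipedia"}))
```

Run as shown, the sketch prints `<ol>`, a single `<li>` containing `<a href="/wiki/Wikipedia">Wikipedia</a>`, and `</ol>` on separate lines. Leaving `existing` empty instead marks the link with `class="new"`, a rough analogue of creating a page by first linking to it.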

Yet, as is often the case, the consequence of this quick and informal approach was not foreseen; or, rather, it was a pleasant surprise. Wikipedia is the populist offshoot of the Nupedia project started in March of 2000 by Jimbo Wales and Larry Sanger. Nupedia’s mission was to create a free encyclopedia via rigorous expert review under a free documentation license. Unfortunately, this process moved rather slowly. Having recently been introduced to Wiki, Sanger persuaded Wales to set up a scratchpad for potential Nupedia content where anyone could contribute. “There was considerable resistance on the part of Nupedia's editors and reviewers, however, to making Nupedia closely associated with a website in the wiki format. Therefore, the new project was given the name ‘Wikipedia’ and launched on its own address, Wikipedia.com, on January 15 [2001]” (Wikipedia 2004o).

Wikipedia proved to be so successful that when the server hosting Nupedia crashed in September of 2003 (with little more than 23 “complete” articles and 68 more in progress), it was never restored. As of this writing, there are over 50 different language Wikipedias (Wikipedia 2004i); the original English version exceeds 250,000 articles, including most of the Nupedia content. The Wikimedia Foundation, incorporated in 2003, is now the steward of Wikipedia as well as of a new Wiki-based dictionary, a compendium of quotations, collaborative textbooks, and a repository of free source texts.

Communities

One conceptualization of the Wikipedia phenomenon is that of a community of content producers. Research on similar communities has focused on the electronic or virtual character of the media or product (Sproull 2003), the voluntary character of participation (Sproull 2003; Sproull, Conley, and Moon 2004), the openness of the community (Aigrain 2003; Reagle 2004), and the process and type of content they produce (Garcia and Steinmueller 2003; Cedergren 2003; Stalder and Hirsh 2002).

Sproull (2003:733) defines an on-line community as “a large, voluntary collectivity whose primary goal is member or social welfare, whose members share a common interest, experience, or conviction, and to interact with one another primarily over the Net.” She explicitly excludes from her definition electronic work groups and virtual teams that consist of a relatively small number of paid members with economic goals, as well as ad hoc groups or buddy lists whose members primarily interact in the real world. Furthermore, Sproull’s definition recognizes the importance of asynchronous communication for making “microcontributions” (which lower barriers to entry) that can be aggregated by software into substantive products, as we see in Wikipedia articles.

Sproull, Conley, and Moon (2004) consider a kind of behavior in the “Net” community that is pro-social in that it is intentional, voluntary, and benefits others. Drawing on the literature, they note that pro-social behavior can be distinguished as altruistic (motivated purely by the desire to increase another's welfare) or egoistic (motivated by the desire to increase one's own welfare through helping others), and they rely upon social identity and self-categorization theory to explain the difference. It is not yet clear which form of pro-social behavior is dominant in Wikipedia production.

Reagle (2004) defines an open content community as one that delivers content under an open copyright license and demonstrates transparency, integrity, non-discrimination, and non-interference. The last characteristic is described as:

… the linchpin of openness, if a constituency disagrees with the implementation of the previous three criteria, the first criteria permits them to take the products and commence work on them under their own conceptualization without interference. While “forking” is often complained about in open communities – it can create some redundancy/inefficiency – it is an essential characteristic and major benefit of open communities as well.

Interestingly, Wikipedia might be considered a "friendly" fork from the original Nupedia project. And while not hostile, two subsequent forks from Wikipedia are also demonstrative of openness under this definition. (The Enciclopedia Libre Universal en Español forked from Wikipedia over a misunderstanding about the possibility of advertising on the site; Wikinfo forked over an epistemological difference with Wikipedia's policy of striving for a "neutral point of view.")

Finally, Aigrain (2003) speaks of open information communities and attributes their success to (1) low transaction costs of interaction, which encourage contribution, and (2) copyright licenses that “enable new forms of relationships between the individual and the collective.”

I believe these conceptualizations are appropriate for understanding the Wikipedia, as many of their characteristics correspond to Cunningham's (2004) Wiki design principles, described in the previous section.

Communities of Practice

The papers cited in the previous section speak simply of content or information, but there is a significant body of related literature that is concerned with "knowledge." In Wiki: A Technology for Conversational Knowledge Management and Group Collaboration, Wagner (2000:2) defines conversational knowledge creation as when, “individuals create and share knowledge through dialog with questions and answers.” While this usage is nearly synonymous with the more generic term “information,” Alavi and Leidner (2002:111) review many of the substantive perspectives on knowledge (e.g., data and information, state of mind, object, process, access to information, and capability) and the implications such a perspective has on research. Because of the Wikipedia’s character, I adopt a particular conceptualization of knowledge, that of a community of “practice” and “knowing.”

While the seminal papers in this discipline do not specifically concern themselves with the open communities discussed in this proposal – instead, focusing on firms – the usage of the term is homologous. (Interestingly, Boland Jr. and Tenkasi (1995) speak of communities of knowing as open systems based on the “open systems” of scientific communities from Star’s (1993) Cooperation Without Consensus in Scientific Problem Solving: Dynamics of Closure in Open Systems.)

In Organizational Learning and Communities-of-Practice, Brown and Duguid (1991) noted that there is often a “variance between a major organization's formal descriptions of work both in its training programs and manuals and the actual work practices performed by its members.” In open content communities, formal titles and job descriptions are rare; practice tends to lead and description must catch up. (For example, keeping one's documentation up to date with what one’s software is actually doing is always a challenge.) In the case of the Wikipedia, practice and description are inextricably linked: the community documents its own behavior and policy.

Within this literature, actual situated behavior becomes paramount. Orlikowski (2000:407) defines technologies-in-practice as “enacted structures of technology use” that are virtual and emerge from “people's repeated and situated interaction with particular technologies.” Just as she refuses to reify interaction with technology, she resists the conception of knowledge as an object. Orlikowski (2002:249) relies upon Brown and Duguid’s distinction between “know-how” and “know-what,” Schön’s “knowing in action,” and Giddens’ theory of structuration to argue for a “knowing in practice,” a perspective that “suggests that knowing is not a static embedded capability or stable disposition of actors, but rather an ongoing social accomplishment, constituted and reconstituted as actors engage the world in practice.”

Levina (2002:12) notes that Schön’s “reflection-in-action” is a “conversation with the material of a situation”; she then extends this to:

understand professional practice in collaborative environments that combine diverse expertise. I introduced the term collective reflection-in-action to describe a ‘conversation’ with different audiences, which brings about dilemmas stemming from differences in appreciative systems of participants involved in different professional and organizational practice.

If anything, the Wikipedia community is reflexive: in addition to talk pages for each encyclopedia article, it includes both a “Wikipedia” namespace within the Wikipedia itself (polished pages “about Wikipedia”) and a “meta” Wiki (“a wiki about Wikimedia”).

Genres

Researchers have offered a number of approaches to understanding patterns of behavior in an organized context: March and Simon’s (1958) routines, Barley’s (1986) scripts, Pentland’s (1992) moves, Cramton’s (2001) episodes/behaviors, and von Krogh, Spaeth and Lakhani’s (2003) “joining” scripts. An additional approach to understanding “knowing” within a community is to apply a practice lens (Orlikowski 2000) to a repertoire of genres (Orlikowski and Yates 1994); more informally, this means understanding technology and media within a situated context of communicative patterns.

Yates and Orlikowski (2002:15) specify that “a genre established within a particular community serves as an institutionalized template for social interaction – and organizing structure – that influences the ongoing communicative action of members through their use of it within and across their community.” A genre permits one to address the questions (2002:15-17) of why, what, who, how, when, and where.

Yet, genres are not an analytical tool for the researcher, nor “an individual's private motive for communicating, but a purpose socially constructed and recognized by the relevant organizational community for typical situations (e.g., proposing a project, meeting to review project status)” (Orlikowski and Yates 2002:15). Genres are patterns that “shape but do not determine how community members engaged in everyday social interactions.”

What genres and repertoires might one find within Wikipedia discourse?

Methodology

Yates and Orlikowski (2002:15) insist that genres are socially recognized by the community, but also that genres can be implicit. This seemingly re-invites many of the epistemological questions inherent in conceptualizations of knowledge, further complicated by questions of methodology (Burrell and Morgan 1979). Typically, this work is done via ethnography (observation) and induction (discernment). It is not yet clear to me how the tension between implicit patterns of community behavior and the inductive categorization of the researcher can be resolved; it reminds me of Deetz's (1996) double hermeneutic of interpreting an interpreted world. I am interested in the approach nonetheless.

This proposal includes three approaches to research. The first is a genre systems analysis of Wikipedia discursive practice, using ethnographic/inductive methods, in order to identify the why, what, who, how, when, and where of Wikipedia behavior. (See Stoller (1989), as well as the approaches drawn upon by Levina (2002), particularly Agar (1980), Bourdieu (1977), Bourdieu and Wacquant (1992), Glaser and Strauss (1967), Klein and Myers (1999), Schultze (1997, 2000), and van Maanen (1979).)

The second and third approaches complement and test the findings of the first via, respectively, a quantitative and network analysis of the Wikipedia's structure, and interviews with participants.

Research Questions

The following questions attempt to discern patterns of mediated interaction within the Wikipedia.

Contributions

CN1: What kind of activity is Wikipedia content production?

As already discussed, many researchers are studying collaborative software production. Additionally, Pinsonneault et al. (1999) consider electronic brainstorming; Briggs et al. (1997) consider collaborative writing; Todd and Benbasat (1999) consider decision making; Levina (2002) considers collaborative system design; and Sambamurthy and Zmud (1999) consider governance. To what extent is Wikipedia activity a subset, or even a superset, of these activities?

CN2: How can contributions be categorized? Can one identify edits to the Wikipedia as a genre repertoire including genres of initial page creation, outline, substantive contribution, vandalism, or correction?

Yates and Orlikowski (2000) identify three genre systems that structure interaction within the collaborative team room. Briggs et al. (1997) note that there are different forms of collaborative editing: sequential, parallel, and reciprocal. Levina (2002) found three types of “collective reflection-in-action”: ignoring, adding, and challenging. I expect that similar genres could be found within the Wikipedia, with those related to acts of vandalism being particularly interesting.
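Alongside inductive coding, a rough quantitative pass could sort revisions into a few of the candidate genres named in CN2 by the size and shape of each change. The sketch below is only that: it compares two revisions word by word with Python's `difflib`, and its labels and thresholds (`SMALL_EDIT_WORDS`, `BLANKING_FRACTION`) are hypothetical values that would need calibrating against hand-coded edits. It cannot tell an outline from a substantive contribution, nor subtle vandalism from a genuine correction.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class EditSize:
    added: int    # words inserted by the revision
    removed: int  # words deleted by the revision


def edit_size(old: str, new: str) -> EditSize:
    """Count words inserted and deleted between two revisions of a page."""
    a, b = old.split(), new.split()
    added = removed = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return EditSize(added, removed)


# Hypothetical thresholds, not derived from any Wikipedia data.
SMALL_EDIT_WORDS = 5     # at or below this many changed words: a "tweak"
BLANKING_FRACTION = 0.9  # share of old text removed that suggests blanking


def classify_edit(old: str, new: str) -> str:
    """Assign a revision to one of a few crude, size-based edit categories."""
    old_words = len(old.split())
    if old_words == 0:
        return "creation"
    size = edit_size(old, new)
    # Most of the page removed and little added: a common form of vandalism.
    if size.removed >= BLANKING_FRACTION * old_words and size.added < SMALL_EDIT_WORDS:
        return "possible vandalism (blanking)"
    if size.added + size.removed <= SMALL_EDIT_WORDS:
        return "correction"
    return "substantive contribution"


print(classify_edit("Wikipedia is a free encyclopedia.",
                    "Wikipedia is a free online encyclopedia."))
```

The example call adds a single word and so is labeled a correction. Counting words rather than characters is one design choice among several; a study could equally weight changes by sentence or by section.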

CN4: What is the lifecycle (e.g., start, active, stable) of a Wikipedia article?

Domains

D1: To what extent are there different domains of content production? Are the Wikipedia, the Wikipedia name space, the meta Wiki, and e-mail lists understood as different content domains with different norms of collaboration?

D2: Are there different topical areas with respect to the production of Wikipedia articles? How do the participants, as reflected in the Wikipedia discourse, demarcate the boundaries and structure of their environment?

I could complement an inductive analysis with a network content analysis of citations/links and common contributors that might reveal “clumps” of connected topical domains and contributor groups.
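One simple way to look for such clumps is to treat articles as nodes, join two articles when one links to the other or when they share at least a minimum number of contributors, and take the connected components of the resulting graph. The sketch below does this with a small union-find structure in plain Python. The input format (link pairs plus an article-to-contributors mapping) and the `min_shared` threshold are assumptions, and connected components are a deliberately crude stand-in for the community-detection methods a full network analysis would likely use.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def find_clumps(links: Iterable[Tuple[str, str]],
                contributors: Dict[str, Set[str]],
                min_shared: int = 1) -> List[Set[str]]:
    """Group articles connected by hyperlinks or by shared contributors.

    links: (article, article) pairs where one article links to the other.
    contributors: article title -> set of contributor names.
    Returns connected components, largest first.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Hyperlinks join articles directly.
    for a, b in links:
        union(a, b)

    # Invert the mapping so each contributor lists the articles they edited.
    by_person: Dict[str, Set[str]] = defaultdict(set)
    for article, people in contributors.items():
        find(article)  # register articles that have no links
        for person in people:
            by_person[person].add(article)

    # Count how many contributors each pair of articles has in common.
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for articles in by_person.values():
        for a, b in combinations(sorted(articles), 2):
            shared[(a, b)] += 1
    for (a, b), n in shared.items():
        if n >= min_shared:
            union(a, b)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


example_links = [("Hawaii", "Hawaiian language")]
example_contributors = {
    "Hawaii": {"alice"}, "Hawaiian language": {"bob"},
    "Linux kernel": {"carol", "dave"}, "GNU": {"carol", "dave"},
    "Nupedia": {"erin"},
}
print(find_clumps(example_links, example_contributors, min_shared=2))
```

With the made-up data above, the Hawaii articles form one clump through a hyperlink, the two software articles form another through their two shared contributors, and Nupedia stands alone. Raising `min_shared` to 3 splits the software pair apart, which illustrates how sensitive such clumps are to the chosen threshold.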

Contributors

CR1: What roles can be identified? Can one identify leaders, substantive contributors, tweakers (who make small corrections), and lurkers with respect to an article, a topical domain, or the Wikipedia itself? Are the bindings between roles and editing genres stable or fluid? How do the members of the community conceptualize and speak of the community?

Yoo and Alavi (2003) identified some of the characteristics of emergent leadership in an e-mail context, such as longer messages and a higher task focus. Jarvenpaa and Leidner (1999:806) found that roles do emerge in successful “high-trust” teams. Do these findings apply in the context of the Wikipedia?

CR2: What role does the anonymous contributor play?

Briggs et al. (1997) have noted that anonymity does not necessarily lead to more flaming, and can be a way in which substantive and productive criticism can be made without social stigma. Yet, Pinsonneault et al. (1999) noted that in electronic brainstorming, anonymity made little difference with respect to the number of ideas generated within the group.

Wikipedia discourse could be analyzed with respect to contributors who are blocked from participation or whose actions are likely to prompt significant discussion, to determine whether contentious issues or actions arise from anonymous contributors. (My limited experience indicates that identified contributors are a significant source of contention.)

CR3: Does the role a contributor plays evolve during their participation?

CR4: What are the demographic and human capital characteristics (Ang, Slaughter, and Ng 2002) of Wikipedia contributors, particularly with respect to whether their contributions relate to their existing professional or educational background or are more likely to relate to their hobbies?

CR5: How do people view their interactions with respect to satisfaction (individual or collective) and productivity?

CR4 and CR5 would be investigated using interview/survey instruments administered to Wikipedia contributors.

References

Agar, M. (1980). The professional stranger: an informal introduction to ethnography. Academic Press, New York, NY.

Aigrain, P. (2003). The individual and the collective in open information communities. 16th Bled Electronic Commerce Conference.
[ http://opensource.mit.edu/papers/aigrain3.pdf ]

Alavi, M. and Leidner, D. (2002). Review: knowledge management and knowledge management systems: conceptual foundations and research issues. MISQ, 25:107-136.

Ang, S., Slaughter, S., and Ng, K. Y. (2002). Human capital and institutional determinants of information technology compensation: modeling multilevel and cross-level interactions. Management Science, 48(11):1427.

Barley, S. (1986). Technology as an occasion for structuring - evidence from observations of CT scanners and the social-order of radiology departments. Administrative Science Quarterly, 31(1):78-108.

Bourdieu, P. (1977). Outline of a theory of practice. Cambridge University Press, New York, NY.

Bourdieu, P. and Wacquant, L. (1992). An invitation to reflexive sociology. University of Chicago Press, Chicago.

Briggs, J., Nunamaker, R., Mittleman, D., Vogel, D., and Balthazard, P. (1997). Lessons from a dozen years of group support systems research: a discussion of lab and field findings. Journal of MIS, 13:163-207.

Brown, J. and Duguid, P. (1991). Organizational learning and communities-of-practice: toward a unified view of working, learning, and innovation. Organization Science, 2(1):40-57.
[ http://www.slofi.com/Organizational_Learning.htm ]

Burrell, G. and Morgan, G. (1979). Sociological paradigms and organizational analysis. Heinemann, London.

Cedergren, M. (2003). Open content and value creation. First Monday, 8(8).
[ http://www.firstmonday.dk/issues/issue8_8/cedergren/ ]

Cramton, C. (2001). The mutual knowledge problem and its consequences for dispersed collaboration. Organization Science, 12:346-371.
[ http://www.kmentor.com/socio-tech-info/archives/000017.html ]

Cunningham, W. (2004). Wiki Design Principles.
[ http://c2.com/cgi/wiki?WikiDesignPrinciples ]

Deetz, S. (1996). Describing differences in approaches to organization science: rethinking Burrell and Morgan and their legacy. Organization Science, 7(2).

Garcia, J. M. and Steinmueller, W. E. (2003). Applying the open source development model to knowledge work. SPRU - Science and Technology Policy Research, INK Open Source Working Paper, (2).
[ http://siepr.stanford.edu/programs/OpenSoftware_David/oswp2.pdf ]

Glaser, B. G. and Strauss, A. L. (1967). The discovery of grounded theory: strategies for qualitative research. Aldine, Chicago, IL.

Hertel, G., Niedner, S., and Herrmann, S. (2003). Motivation of software developers in open source projects: an internet-based survey of contributors to the Linux kernel.
[ http://opensource.mit.edu/papers/rp-hertelniednerherrmann.pdf ]

Jarvenpaa, S. and Leidner, D. (1999). Communication and trust in global virtual teams. Organization Science, 10(6):791.
[ http://hyperion.math.upatras.gr/commorg/jarvenpaa/ ]

Boland Jr., R. J. and Tenkasi, R. V. (1995). Perspective making and perspective taking in communities of knowing. Organization Science, 6(4):350-372.

Lakhani, K. R. and Wolf, R. G. (2003). Why hackers do what they do: understanding motivation and effort in free/open source software projects.
[ http://opensource.mit.edu/papers/lakhaniwolf.pdf ]

Klein, H. and Myers, M. D. (1999). A set of principles for conducting and evaluating interpretive field studies in information systems. MIS Quarterly, 23(1):67-92.

Levina, N. (2002). Collaborative practices in information systems development: a collective reflection-in-action framework. In 23rd International Conference on Information Systems. Barcelona, Spain.

March, J. G. and Simon, H. A. (1958). Organizations. John Wiley and Sons, New York, NY.

Moon, J. Y. and Sproull, L. (2002). Essence of distributed work: the case of the Linux kernel. In Hinds, P. and Kiesler, S., editors, Distributed Work, chapter 16. MIT Press.
[ http://www.firstmonday.dk/issues/issue5_11/moon/ ]

Orlikowski, W. (2002). Knowing in practice: enacting a collective capability in distributed organizing. Organization Science, 13:249.
[ http://opensource.mit.edu/papers/orlikowski.pdf ]

Orlikowski, W. J. (2000). Using technology and constituting structures: a practice lens for studying technology in organizations. Organization Science, 11(4):404-428.

Pentland, B. (1992). Organizing moves in software support hot lines. Administrative Science Quarterly, 37(4):527-548.

Pinsonneault, A., Barki, H., Gallupe, R., and Hoppen, N. (1999). Electronic brainstorming: the illusion of productivity. Information Systems Research, 10:110-133.

Raymond, E. (1997). The cathedral and the bazaar.
[ http://www.catb.org/~esr/writings/cathedral-bazaar ]

Reagle, J. (2004). Open content communities. forthcoming M/C Journal: The Open Issue, 7(3).
[ http://reagle.org/joseph/2003/12/open-media-culture.html ]

Sambamurthy, V. and Zmud, R. (1999). Arrangements for information technology governance: a theory of multiple contingencies. MIS Quarterly, 23(2):261.

Schultze, U. (1997). Information as practice: an ethnography of knowledge work. Case Western Reserve University, Cleveland, OH.

Schultze, U. (2000). A confessional account of an ethnography about knowledge work. MIS Quarterly, 24(1):3-6.

Sproull, L. (2003). Online communities. In The Internet Encyclopedia, pages 733-744. John Wiley, New York.

Sproull, L., Conley, C. A., and Moon, J. Y. (2004). Prosocial behavior on the Net. Forthcoming.

Stalder, F. and Hirsh, J. (2002). Open source intelligence. First Monday, 7(6).
[ http://www.firstmonday.dk/issues/issue7_6/stalder/ ]

Star, S. L. (1993). Cooperation without consensus in scientific problem solving: dynamics of closure in open systems. In Easterbrook, S., editor, CSCW: Cooperation or Conflict. Springer Verlag, London, UK.

Stoller, P. (1989). The taste of ethnographic things. University of Pennsylvania Press, Philadelphia, PA.

Todd, P. and Benbasat, I. (1999). Evaluating the impact of DSS, cognitive effort, and incentives on strategy selection. Information Systems Research, 10(4):356-374.

Tzouris, M. (2002). Software freedom, open software and the participant's motivation - a multidisciplinary study. M.Sc. thesis, London School of Economics and Political Science.
[ http://opensource.mit.edu/papers/tzouris.pdf ]

van Maanen, J. (1979). The fact of fiction in organizational ethnography. Administrative Science Quarterly, 24:539-550.

von Hippel, E. and von Krogh, G. (2003). Open source software and the private-collective innovation model: issues for organization science. Organization Science, 14(2):209.

Wikipedia (2004a). History of Wikipedia.
[ http://en.wikipedia.org/wiki/History_of_Wikipedia ]

Wikipedia (2004b). Main page.
[ http://en.wikipedia.org/wiki/Main_Page ]

Yoo, Y. and Alavi, M. (2003). Leadership in virtual teams: what do emergent leaders do? Unpublished.