Revenge rating and tweak critique at photo.net

Joseph Reagle


Abstract: photo.net, begun in 1993, permits users to submit photographs for viewing and critique. Hence, long before Flickr and Tumblr and +1s and likes, “netters” grappled with how best to share and evaluate one another’s aesthetic works. They faced questions of numeric ratings (are numbers an appropriate form of evaluation and, if so, what range should be used?), anonymity (are blinded reviews better?), manipulation (how to prevent people from “mate-rating” friends and “revenge-rating” enemies?), genres (why are nudes so much more popular?), and critique etiquette (is it okay to tweak another’s image in Photoshop?). I relate these issues to a few, more recent, examples of online evaluation beyond photography. Most importantly, I offer six characteristics of evaluation learned from photo.net that continue to be broadly relevant in the age of digital evaluation, including that it is hard to quantify the qualitative and that, when people do so, quantitative mechanisms often beget their own manipulation.

Published as: Reagle, Joseph. “Revenge Rating and Tweak Critique at Photo.net.” In Online Evaluation of Creativity and the Arts, ed. H. Cecilia Suhr. Routledge, NY, 2015.


In today’s world, most anything can be posted online, rated, liked, and ranked. Flickr (founded in 2004), Tumblr (2007), and Instagram (2010) exemplify this Web 2.0 penchant for the sharing and evaluation of images. Yet, these sites were not the first to support these practices among photographers. For instance, photoSIG has been operating since December 2001 and has long supported photographic critiques among its contributors (Xu & Bailey, 2012).

In this chapter, I focus on the evaluation of photographic works at photo.net, another early site begun in 1993. It started as a few discussion boards on a personal web site and now describes itself as “a site for serious photographers to connect with other photographers, explore photo galleries, discuss photography, share and critique photos, and learn about photography” (photo.net, 2012b). Indeed, the discussions at photo.net about the practices, meanings, and abuses of digital evaluation anticipate discussions about all manner of contemporary digital evaluation. Hence, I claim that photo.net is a seminal site of practice and discourse about digital evaluation. In the following pages I describe photo.net’s origins and how the community grappled with the issues of numeric ratings (what range should be used?), anonymity (are blinded reviews better?), manipulation (how to prevent people from “mate-rating” friends and “revenge-rating” enemies?), genres (why are nudes so much more popular?), and critique etiquette (is it okay to tweak another’s image in Photoshop?). I relate these issues to a few, more recent, examples of online evaluation beyond photography. Most importantly, I offer six characteristics of evaluation learned from photo.net that continue to be broadly relevant in the age of digital evaluation, including that it is hard to quantify the qualitative and that, when people do so, quantitative mechanisms often beget their own manipulation.


Philip Greenspun and photo.net

Unlike the founding of Tumblr in 2007, photo.net is better described as having started in 1993. This was before venture capitalists spoke of “social media” and platform “launches.” At the Web’s start, people were building and experimenting with this new medium, and sometimes personal projects took on a wider significance. Such was the case for Philip Greenspun. In the early 90s Greenspun was a student in MIT’s storied “course 6” (electrical engineering and computer science). Greenspun completed his master’s in 1993 and planned to continue on with a doctorate. Yet, he was not a quiet computer-obsessed nerd toiling away alone in a back room. Greenspun was brash and reflective and had interests outside of computers. He traces his interest in photography back to using his mother’s camera as a ten-year-old. Later, “as an MIT undergraduate, surrounded by remarkably unphotogenic classmates, I didn’t haul out my Minolta SRT-102 too often, but I did take a photo class in which we learned to use the 4x5 view camera and darkroom” (Greenspun, 2012). Even on the computer front, he saw himself as outside the mainstream and grew increasingly frustrated that others at MIT did not share his enthusiasm for Internet-based applications (Greenspun, 2000b). He was also fond of travel and Samoyeds, a large, white, and fluffy breed of dog; their visages frequented his publications and his own profile picture for many years. In 1993, having completed one stage of graduate work, chafing under others’ indifference to the Web, and mourning the recent loss of George, his 65-pound Samoyed, he decided to “take the trip we were going to take together, Boston to Alaska and back, rather than wait until I finished my Ph.D.” (Greenspun, 1993).

Greenspun’s online writing precedes the term “blog.” In fact, ten years before launching his own blog in 2003, he had famously established the Web as a compelling non-technical medium with his personal site photo.net. It was here that he posted the journal of his trip, Travels with Samantha. (Samantha was the Macintosh PowerBook 170 that accompanied him on his journeys.) This travel diary changed how people looked at the still-nascent Web. Tim Berners-Lee invented the Web so CERN particle physicists could better collaborate, but Greenspun showed it could also be a medium for prose and photography. As Greenspun writes, Travels’ 210 pages of text and 250 photographs won the “Best of the Web ’94 and became one of the 10 most heavily trafficked sites on the Internet. More than a thousand people each day read at least part of the book” (Greenspun, 1996). It was “born of the world’s indifference to the World Wide Web and then paradoxically grew to prominence with the Web” (Greenspun, 2000b).

Hence, Greenspun’s work was characterized by the intersection of his technical projects, writing, photography, and the community arising out of all of this. At MIT he started the Scalable Systems for Online Communities group, which was then spun off as ArsDigita, a profitable venture that developed online community applications. On his personal site, greenspun.com, he hosted hundreds of varied discussion boards (LUSENET, 2012). He continued to publish online: personal reflections, photographs, and more technical works such as “Database Backed Web Sites” (Greenspun, 1996) and Philip and Alex’s Guide to Web Publishing (Greenspun, 1998). (Alex was his Samoyed dog at the time and appears on the cover with Greenspun.)

The approaching millennium brought big changes. At ArsDigita, Greenspun and the other founders were ousted by their venture investors (Greenspun, 2000a). While the photo.net domain name had been registered and had served pages related to an MIT research group since 1993, Greenspun began adding photographic content in 1997 (Greenspun, 1997). Furthermore, the loss of ArsDigita and the bursting of the dotcom bubble coincided with the migration of the photography fora on greenspun.com to photo.net, the last of which was moved in 2002 (Administrators, 2002; Luong, 2002). At this point, in June of 2002, Brian Mottershead, who appears frequently in the following discussions, was hired as publisher and editor of photo.net to help make it an independent and viable commercial enterprise.

Greenspun became more active at photo.net again in 2006, but perhaps only to prepare it for sale. In 2007 the site was sold to NameMedia. As one photographer noted, what began as Greenspun’s personal home page “developed into one of the largest photographer communities online,” and one largely free of advertising. Instead, subscribers paid for membership, meaning they received extra perks, including more storage space for their photos. The “downside of this was a feel of stagnant technology, but the upside was a noise free environment from advertising” (Goldstein, 2007). Today, the site is still active; it describes itself as “an online community with hundreds of thousands of active members and many more casual viewers visiting daily. We started in 1993 and strive to be the best peer-to-peer educational system for people who wish to become better photographers” (photo.net, 2012a). While the site’s administrators make efforts to keep photo.net current, such as the inclusion of social media sharing buttons (J. Root, 2011), it does have a dated feel and can be slow to respond. Pokiness aside, some members might consider its old-fashionedness a virtue that distinguishes it as a venue for “serious photographers” exchanging critique and learning. Its users tend to be a bit more traditional than the “filter-happy” mobile phone photographers who make use of sites like Instagram. (photo.net does, however, have an “iphone” mobile photography category with many beautiful images.)

Greenspun himself continues to blog, take photographs, and write equipment reviews on photo.net. (However, his last posting to the site’s discussion fora was in 2009 (photo.net, 2012c).) His entrepreneurial efforts, while drama-filled, earned him moderate wealth. In addition to reviewing expensive cameras, he now flies and reviews small aircraft. While the photograph on his home page often featured a picture of himself with his dog, it now includes a picture of a smiling father and young daughter.


In the 1990s I too was a student at (and later employee of) MIT, an advocate of the Web, and interested in photography; I’ve been a reader of photo.net (on and off) since that time. However, the discussions below are collected from a systematic review of the “Photo Critique and Rating” forum from 2000 to 2005, and I include discussion of changes at the site up to 2010. My findings are based upon a naturalistic inquiry (Lincoln & Guba, 1985; Thomas & Jones, 2006) into digital evaluation practices at this site. I reviewed approximately 500 discussion topics within this period, paying attention to those related to evaluation and those that generated significant discussion. My analysis consisted of iteratively coding (and recoding) the content of these sources into various categories, a type of “theoretical sampling” or “emergent design” (Glaser & Strauss, 1967, p. 72; Lincoln & Guba, 1985, p. 209). Drafts of this analysis were shared with members of the community for corrections and feedback.

Analysis: Evaluating digital photography

For more than two decades Terry Barrett’s (1996) Criticizing Photographs has provided an accessible and comprehensive “introduction to understanding images.” In the five editions since 1990, Barrett’s sensible exploration of the description, interpretation, and evaluation of photographs has enabled many students to “better appreciate photographs by using critical processes.” These critical processes include describing photographs: their subject matter, form, medium, and style. Interpretation, “whenever attention and discussion move beyond offering information to matters of meaning,” can be from varied perspectives (e.g., comparative or formal) (pp. 1, 38). One can further consider the type of photograph (e.g., descriptive or interpretive), the context (internal, original, external), and the criteria one uses in evaluating photographs. Nowhere does Barrett discuss ratings, rankings, or modifying others’ images as a type of critique. Yet, these are characteristic practices of digital evaluation.

Of course, the premise of sharing and critique is that one learns to be a better photographer, that members are “working to help each other improve.” People exchange tips, give each other critique and advice, and the very act of sharing is spoken of in terms of learning. For instance, one talented and popular photographer’s portfolio includes multiple expressions of gratitude, such as “I learned so much by perusing through your photos” followed by “thanks for sharing” (Pinto, 2005). In addition to these subjective statements, one might also look to quantifiable assessments of performance. In an analysis of over six years of data at the related site photoSIG, Anbang Xu and Brian Bailey (2012) found that participation at photoSIG did, overall, lead to improvements in the average ratings photographers received. However, as we shall see, ratings are not always a transparent metric of quality.


In 1983 sociologist George Ritzer published an article arguing that McDonald’s “rationalization” of food production could serve as a useful model for understanding changes in the larger society. That is, McDonald’s succeeded because of efficiency, technology, calculability, predictability, and control; these same forces are now shaping our lives at work, in school, and even on holiday. Indeed, “calculability,” a drive towards quantifiable measures, is a defining characteristic of contemporary “rational” society. Why? Ritzer notes that quality is “notoriously difficult to evaluate,” yet computers are good at counting.

How do we assess the quality of a hamburger, or physician, or a student? Instead of even trying, in an increasing number of cases, a rational society seeks to develop a series of quantifiable measures that it takes as surrogates for quality. This urge to quantify has given great impetus to the development of the computer and has, in turn, been spurred by the widespread use and increasing sophistication of the computer. (Ritzer, 1983, p. 103)

Hence, we ought not be surprised that photo.net, a site begun at MIT, would both innovate upon and suffer the consequences of calculability. In December 2000 photo.net took a definitive step in its history, transitioning from a collection of fora for discussing photography into one in which photos could be rated and ranked (O’Neill, 2005). photo.net’s system was simple and ad hoc. One could rate a photograph according to its aesthetics or originality on a 10-point scale (stefan ballard in rajeev et al., 2001). While one could also leave a text comment, the innovation of allowing others to quantify the quality of a photograph would be an ongoing source of confusion and controversy. Indeed, by the following August, the site’s administrators posted the first message on the “Photo Critique and Rating” forum asking for “suggestions for what to do with the photo ratings systems” (rajeev et al., 2001).

Some of the responses asked about the motivation of the system: “What is the point of this rating system anyway?” (Joe Oliva in rajeev et al., 2001) Advocates for the system responded that it informed the “top rated” pages, helping users find photographers they might like. Yet, others noted the numbers could be an affront. Of course, while one could always “lighten up and ignore it,” some felt that the numerical scale was too much:

Currently all photos are given a rating of 1-10 in two categories – “cleverness” and “aesthetics.” I’m sure these two categories have meaning to the people at photo.net services – but I don’t think they mean the same thing to every visitor who logs in here… I also think 1-10 is too big a spread — a “7” might seem like an average/good score to me, but to someone else (especially a student enrolled in one of our grade inflated universities), a 7 might seem terribly low. (stefan ballard in rajeev et al., 2001)

Interestingly, research on surveys indicates this concern is well founded, as there are often problems with skew and consistency when there are fewer than five or more than nine items (Cox, 1980; Preston & Colman, 2000). To avoid these problems, perhaps labels could be used instead? “Let people indicate easily, on all pictures, whether they ‘hate it,’ ‘it’s ok,’ ‘I love it,’ and leave it at that” (Wayne Melia, Jonathan Watmough in rajeev et al., 2001).

In time, a seven-item scale was adopted, but it too had its problems. Ratings were now compressed around five and six, leading one contributor to ask if rating with decimal fractions would be useful (Kelly, 2004). Another contributor agreed, “The difference between 5 and 6 is dramatic.” An average of six might land you close to the first few pages of “top rated” photos, whereas with a five “you are at the very end…like number 2000 or so. At the very least a 5.5, 6.5 and even 4.5 would be nice” (Vincent K. Tylor in Kelly, 2004). Others responded this was a “bad idea” because a wider ten-item scale had already been tried and abandoned. Others recommended that people stop worrying about the difference between a five and a six: “Come on guys, are you the types who go to little league games and have to be evicted?” (John Falkenstine in Kelly, 2004) Additionally, some thought the labels at either end of the scale were overly pejorative, suggesting the words “Very Bad (synonymous with, You Stink) and Excellent (You are the Best)” could be exchanged with “words like Min. and Max.” such that “people might not take low ratings so harshly and others might feel a bit less like massive jerks for giving say a 2/7 [in aesthetics/originality]” (Turner, 2004).

Some worried that the average (mean) of the ratings was not sufficient; it was susceptible to “aberrant ratings,” which could be remedied by also reporting the median and mode of the ratings (Counts, 2004). Or, since textual comments were preferred to numerical ratings, perhaps the ranking system should simply use the number of comments? (Joe Boyd in rajeev et al., 2001) Also, given that many of the top-rated photos had only a few ratings, perhaps they should be omitted “until the photo has received enough ratings to make the average meaningful?” (M, 2004)
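The statistical intuition behind this proposal is that the median and mode are far more robust to a handful of “aberrant” ratings than the mean. A minimal sketch in Python (with made-up ratings; photo.net’s actual aggregation code is not public) illustrates:

```python
from statistics import mean, median, mode

def summarize(ratings):
    """Return (mean, median, mode) for a photo's list of ratings."""
    return round(mean(ratings), 2), median(ratings), mode(ratings)

honest = [5, 6, 5, 6, 5]
attacked = honest + [1, 1]  # two hypothetical "revenge" ratings appended

print(summarize(honest))    # (5.4, 5, 5)
print(summarize(attacked))  # (4.14, 5, 5): mean drops, median and mode hold
```

A pair of bottom-of-scale revenge ratings drags the mean down by more than a full point while leaving the median and mode untouched, which is precisely the robustness the proposal hoped to gain.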

Of course, one could always leave textual comments, which were almost always appreciated more than numerical ratings. As an example, an image of some apples lit by candlelight received varied, reasonable critiques. One commenter wrote, “I think that the subject placement is good and that the lighting works well. The light fall-off in the corners is very effective. On the negative side, it’s a fairly boring image which fails to excite me in any way.” This was followed by a comment that “I must be more easily excited …because I find this quite stimulating to the eye. Great use of available light souces. I too have used candle light on some of my photos and I like the result here.” Yet, its ratings became the subject of a discussion about revenge: when the photographer of the apples rated someone else’s image as a 2/2, “in less than 20 minutes I received a 2/2 on one of my photos from the same photographer.” In the subsequent discussion about revenge rating, some members stressed that ratings should not be taken seriously. Another member wrote that while the “photo is well crafted enough to appear in any magazine” it “deserves neither the perfect 7/7 scores it received nor the obviously punitive 1/1 ratings dumped on it” (Lex Jenkins in Dejkam, 2003).

Hence, the expressed skepticism of numeric ratings is understandable. In response to the original 2001 call for help, many responded that numeric ratings should be “ditched” or “scrapped” in favor of comments (Jeremy Burton and Steven Kembel in rajeev et al., 2001). “Comments are a preferred way of giving feedback on peoples’ artwork. There is an inherent silliness in assigning quantitative ratings to artistic expression” (Larry Walker in rajeev et al., 2001). Over the course of years, people also suggested that ratings somehow be linked to comments. For instance, “if someone leaves anything lower than, say 3, make it mandatory to leave comments, not just a numerical rating” (Acer Iddibhai in rajeev et al., 2001). However, mandatory comments would then likely prompt an influx of “photo sux” comments. In fact, this fear has been borne out in more recent evaluation systems. Laura Gibbs wrote about her experience in a massive open online course (MOOC) and the demoralizing comments that arose from mandatory peer feedback. She noted that there were plenty of one-word evaluations (e.g., “ug” and “terrible”) as well as a comment that consisted of the words “one,” “two,” through “thirty” so as to conform to the minimum word-length requirement (Gibbs, 2012).

In September 2010, after ten years of experimentation and contention, photo.net administrators enacted a sweeping series of changes to the rating system. Most surprising of all, “we’ve finally given up on the ‘Aesthetic/Originality’ two number ratings system.”

The concept was good in theory, many people take beautiful though unoriginal images, it never worked out in practice. Virtually nobody did anything but rate two of the same number (5/5, 3/3, etc) for every rating…. In addition, having a system that worked “in theory” but not in practice tended to give people the impression that there was more in-depth information to be gleaned out of the ratings system than there was in reality. (J. Root, 2010)

At photo.net the community learned that numerical ratings for aesthetic judgments could be a bruising innovation: while a tempting idea, it is difficult to quantify the qualitative (characteristic 1). No matter how often administrators and others told people not to take them to heart, to ignore them, to focus on beauty and learning rather than rank, people were seduced by numbers.


There are ironic consequences to Ritzer’s (1983) McDonaldization. Its features (such as efficiency or calculability) often become ends in and of themselves. This then creates systems in which any means possible are used to achieve those (now) arbitrary ends: “We might say that rational systems are not reasonable systems” (pp. 102, 106). This insight has been shared by other scholars, most notably economist Charles Goodhart (1975), who noted of monetary policy: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” This “collapse” likely arises from what social scientist Donald Campbell (1976) wrote the following year: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (p. 49). These related insights are now known as eponymous “laws” that are most concisely expressed by anthropologist Marilyn Strathern. In the 1990s she noted that the “proliferation of procedures for evaluating performance” in U.K. higher education was detrimental to learning because “when a measure becomes a target it ceases to be a good measure” (Strathern, 1997, p. 308). One can see evidence of this in photo.net’s evolving rating system.

At photo.net, what prompted the early call for “suggestions for what to do with the photo ratings systems?” Member Vuk Vuksanovic described the emergence of rating manipulation and revenge-rating: “someone has decided to go through the folders of certain people and systematically deposit scores of 1 on all the pictures. I am one of those targeted and suspect it’s some kind of revenge related to my recent criticism of certain members who’d created false accounts to rate their own work.” As another member noted, “The whole ratings business has turned into a giant pissing contest, with numerous people abusing the system to (I can only assume) see their name in lights on the top rated photographers list and top-rated photos list until the next idiot comes along with a dozen mediocre snapshots which they and their friends all give ‘10/10’ scores to” (Vuk Vuksanovic, Steven Kembel in rajeev et al., 2001). This type of behavior is now seen throughout the Web and is referred to as “karma whoring.” The phrase likely originated on the geek discussion site Slashdot within the same time period (Meme, 2013). There, users accumulate karma for positive contributions, including posting content and comments and moderating others’ contributions. However, some users attempted to gain karma using tactics that were of little or negative value. This problem is recognized today at the popular site Reddit, which asks its users not to “Mass downvote someone else’s posts. If it really is the content you have a problem with (as opposed to the person), by all means vote it down when you come upon it. But don’t go out of your way to seek out an enemy’s posts” (Reddiquette, 2019).

Hence, in addition to the difficulty of quantifying the qualitative, the community learned that quantitative mechanisms beget their own manipulation (characteristic 2). In addition to the proposals for “scrapping” ratings or requiring comments, members suggested ways to counter manipulation. An obvious step would be to forbid self-rating (Brian E in rajeev et al., 2001). This could be complemented by what I label minimum thresholds and rate limits. With a minimum threshold, raters must post a minimum number of their own photos before rating the photos of others (Vuk Vuksanovic in rajeev et al., 2001). Under rate limiting, one limits how often a member may rate, or submit a photo for rating (Joe Boyd in rajeev et al., 2001). However, limitations are not necessarily the best way to create a robust rating system. Indeed, broad participation from the whole community is needed to counteract those intent on abuse. Hence, if broad participation leads to a more honest system, one could promote reciprocity by permitting only those who have rated to be rated. However, requiring raters to post their own images and rate others’ images presumes that everyone on the site (a) takes a lot of photos (some simply enjoy viewing and discussing them) and (b) feels confident enough in their own skill to evaluate others (Rodriguez, 2003).
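Taken together, these proposals amount to a simple eligibility check run before a rating is accepted. The following Python sketch is my own illustration of that logic; the names and numeric thresholds are assumptions, not photo.net’s actual implementation:

```python
from dataclasses import dataclass

# Assumed thresholds, for illustration only.
MIN_PHOTOS_POSTED = 3     # minimum threshold: post your own work before rating
MAX_RATINGS_PER_DAY = 20  # rate limit: cap how often a member may rate

@dataclass
class Member:
    name: str
    photos_posted: int = 0
    ratings_today: int = 0

def may_rate(rater: Member, photographer: Member) -> bool:
    """Apply the proposed safeguards: no self-rating, a minimum
    threshold of posted photos, and a daily rate limit."""
    if rater.name == photographer.name:             # forbid self-rating
        return False
    if rater.photos_posted < MIN_PHOTOS_POSTED:     # minimum threshold
        return False
    if rater.ratings_today >= MAX_RATINGS_PER_DAY:  # rate limit
        return False
    return True

vuk = Member("vuk", photos_posted=12, ratings_today=2)
newcomer = Member("newcomer", photos_posted=0)
print(may_rate(vuk, newcomer))  # True
print(may_rate(newcomer, vuk))  # False: hasn't posted enough photos
print(may_rate(vuk, vuk))       # False: self-rating forbidden
```

As the chapter notes, each check also excludes some legitimate participants (lurkers who post nothing, prolific but honest raters), which is why such limits were contested.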

Other proposals relied upon the distinction between general members and subscribers. Unlike recent social media sites, where most services are free – and supported by advertising – photo.net subscribers paid for their accounts, meaning they received storage space for their photos. Presumably, subscribers (paying members) were less abusive, and their privileges should exceed those of regular members (Peter Daalder in Myers, 2005).

Few of these remedies were immediately adopted, though the problems they attempted to address would return, as would their discussion. For instance, in 2003, after noting many photos with a single high rating, a member asked: “Why are users allowed to rate their own photos?” “A quick check has shown that a disappointingly large number of these have been rated by the photographer themselves. This should be changed!!” (Spitz, 2003) Site administrators did experiment with a few changes, the most important of which was that ratings would no longer be anonymous.

Anonymity can permit one to speak honestly, without fear of retribution. Conversely, it can permit one to speak dishonestly, without concern for accountability. In the digital age, the degree to which a person and an account are linked can be a purposeful decision or an endemic problem. At photo.net, it was both. What I refer to as blinding is the disassociation of a comment or rating from its source. Blind review is a common technique across disciplines, from the sciences to the arts, so as to encourage honest evaluation. What I refer to as puppetry is the creation of multiple accounts. “Sock puppet” accounts are problematic across online communities, as many of these throw-away accounts can be used by a single person to skew community deliberations (Wikipedia, 2012).

Much like the rating scales, both types of anonymity (i.e., blinding and puppetry) have long been discussed and experimented with at photo.net. Indeed, anonymity was thought to be the source of many of the problems at the launch of its rating system in December 2000: “People created bogus accounts to rate themselves and their friends high, and their adversaries low.” In response to these problems, in August 2001, ratings were made public “so that people could see abuse and report it to photo.net, and so that people would be embarrassed to pump up their own ratings so much that it would be obvious…” (Brian Mottershead 2003, quoted in O’Neill, 2005). Hence photo.net abandoned what I call reviewer blinding. (In 2004, photographer blinding was proposed – but not implemented – such that the identity of a photographer would only be revealed after a cooling-off period. The hope was that this would lead to impartial evaluation on the basis of the work rather than the photographer (Nichols, 2004).) While revenge-ratings still happened (Dejkam, 2003), they tended to be on a smaller scale. However, the new system gave rise to a new problem, ratings inflation, and new attempts at remedying that problem.

As we’ve seen, once one sets out down the road of calculability, of quantifying the qualitative, manipulation soon follows. Given that calculability is seen as worthwhile, in this case to populate the top-rated photo page, its proponents naturally turn to further quantification and adjustments to the system. This is exemplified in a proposed remedy from 2001: given photo.net’s origins, it should “establish cross-rating algorithm to auto-identify abusers (I’m sure there are some smart people at MIT that would love to put their collective noodles on this project)” (Brian E in rajeev et al., 2001). However, fixes to manipulation often have unintended consequences that are also susceptible to manipulation (characteristic 3).

Now that identities were associated with ratings, “mate rating” became more common. Rather than the reciprocal down-rating pervasive during the period of anonymity, people reciprocally up-rated during the period of identity. (Similarly, at photoSIG, Xu and Bailey (2012) found that the average of reciprocated ratings was higher than that of non-reciprocated ratings.) This was amplified in that people could leave positive ratings not only to prompt reciprocally friendly ratings, but to gain more storage space from photo.net, an incentive offered so as to encourage participation.

I have been quite disappointed in the way certain users of the site have been, in my opinion, blanket critiquing photographs. A certain user who I will not name here has been giving the ‘very nice image, thanks’ with a 6/6 rating at a phenomenal rate. It seems as though they are just trying to get more space without subscribing. I personally would like to get more constructive criticism of my photographs even if it is negative. (Patel, 2004)

This “mate fishing” as I call it certainly pays off. The person you refer to’s ratings have skyrocketed with the reciprocal payback. (Mark Lucas in Patel, 2004)

Another “fix” for revenge-rating was that any rating below five had to be accompanied by a comment (Bengtsson, 2002). Even in 2002, members noted this asymmetry would likely cause problems.

I note that you now only can give ratings below 5 if you have written a comment. I like that, in the way that more people will add comments. But it feels a bit unsymmetric, why should I be allowed to give high ratings without telling what’s so good about the image. I think this 5 and up system without comments will lead to overratings. (Bengtsson, 2002)

Bengtsson further proposed an interesting idea: if you comment, you can rate through the whole scale (1-10), but without a comment one’s ratings are attenuated (4-6) (Bengtsson, 2002). Others said that, simply, everyone should always leave a comment (Elaine Roberts in Bengtsson, 2002). Of course, this would likely depress ratings altogether, or lead people to leave terse, poor-quality comments.
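Bengtsson’s proposal can be expressed as a single validation rule. This sketch uses the 1-10 scale mentioned in the thread and is, again, illustrative rather than photo.net’s actual code:

```python
def rating_allowed(score: int, comment: str = "") -> bool:
    """Bengtsson's proposal: a comment unlocks the full 1-10 scale,
    while a silent (uncommented) rating is attenuated to the 4-6 band."""
    if comment.strip():
        return 1 <= score <= 10
    return 4 <= score <= 6

print(rating_allowed(2, "Blown highlights; try metering off the apples."))  # True
print(rating_allowed(2))   # False: a harsh score requires justification
print(rating_allowed(10))  # False: silent flattery is attenuated too
```

Unlike the rule actually deployed (comments required only below five), this version is symmetric: it would dampen both uncommented revenge-rating and the uncommented “very nice image, thanks” mate-rating described above.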

In any case, the lack of anonymity and the requirement to justify a negative rating quickly led to rating inflation, and the community soon lost any sense of the relative merits of older works. In 2002 administrator Brian Mottershead noted that, when using the new feature of perusing top-rated photos by date, what “becomes obvious when the longer periods are selected is that most of the highest-rated photos are still recent photos.”

Last summer, the average rating was a little over 5, and it has increased steadily so that the average rating is over 7. This means that the further in the past a photo received its ratings, the lower its average will tend to be. A photograph that would have appeared in the “Top 40” last year, with an 8.5 average, would be nowheres-ville today. Thus, the stars of last summer, whose photos for the most part still are on the site and of course are just as good as ever have more or less disappeared from view. (Mottershead, 2002)

The inflation persisted into the following year, prompting a member to (ironically) note that “of the 300 highest rated images OF ALL TIME, 113 of them were posted in 2003! It is amazing that such better photographers are being attracted to the site now. What is the secret?” Because many now considered the system suspect, “it seems that the bulk of the site has stopped using the rating system altogether.” New images might receive a few ratings within the first three days of their posting, but then activity ceased.

So an image with 20 “freindly” ratings shoots to the top of the pack, and then just sits there because no one else thinks the rating system is worth saving or they have given up on it working. The change in the default page worked for a while, but the same groups flock to the images upon their release and shoot them to the top of those pages as well. (Bulger, 2003)

Hence, any particular “fix” to manipulation will have its own, not necessarily intended, consequences and is also susceptible to manipulation.

To counter the mate rating and inflation, blinded ratings returned in 2004, but with a twist. Now, one could see the average rating for one’s photo, as well as the people who rated, but not the specific ratings of any particular rater. Interestingly, this strategy was later adopted by a site for reviewing people. While many people-review sites have been poorly conceived and executed, KarmaFile, launched in 2011 as a place to “review your coworkers,” is fairly savvy. Those reviewed have the ability to see their raters and aggregate scores, but not link a specific rating to a particular person (KarmaFile, 2013a, 2013b). In 2005 Mottershead noted that this system “works much better than the previous set of rules for bringing the best photos to the fore”; site subscriptions, submitted photos, and photo ratings had all increased (Brian Mottershead in Miller, 2005). Of course, not everyone was pleased. “I miss the old way of rating photos. Someone would give you a rating. Good or bad I was always going to that persons portfolio and have a look. That would give me extra meaning.” (Baba, 2005) Also, by this point, some members were cynical that the rating system would ever be satisfactory.

Another day, another proposed change to fix the ratings system problems. The more things change, the more they stay the same. I have come to believe the only solution worth considering is primal scream therapy. Next time you get low balled give it a shot. Its not going to change a damn thing but it might make you feel better. (Larry McGarity in Baba, 2005)

Another change deployed in 2004 was that comments and ratings were made distinct processes. The gallery would have two parts: one for critique, where a photograph could receive a rating or comment, and the other part would be for exhibition. “The comments on photos in the ‘Exhibition’ section will be guestbooks, and photographers will be able to moderate comments on their own photos in that section (meaning delete them). But they won’t be able to do that with photos in the ‘Critique/Competition’ section.” Additional ideas proposed back in 2001 (e.g., rate limits, thresholds, member/subscriber distinctions, and reciprocity) were also deployed. The ability to upload photos to the critique section within a given time was limited, with a higher threshold for subscribers. Thresholds could be increased through greater participation (comments and ratings given) and how “favorably the photos are received (views/ratings received)” (Brian Mottershead in C. Root, 2004).

Finally, by this point in time, photo.net (and many other sites) had deployed mechanisms (even if not developed at MIT) to “auto-identify abusers” (Brian E in rajeev et al., 2001). Account creation was now mediated by a CAPTCHA: “Completely Automated Public Turing test to tell Computers and Humans Apart.” It was now harder to create sock puppet accounts because automated programs could not easily recognize distorted text in an image. Similarly, the site could now automatically detect some types of rating abuse and remove those ratings at regular intervals. Of course, abuses still occurred and some members asked for more fixes. For instance, CAPTCHA could be extended to the rating of pictures itself: “I am not sure of the practicalities of such a system but entering a 3 or 4 digit code to have your rating accepted is something I would be more than happy to do!” (David McCracken in Myers, 2005). Still, the site had to be careful not to make it too difficult for new users; recently-registered members tended to be enthusiastic and “have always been responsible for the majority of ratings.… If you set up obstacles for new people to rate photos, you would reduce the number of ratings dramatically, and increase the influence of mate-raters, etc.” (Brian Mottershead in Myers, 2005). A few years later, an intermittent CAPTCHA system was deployed for ratings (J. Root, 2007).

Despite all the clever automations and features, people could still find ways to game the system. That is, instead of learning to become better photographers, they learned to become better manipulators. For instance, given that one could now make a photo available for comment or ratings, one could tactically toggle this so as to achieve a high rating: “it seems you submit a photo, get 3 or 4 7/7 ratings, then withdraw the photo from getting more ratings, and presto, one stays at the top and get a lot views” (Holtrop, 2005). Another member responded that this was actually an old trick and “It’s working as designed (or rather, it’s broken as designed).”

Yet, the idea of choice, of letting users decide what kind of feedback they could get, was a popular one. Hence, in the large reforms of 2010, choice was finally brought to bear on the old bugbear of anonymity. Josh Root, photo.net’s administrator, noted that members were unlikely to give feedback if they feared it would prompt an attack: “Lacking any way to discern which critique requests were from people who would accept honest critique and which were from those who would not, many simple stopped critiquing at all” (J. Root, 2010). Hence, a new feature permitted users to opt in to receiving anonymous critiques from others. “This allows people to be a bit more free with their words without the fear of getting harassed in return or upsetting a friend or any of the other reasons that cause people to leave mindless ‘nice shot’ critiques rather than helpful feedback” (J. Root, 2010).

Genre and meta moderation

It did not take long, once the rating system had been established, to notice a trend: photographs of nude women were more popular than photographs of flowers. “It seems clear to me that the general population (on photo.net) has, for example, a negative bias toward flower shots. Flower images receive low ratings and few comments, it seems, whether they have any artistic merit or not” (Vardy, 2004). It is quite possible that the ratings system itself, especially the originality variable (O-score), was responsible. When it came to the genres of flowers, landscapes, and such, what is the meaning of originality and what number ought to be applied? Remember the proposal that the poles of the scales be changed from very bad/good to minimum/maximum? It prompted a response about the likely confusion this would cause for the genre of flowers.

… so that would mean that a photo exhibiting “any” originality, even if it ill-serves the photo, should get a high O-score and all flower shots, being so ubiquitous, get a low O-score? and I don’t understand what minimum aesthetics would be compared to maximum; the photo either looks good (or better) or just okay… at least from “very bad” to “very good” gives one room to make specific interpretations of each photo. (Peggy Jones in Turner, 2004)

On the other hand, the preponderance of nudes in the top-rated pages was at times puzzling, “another one of PN’s mysteries,” as well as a topic of jest (Carl Root in Schoen, 2004). Yet, given that the community likely consists of many white male heterosexuals, one ought not be too surprised. For instance, upon checking the top 30 nudes from the past three days (on 2013-01-07), all subjects appeared to be white women. Nudes have been so pervasive in the top rankings that, in addition to being able to view “all” categories, one can also view an “all (no nudes)” category.

However, in 2004 site administrators experimented with changing the default view on the top-rated photos page. As mentioned, earlier proposals had suggested that photos should be ranked according to the number of comments they received. While this was eventually implemented, by July 2004 the default view was actually changed back to the average rating. This prompted a member to note that “now that average is the default rather than number of ratings, the number of nudes has dropped from 25 to 3 in the top 100” (Schoen, 2004). The resulting discussion about what to do about nude ratings prompted humorous replies, such as “I don’t think anyone should be nude when they rate photos,” as well as guesses as to what could be going on (MacGregor Anderson in Schoen, 2004). For instance, given that the number of nudes up for rating is small relative to all other photos, perhaps they receive more ratings per image? (James O’Neill in Schoen, 2004) Site administrator Brian Mottershead offered his own theory: the popularity of nudes on the top rated page (TRP) was likely due, in part, to the new nude category and a “reinforcing feedback loop.”

The photos on the first page or two of the TRP default page naturally get even more ratings just from being there, which just entrenches their position near the top of the TRP page and dramatically increases the number of ratings they get overall, compared to other photos.… Make something else the default, such as average, which does not depend on number of ratings: the nudes still get more ratings than before by virtue of the new category for them, but most of them don’t get extra ratings from being near the top of the default TRP page. (Brian Mottershead in Schoen, 2004)

Mottershead thought that by using the average, TRP photos would still see some benefit from people being “reluctant to rate a photo on the first page low because it has a bit of halo from having made it to the first page,” but the self-reinforcement would be lessened. Member Carl Root was not quite so optimistic. He agreed this explanation was plausible, but recognized that changes often have unintended consequences that are also susceptible to manipulation. Showing photos on the basis of their averages would privilege a different constituency: “I don’t think that the ‘average’ is as benign as it looks at the moment, however. In the past, it attracted photomontages and other heavy PS alterations in much the same way ‘rates’ attracted nudes. The photographers who specialize in those images will be back as soon as they see this change” (Carl Root in Schoen, 2004). That is, quantification (and how one implements it) privileges some things over others (characteristic 4).
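The stakes of this design choice can be seen in a toy comparison of the two orderings debated, ranking by number of ratings versus by average (the data and function below are hypothetical illustrations, not photo.net’s actual code):

```python
def top_rated(photos, key="average"):
    """Order photos for a top-rated page. 'photos' maps a photo
    name to its list of ratings; 'key' selects the ordering
    criterion: 'count' (ratings received) or 'average'."""
    def score(name):
        ratings = photos[name]
        if key == "count":
            return len(ratings)
        return sum(ratings) / len(ratings)
    return sorted(photos, key=score, reverse=True)

# A much-rated photo tops the count ordering; a sparsely but
# highly rated one tops the average ordering.
photos = {"nude": [5, 5, 6, 5, 5, 6], "montage": [7, 7]}
```

Switching the default from one key to the other thus changes which genres rise to the first page, just as members observed.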

A few months later in 2004, another member identified a possible source for the popularity of nudes. In an attempt to identify members who seemed to give informed ratings, the site designated some members as “curators.” The intention was that these were members who were active (“rated more than 100 photos in the last 30 days”) and rated within a reasonable distribution (“an average between 4 and 5.5, with not too high a percentage of 1-2, or 6-7 ratings”) (Brian Mottershead in Bartosik, 2004). In the online world this is often referred to as “meta” moderation or rating. However, some members thought that this feature, like many, “while implemented with the best of intentions, became a joke not very long after its inception” (Steve Marcus in Bartosik, 2004). And, for some reason, the curators strongly preferred nudes. In response to someone asking if simply giving a lot of average ratings would make them a curator, Mottershead responded: “The ‘curators’ are people who have rated a lot recently, with a reasonable distribution of ratings – meaning a reasonable average and not too many at the extremes. Why people fitting this profile should have a preference for nudes is completely beyond me” (Brian Mottershead in Bartosik, 2004). This prompted another member to disclaim this preoccupation with engineering the ratings system.

Sheeeesh.…What I do not ‘get’ and never shall is WHY this site’s powers that be have this obsession with means and reason and oh so many things average?! Not to put too fine a point on it but for crying out loud (really), why not simply let the ratings speak to which photos are most popular amongst all viewers?! And just let it ride. (James Vincent Knowles in Bartosik, 2004)
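Warranted or not, the “curator” profile Mottershead described is easily expressed as code. The sketch below follows his stated criteria, though the “not too high a percentage” cutoff (here 10%) is my own hypothetical stand-in, since the site’s actual threshold was not published:

```python
def is_curator(recent_ratings):
    """Sketch of the 2004 'curator' profile: more than 100
    ratings in the last 30 days, an average between 4 and 5.5,
    and not too many extreme (1-2 or 6-7) ratings."""
    n = len(recent_ratings)
    if n <= 100:
        return False
    if not 4 <= sum(recent_ratings) / n <= 5.5:
        return False
    extremes = sum(1 for r in recent_ratings if r <= 2 or r >= 6)
    return extremes / n <= 0.10  # hypothetical cutoff
```

Such a rule selects for frequent, middle-of-the-road raters, which makes the curators’ collective preference for nudes all the more curious.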

In the wide-sweeping reforms of 2010 the idea of “curators” was revisited by way of “helpful users.” Given that writing “in depth comments takes time, thought, and energy” and people do this “out of the goodness of their heart and their willingness to help other photographers,” helpful comments can now be “rated” as such. Helpful members are then recognized via a “special user icon” and their images will have “extra visibility in various areas of the site” (J. Root, 2010). This sequence of decisions and discussion highlights one more characteristic (number five) of digital evaluation systems: fixes to rating systems often take the form of more elaborate, automated, and meta quantification.

Tweak critique

Ansel Adams is famous for his beautiful black-and-white portraits of the American landscape. He was known for the care he took in capturing an image on a negative (using a large-format camera) and in rendering a print (e.g., dodging so as to make some areas lighter and burning others so as to make them darker). Within photographic lore, he is often quoted as saying, “The negative is comparable to the composer’s score and the print to its performance. Each performance differs in subtle ways.” But with software, the changes need no longer be subtle. At photo.net, the question of how much was too much was common, and members debated whether the amount of manipulation was within the purview of critique. One member noted that while his personal philosophy was to “leave the photos as close to the original as possible,” he still felt others’ “comments should be restricted to originality and aesthetic and not delve into retouching and RBG curve twisting” (Archer, 2004).

Additionally, the novel affordance of digital imaging is that, unlike in the darkroom, the composer and conductor need not be the same person. So far, my focus has been on difficulties of ratings and rankings – following the discussion on the ratings and critique forum. However, the online realm does offer significant benefits. A photographer has access to enormous amounts of information, discussion, and examples. She can reach (potentially) many more viewers. And she can connect with others with similar (even if specialized) interests. Also, with the aid of a graphics editing program, any critic can render their own performance of an image for comparison. Consequently, one member asked, “Is it generally considered good or bad form to offer a tweaked version of an image submitted for critique?”

Initially, I would never have presumed to offer an alternative view of an image. Then, a member here offered me a revised crop on one of my own photographs that improved the image significantly. Frankly, I was grateful for the input and realized that the tweaked version spoke more eloquently than words ever would have. So, thereafter, I began to post some tweaked versions (usually just cropping/alignment/color balance stuff). (Minicucci, 2005)

However, some at photo.net, especially newcomers, did not welcome such “performances” of their compositions. In the discussion following Minicucci’s question, almost all participants noted it was an acceptable practice at photo.net. Some argued that one has little control over one’s content on the Internet, so if you are concerned, don’t post it. Others noted they chose to civilly opt out by asking others not to alter their images. Many responded that they appreciated helpful tweaks: “It’s kind of like book editing. No writer should ever resent a good editing” (John Crosley in Minicucci, 2005). Indeed, photo.net is fairly novel in encouraging this type of critique, and the practice is explicitly permitted in its terms of service if it is done as part of commentary and discussion (js, Bob Atkins in Minicucci, 2005). Hence, the sixth characteristic of digital evaluation is that digital works can be “tweak critiqued”: photos are not only evaluated or commented upon, but demonstratively altered.


From the reported discussions at photo.net, one can identify a handful of characteristics of evaluation in the digital age.

  1. It’s hard to quantify the qualitative: there was much experimentation with rating and ranking systems.
  2. Quantitative mechanisms beget their own manipulation: people “mate-rated” friends, “revenge-rated” enemies, and inflated their own standing.
  3. “Fixes” to manipulation have their own, often unintended, consequences and are also susceptible to manipulation: non-anonymous ratings led to rating inflation.
  4. Quantification (and how one implements it) privileges some things over others: nudes were highly rated, more so when measured by number of ratings; not so with photos of flowers.
  5. Any “fixes” often take the form of more elaborate, automated, and meta quantification: such as making some users “curators” or labeling them as “helpful.”
  6. Digital works can be “tweak critiqued”: photos are not only rated and commented upon, but demonstratively altered (e.g., cropping).

Despite the challenges of some of these characteristics, many people have benefited from their participation at photo.net. I found many productive discussions about what constitutes useful evaluation and feedback. For instance, the thread “Getting the confidence to critique” was full of useful tips and reminiscent of the sound advice in Terry Barrett’s Criticizing Photographs: one should study works one likes and dislikes; consider the subject, lighting, and depth of field of a work; practice explaining one’s own reaction objectively; and read others’ comments, asking which ones you find most useful (Mark Grant, Carl Root, Ben S, Wilson Tsoi, and Seven Stuartson in Parton, 2005). However, for better or worse, Ritzer’s “urge to quantify” is the dominant logic of digital evaluation. As Mottershead noted, “the rating system on photo.net is very addictive” (Brian Mottershead in Vechnyak, 2005). While he spoke of this addiction as a user of the site, the site itself was also always chasing the latest “fix.” The evolution of the rating system at photo.net anticipates and reflects similar challenges faced elsewhere online. I’d like to think the systems we have now are more accurate and fair than the ad hoc experimentation of photo.net’s early history. Even so, the lessons learned at photo.net are still relevant to even the most advanced forms of digital evaluation: they are imperfect, biased, and prone to manipulation.


References

Administrators. (2002, June 8). Migration of LUSENET Photography Forums. Help and Feedback Forum.
Archer, B. (2004, January 18). The philosophy behind photo.net? Site Help > Photo Critique and Rating.
Baba, A. (2005, August 24). Old rating system better !! Site Help Forum > Photo Critique and Rating.
Barrett, T. (1996). Criticizing photographs: An introduction to understanding images (2nd ed.). Mayfield.
Bartosik, M. B. (2004, August 20). And you curators can sing? Site Help > Photo Critique and Rating.
Bengtsson, D. (2002, June 24). The new rating system. Site Help > Photo Critique and Rating.
Bulger, S. (2003, April 18). Rating Inflation Update. Site Help > Photo Critique and Rating.
Campbell, D. (1976). Assessing the impact of planned social change (Occasional Paper Series, Issue 8). Dartmouth College Public Affairs Center.
Counts, J. (2004, October 2). My idea for improving ratings. Site Help > Photo Critique and Rating.
Cox, E. P., III. (1980). The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 17, 407–422.
Dejkam, A. (2003, July 6). rating in revenge. Site Help > Photo Critique and Rating.
Gibbs, L. (2012, August 24). Continuing problems with peer feedback. Coursera Fantasy.
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: strategies for qualitative research. Aldine Publishing Company.
Goldstein, J. (2007, October 22). purchased by NameMedia. JMG-Galleries.
Goodhart, C. (1975). Problems of monetary management: The U.K. experience. Papers in Monetary Economics, 1.
Greenspun, P. (1993). Travels with Samantha.
Greenspun, P. (1996, September 3). The book behind the book behind the book…. Greenspun.
Greenspun, P. (1997, July 22). Archive.
Greenspun, P. (1998, September 10). Philip and Alex’s guide to web publishing. Greenspun.
Greenspun, P. (2000a, March 3). ArsDigita: From start-up to bust. Waxy.
Greenspun, P. (2000b, July 14). Travels with Samantha. Books.
Greenspun, P. (2012, December 3). Philip Greenspun - photographer Biography. Photo.Net.
Holtrop, M. (2005, April 6). A new trick to stay at the top of Top Rated? Site Help Forum > Photo Critique and Rating.
KarmaFile. (2013a, July 30). How the KarmaFile peer review system works. KarmaFile.
KarmaFile. (2013b, July 30). Popular questions about KarmaFile. KarmaFile.
Kelly, L. (2004, July 31). Fractions of a point in the ratings? Site Help > Photo Critique and Rating.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. SAGE publications.
Luong, Q. T. (2002, June 11). A personal perspective on the move of the LF Forum to Largeformatphotography.
LUSENET. (2012, December 11). All forums within LUSENET. Greenspun.
M, N. (2004, August 5). Minimum no. of ratings for TRP display. Site Help > Photo Critique and Rating.
Know Your Meme. (2013, August 18). Karma whore. Know Your Meme.
Miller, C. (2005, June 3). Another rant about ratings. Site Help Forum > Photo Critique and Rating.
Minicucci, P. (2005, March 5). Critique Etiquette. Site Help Forum > Photo Critique and Rating.
Mottershead, B. (2002, August 2). Feedback Requested on Normalizing Ratings. Site Help > Photo Critique and Rating.
Myers, J. (2005, January 4). This is getting ridiculous! Site Help Forum > Photo Critique and Rating.
Nichols, T. (2004, July 2). What would happen if those posting the photographs could be temporarily annonymous? Site Help > Photo Critique and Rating.
O’Neill, J. (2005, April 19). How to fix my biggest complaint with rating system. Site Help Forum > Photo Critique and Rating.
Parton, N. (2005, January 25). Getting the confidence to critique. Site Help Forum > Photo Critique and Rating.
Patel, K. (2004, November 22). annoyed with blanket critiquing. Site Help > Photo Critique and Rating.
Photo.net. (2012a, August 21). About us.
Photo.net. (2012b, December 6). Photography community, including forums, reviews, and galleries from Photo.net.
Photo.net. (2012c, December 11). Forum contributions by Philip Greenspun.
Pinto, F. (2005, June 22). Photos. Photo.Net.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1–15.
rajeev, lisa, audrey, & philg. (2001, August 2). Photo Rating Suggestions. Site Help > Photo Critique and Rating.
Reddiquette. (2019, February 18). Reddit.
Ritzer, G. (1983). The ‘McDonaldization’ of society. Journal of American Culture, 6(1), 100–107.
Rodriguez, R. (2003, November 23). Should members with no photos cretique. Site Help > Photo Critique and Rating.
Root, C. (2004, June 30). The new comment note under RFC images. Site Help > Photo Critique and Rating.
Root, J. (2007, November 9). New captcha verification for rating system…. Help and Feedback Forum.
Root, J. (2010, September 14). Summarizing the recent changes to the ratings/critique system…. Help and Feedback Forum.
Root, J. (2011, August 22). New Facebook/Twitter/Google+/Stumbleupon sharing buttons on photo.net. Help and Feedback Forum.
Schoen, D. (2004, July 22). Nude ratings and the avg. vs ratings issue. Site Help > Photo Critique and Rating.
Spitz, M. (2003, April 19). Self-Ratings. Site Help > Photo Critique and Rating.
Strathern, M. (1997). ‘Improving ratings’: audit in the British University system. European Review, 5(3), 305–321.
Thomas, G., & Jones, D. (2006). Reinventing grounded theory: Some questions about theory, ground and discovery. British Educational Research Journal, 32(6), 767–795.
Turner, B. (2004, February 19). Change ratings word association from, Very Bad/Excellent to Min./Max.? Site Help > Photo Critique and Rating.
Vardy, M. (2004, September 17). Biases on PN…. Site Help > Photo Critique and Rating.
Vechnyak, V. (2005, February 4). Who gave me this rating??? Site Help Forum > Photo Critique and Rating.
Wikipedia. (2012, December 27). Sockpuppet (Internet). Wikimedia.
Xu, A., & Bailey, B. P. (2012). What do you think? A case study of benefit, expectation, and interaction in a large online critique community. In CSCW’12. ACM.