The Paper

Citation

L. von Ahn and L. Dabbish, “Labeling images with a computer game,” Proceedings of the SIGCHI conference on Human factors in computing systems, Vienna, Austria: ACM, 2004, pp. 319-326.

 @inproceedings{985733,
  author = {von Ahn, Luis and Dabbish, Laura},
  title = {Labeling images with a computer game},
  booktitle = {CHI '04: Proceedings of the SIGCHI conference on Human factors in computing systems},
  year = {2004},
  isbn = {1-58113-702-8},
  pages = {319--326},
  location = {Vienna, Austria},
  doi = {http://doi.acm.org/10.1145/985692.985733},
  publisher = {ACM},
  address = {New York, NY, USA},
  }

Abstract from the paper

We introduce a new interactive system: a game that is fun and can be used to create valuable output. When people play the game they help determine the contents of images by providing meaningful labels for them. If the game is played as much as popular online games, we estimate that most images on the Web can be labeled in a few months. Having proper labels associated with each image on the Web would allow for more accurate image search, improve the accessibility of sites (by providing descriptions of images to visually impaired individuals), and help users block inappropriate images. Our system makes a significant contribution because of its valuable output and because of the way it addresses the image-labeling problem. Rather than using computer vision techniques, which don't work well enough, we encourage people to do the work by taking advantage of their desire to be entertained.

Summary

Synopsis

In this paper, von Ahn and Dabbish try to solve the problem of accurately labeling images on the web. Their solution is the ESP Game (originally hosted on its own at espgame.org, it is now collected with the other GWAP games), which impels individuals to provide their own labels for images by embedding the labeling process within a game. Their results are markedly better than those of the methods used by search engines in 2004, and they suggest that disguising problems within games is a generally useful method for solving computationally difficult problems. The ESP Game has since become the best-known example of a game with a purpose.

Related Work

The authors cite very little related work. The primary impetus for their own work is the Open Mind Initiative, a group that tries to encourage the collection of online data via user contribution. They do cite several papers from the computer vision and web search branches of computer science that also try to solve the labeling problem, but these appear at the end of the paper and primarily serve as a contrast with the effectiveness of their own method.

How To Play

The User's Perspective

When a player starts a new game, they are randomly matched with another individual who wants to play. Each player is shown a copy of the same picture and told to type guesses for what their partner is typing for the image (not to describe the image). Some words are specifically marked as "Taboo" and cannot be entered. All guesses are logged, and once the players' guesses match, they shift to a new image. At any time, a player may opt to "Pass" and give up on guessing. Their partner is informed of the pass, and as soon as the partner also chooses to pass, the game switches to a new image. The game lasts for 2.5 minutes, and players earn points based on the number of times that they agree.
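The round described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name play_round, the representation of guesses as (time, word) pairs, and the rule that a match occurs as soon as one player types a word the other has already typed are all assumptions made for the example.

```python
from typing import Iterable, Optional, Set, Tuple

Guess = Tuple[float, str]  # (seconds since the image appeared, word typed)


def play_round(
    guesses_a: Iterable[Guess],
    guesses_b: Iterable[Guess],
    taboo: Set[str],
) -> Optional[str]:
    """Return the first word both players have typed, or None if they never agree."""
    seen = {"A": set(), "B": set()}
    # Merge both players' guesses into a single timeline, tagged by player.
    timeline = sorted(
        [(t, w, "A") for t, w in guesses_a] + [(t, w, "B") for t, w in guesses_b]
    )
    for _, word, player in timeline:
        word = word.strip().lower()
        if not word or word in taboo:
            continue  # taboo words cannot be entered
        other = "B" if player == "A" else "A"
        if word in seen[other]:
            return word  # agreement: this word becomes a label for the image
        seen[player].add(word)
    return None  # no agreement (for example, both players passed)


# Hypothetical guesses for one image where "dog" is already taboo.
a = [(1.0, "dog"), (2.5, "grass"), (4.0, "ball")]
b = [(1.5, "puppy"), (3.0, "ball"), (5.0, "grass")]
print(play_round(a, b, taboo={"dog"}))  # ball
```

Note that the players never need to type the same word at the same moment in this sketch; a guess matches if the partner entered it at any earlier point in the round.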

The Back End

Taboo words are assigned if a threshold number of agreements between different players have been seen; the authors use a threshold number of 1.
Images can be considered done when multiple players pass repeatedly on the same image.
Images can be reintroduced to the system periodically to determine if the best labeling for the image has changed.
A dictionary is matched up with player words to ensure that guesses are spelled correctly.
The 350,000 images were taken from random.bounceme.net, but could come from anywhere.
Players can potentially be paired up with recordings of previous players' guesses if no partners are found.
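A few of the back-end rules listed above (taboo promotion, the dictionary check, and retiring an image after repeated passes) can be illustrated with a small Python sketch. The class name LabelStore, its methods, the PASS_LIMIT value, and the choice to reset the pass count whenever players agree are hypothetical; only the taboo threshold of 1 comes from the summary above.

```python
from collections import Counter, defaultdict

TABOO_THRESHOLD = 1  # agreements needed before a word becomes taboo (value given above)
PASS_LIMIT = 3       # hypothetical: consecutive passed rounds before an image is "done"


class LabelStore:
    """Tracks agreed labels per image; a sketch, not the authors' system."""

    def __init__(self, dictionary):
        self.dictionary = set(dictionary)       # words accepted as correctly spelled
        self.agreements = defaultdict(Counter)  # image_id -> word -> agreement count
        self.pass_streak = Counter()            # image_id -> consecutive passed rounds

    def taboo_words(self, image_id):
        counts = self.agreements[image_id]
        return {w for w, n in counts.items() if n >= TABOO_THRESHOLD}

    def is_valid_guess(self, image_id, word):
        # A guess must be in the dictionary and must not be taboo for this image.
        return word in self.dictionary and word not in self.taboo_words(image_id)

    def record_agreement(self, image_id, word):
        self.agreements[image_id][word] += 1
        self.pass_streak[image_id] = 0  # assumption: an agreement resets the streak

    def record_pass(self, image_id):
        self.pass_streak[image_id] += 1

    def is_done(self, image_id):
        return self.pass_streak[image_id] >= PASS_LIMIT


store = LabelStore({"dog", "ball", "grass"})
store.record_agreement("img1", "dog")
print(store.taboo_words("img1"))            # {'dog'}
print(store.is_valid_guess("img1", "dog"))  # False
```

With a threshold of 1, every agreed word is immediately taboo for later pairs, which pushes each new pair toward a label the image does not yet have.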

Usage Statistics

Results presented from the four-month period of August 9 - December 10, 2003.
13,360 players (presumably identified from IP logs), with 80% of players returning at least once.
Each image received a mean of 3.89 labels per minute of play, with standard deviation of 0.69.
Overall, 1,271,451 labels were provided for 293,760 images.
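Two derived figures follow directly from these totals. The short Python calculation below is only arithmetic on the numbers quoted in this summary; it assumes that the 293,760 labeled images were drawn from the same 350,000-image pool mentioned under The Back End.

```python
labels = 1_271_451
labeled_images = 293_760
image_pool = 350_000  # assumed to be the pool the labeled images came from

labels_per_image = labels / labeled_images
share_labeled = labeled_images / image_pool

print(f"{labels_per_image:.2f} labels per labeled image")             # 4.33
print(f"{share_labeled:.1%} of the pool received at least one label")  # 83.9%
```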

Label Validation

Test 1

Presumably only von Ahn and Dabbish were the testers for this portion.
Looked at the results generated when searching on 10 random labels.
All results returned made sense with respect to the label.

Test 2

15 participants aged 20-25 were presented with a random set of 20 images taken from those with more than five labels. Each participant was presented with the images in a random order.
Participants were asked:

Please type the six individual words that you feel best
describe the contents of this image. Type one word per line
below; words should be less than 13 characters.

For all of the images, at least 5 of the 6 labels generated by the ESP Game were entered by at least one of the participants.
The three most common words entered by all participants for each of the given pictures were included in the list generated by the game.

Test 3

Again 15 participants aged 20-25 were presented with 20 randomly selected images that had more than five labels, with order randomized for each participant.
Participants were asked:

1.  How many of the words above would you use in
describing this image to someone who couldn’t see it.
2.  How many of the words have nothing to do with the
image (i.e., you don't understand why they are listed
with this image)?

For question 1, the mean was 5.105 words, with a standard deviation of 1.0387.
For question 2, the mean was 0.105 words, with a standard deviation of 0.2529.

Conclusion

Von Ahn and Dabbish conclude that their system works both well and efficiently as an image classifier, and suggest that using games to accomplish difficult computational tasks may have broader applications. The two most salient quotes are:

At this rate, 5,000 people playing the ESP game 24 hours
a day would label all images on Google (425,000,000 images)
in 31 days. This would only associate one word to each image.
In 6 months, 6 words could be associated to every image.
Notice that this is a perfectly reasonable estimate: on a
recent weekday afternoon, the authors found 107,000 people
playing in Yahoo! Games, 115,000 in MSN’s The Zone and
121,000 in Pogo.com. A typical game on these sites averages
well over 5,000 people playing at any one time.

and

...[O]ur main contribution stems from the way
in which we attack the labeling problem. Rather than
developing a complicated algorithm, we have shown that it’s
conceivable that a large-scale problem can be solved with a
method that uses people playing on the Web. We’ve turned
tedious work into something people want to do.
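The 31-day estimate in the first quote can be reproduced with a short calculation. The Python sketch below assumes that the 3.89 labels per minute reported under Usage Statistics is a rate per pair of players, so that 5,000 players form 2,500 pairs; this reading is an assumption, though it does yield the quoted figures. The function name days_to_label is made up for the example.

```python
import math

LABELS_PER_MINUTE_PER_PAIR = 3.89  # reported mean rate; assumed here to be per pair
PLAYERS = 5_000
IMAGES = 425_000_000


def days_to_label(images, players, words_per_image=1, rate=LABELS_PER_MINUTE_PER_PAIR):
    """Days of round-the-clock play needed to attach words_per_image labels to every image."""
    pairs = players // 2
    labels_per_day = rate * pairs * 60 * 24
    return images * words_per_image / labels_per_day


one_word = days_to_label(IMAGES, PLAYERS)
six_words = days_to_label(IMAGES, PLAYERS, words_per_image=6)

print(math.ceil(one_word))       # 31 days for one word per image
print(round(six_words / 30, 1))  # 6.1, i.e. roughly 6 months for six words
```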

Subsequent/Related Publications

The paradigm used by von Ahn and Dabbish has been leveraged both academically and in the public domain. Indeed, this aspect of the social web owes less to good algorithm design than to the combination of a fundamental human drive to play and the online presence of a large number of individuals.

Von Ahn has built on his work with the ESP game with a set of subsequent games known as GWAP (games with a purpose). He has written several papers describing these other games.
- "Tagatune: A Game For Music and Sound Annotation", by Law, von Ahn, Dannenberg, and Crawford
- "Matchin: eliciting user preferences with an online game", by Hacker and von Ahn
"Curator: A Game with a Purpose for Collection Recommendation", by Walsh and Golbeck, discusses the creation of a game to determine optimal pairings of objects in two different sets (e.g. shoes and purses).
Jane McGonigal has explored individuals' drive to play as a method for trying to solve very large problems. In 2003 she wrote about this in the context of large-scale alternate reality games in "This Is Not a Game: Immersive Aesthetics and Collective Play". (Note that this paper was published while von Ahn and Dabbish were conducting their research and is not cited in their own work.)
"Toward a cultural-sensitive image tagging interface", by Dong and Fu, looks at how tag style differs across different cultures. This has implications for scenarios where the population of ESP Game players and users of the ESP Game's data have significantly different backgrounds.

Subsequent/Related Artifacts

In 2005, Amazon.com publicly rolled out Mechanical Turk (www.mturk.com), which allows individuals to easily hire people to carry out simple tasks, such as the image tagging carried out when playing the ESP Game. While the ESP Game is less expensive and relies on a different, though possibly similar, motivator than Mechanical Turk, the two come from a similar desire to solve large problems by bringing communities to bear on them. One profitable line of research may be to determine just how ludic motivations and financial motivations compare when trying to complete a task.
One of the possible applications that von Ahn and Dabbish bring up for their work is using a game to get people to monitor security cameras. This has already been done both in the UK and at the US/Mexico border. Note that the motivation in these scenarios is not purely ludic: the former is driven by financial motivation, the latter by volunteerism.

