The Democratization of Metadata: Collective Tagging, Folksonomies and Web 2.0

Joshua M. Avery

Library Student Journal,
February 2010


This paper explores some of the ways in which folksonomies are shaping notions and methods surrounding contemporary knowledge management, how they are currently being used and how information professionals are reacting to these developments. This paper will also explore the future of folksonomies and their contribution to the growth of Web 2.0 and a more democratic World Wide Web.


Peer-produced information is ubiquitous. Yochai Benkler (2007) in The Wealth of Networks argues that the widespread use of peer-produced information, software, and systems is beating "the best-financed business enterprises in the world" (p. 59). Benkler is correct that open source software, as well as blogs and wikis, are an increasingly important slice of the information economy. Yet he overlooks one of the fastest growing segments of peer-produced information - collaborative indexing. Thomas Vander Wal, in a 2004 listserv discussion, coined the term most frequently used when talking about social or collaborative indexing: "folksonomy" (Vander Wal, 2007). Vander Wal explains that he prefers the word "folk" when talking about regular people, and that he reasoned that if you took "tax" (the work portion) out of taxonomy and replaced it with something "anybody" could do, you would get a folksonomy. Jim McClellan (2005) provides a wonderful summary of a folksonomy in the Guardian when he describes it as a "bottom-up" organizational category that "emerge[s] when individuals tag or describe information and images and those tags are pooled" (para. 15).

The following paper will explore some of the ways in which folksonomies are shaping notions and methods surrounding contemporary knowledge management, how they are currently being used, and how information professionals are reacting to these developments. This paper will also explore the future of folksonomies and their contribution to the growth of Web 2.0 and a more democratic World Wide Web. Because of the nature of folksonomies and their omnipresence throughout the information economy, this paper will examine both scholarly and popular sources.

Folksonomies and tagging are still relatively young, but their impact on the development of the web has been large. Currently folksonomies are primarily used in social networking sites, such as Facebook, that offer access to large image collections. They are, however, increasingly finding homes in museums, libraries, and a large assortment of educational and corporate environments. Folksonomies and tagging are being met with skepticism by some in the information sciences who argue that these schemes are philosophically relativistic and will lead to a system breakdown (Peterson, 2006). Other information professionals appreciate the weaknesses inherent to folksonomies, yet still celebrate their potential for creative and dynamic information organization (Guy & Tonkin, 2006).

Key to understanding folksonomies is examining current folksonomic practice. It is here that collective indexing provides what is perhaps its most compelling justification while increasingly offering new ways that the "power of the people" can be harnessed for the greater good of the information economy.

Folksonomies and Contemporary Practice

James Surowiecki (2004) in The Wisdom of Crowds argues that "under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them" (p. xiii). Surowiecki's claim of group intelligence is, to say the least, controversial, but his notion that large groups of people are able to create and manage information in ways that are precise and accurate provides the impetus for much of the current theory behind collaborative indexing. Attaching a few words or phrases to a digital object such as a document or a photo (a practice known as "key wording") has long been a part of the digital era (Weinberger, 2005). However, it was not until websites like Flickr and software like Adobe Photoshop Album offered information seekers the opportunity to label or "tag" information objects that the potential and character of folksonomic practice began to emerge.

Flickr, a photo organizing/sharing service, was among the first web services to offer non-directed image tagging (also known as free tagging) to a widespread audience (Graham, 2006). While still in beta, it had more than two hundred and forty-five thousand members, and its membership was growing by more than five percent a week (McClellan, 2005). On any given visit to Flickr, one can typically find more than four thousand recent uploads (, 2009). Flickr is hardly the only site dedicated to photography that allows for the collective tagging of images. We Heart It, FFFFound and are among the latest in photo-oriented bookmarking sites (Wortham, 2008). Flickr and its photo-focused brethren sites represent only a fraction of the ways in which current internet users are collectively creating metadata.

In 2005 the New York Times reported on the increasing popularity of using social networking sites (the backbone of contemporary folksonomies) that are geared toward purposes, not people (Todras-Whitehill). Exemplary of this trend are, which allows users to bookmark Web sites and then share the bookmark;, which connects users with shared goals; and, a nonprofit site that brings users together to participate in civic actions, like starting a political group or giving blood. While some social networking sites are organized around themes or social causes, many, such as Facebook, MySpace and Hyves, are organized around people and, like Flickr, allow members to upload and "tag" images. Social networking sites have millions of members and collectively store billions of images; Facebook alone sees more than 850 million images uploaded monthly (Statistics, 2009).

Information organizations of a more educational focus are capitalizing on users' willingness to engage in social classification. A growing trend among museums is to ask patrons to help with classification of their online collections - a sort of curatorial democracy. As part of the growing semantic web trend of tagging, museums all across the globe are allowing (or rather soliciting) tags, or labels, that describe the content (of photos, Web links, art) and in turn using these new taxonomies to supplement their traditional thesauri. The Cleveland Museum of Art, the Smithsonian Institution, and the Powerhouse Museum in Sydney, Australia have all experimented with tagging applications on their Web sites. In the fall of 2005, the Metropolitan Museum of Art allowed volunteers to supply keywords for thirty images. The user-supplied terms were subsequently compared with the museum's curatorial catalog. The results showed less than 20% of the user-supplied terms to have already been in the museum's documentation.

This research, combined with other studies, inspired the Steve.Museum tagging project (O'Connell, 2007). The Steve project (Welcome to the Steve Project, n.d.) hopes to "provide profound new ways to describe and access cultural heritage collections and encourage visitor engagement with collection objects" (para. 1) as well as research the utility of social tagging in serving the museum community. In spite of this headway by museums, social networking sites and photo centered storage and retrieval sites continue to make up the bulk of current image based collective metadata generation. Information science professionals are both accepting and skeptical of the rise of folksonomies and their place in the development of Web 2.0. The information science field is composed of academics, researchers and practitioners whose opinions of folksonomic practice will prove crucial to its acceptance and promotion throughout the wider information economy. In short, their perceptions of the collective creation of information will prove vital to its success or failure in the twenty-first century.

Social Indexing and Information Professionals

Folksonomies, by their very nature, rely on and in turn reveal the generative power of the collective. Collective indexing produces more than "bottom up" informational organization: these comprehensive taxonomies are providing positive challenges to traditional meta-narrative and literate-based ways of claiming knowledge. Alex Wright (2007), in Glut: Mastering Information Through the Ages, insists that much of the writing (including collectively generated tags and labels) that appears on the web is akin to what linguist Walter J. Ong calls "secondary orality" (p. 232). Wright argues that we are currently witnessing the "reemergence in electronic form of oral patterns that have been hiding . . . for generations" (p. 232). Drawing from Ong, Wright contrasts oral traditions with literate cultures: oral traditions are additive not subordinative, aggregative not analytic, empathetic not objective, and situational not abstract. He asserts that the aggregative way in which such things as tags or labels gain authority is evocative of an "oral tradition" (Wright, 2007, p. 234). In short, user-driven mechanisms, such as tagging, require a cumulative "oral tradition" and work in opposition to a literate tradition, which would demand a single, authoritative voice - analytic and objective - in determining portions of the organizational structure of the sites' information content. Wright suggests that sites which allow users to reach a consensus, over time, as to how the information architecture will be shaped exist in uneasy tension with traditional top-down organizational models. Perhaps he best sums up professional attitudes toward this emerging oral-folksonomical paradigm when he describes the two approaches as prone to "resist a smooth reconciliation" (Wright, 2007, p. 234).

Among the information professionals who resist this reconciliation between top-down informational organization paradigms and a "bottom up" approach to the creation of metadata is Theresa Regli. Regli (2008) is wary of user input in the search process and argues that crowd wisdom, touted by writers such as Surowiecki, does not mean that "everyone (or anyone, for that matter) is tagging correctly (a notion that's often ignored in the realm of social software)" (p. 19). She admits that adding "human input" aids in relevancy, but insists that there is "no guarantee that . . . the way one person tags something will be relevant to someone else" (Regli, 2008, p. 19). Regli avers that user tags might be useful but intimates that their value is largely limited to supplementation of formal, authoritatively created categories. Nevertheless, many information management scholars disagree with Regli's analysis.

Author and information architect Jon Udell (2004) insists that the abandonment of taxonomy is the first ingredient of success in building a shared database and developing a metadata vocabulary. Udell (2004) argues that a "flat namespace," which functions as a sort of "bag of keywords" bound in a tight feedback loop and provides incentive to alter or discard a tag if it seems unsuitable, is the best possible way to collectively enrich shared data (p. 36). Udell is certainly not alone in his analysis. Peter Merholz (2004) contends that many classification systems "suffer" from an inflexible top-down approach, "forcing users to view the world in potentially unfriendly ways" (para. 1). Merholz prefers the term "ethno-classification," instead of folksonomy, in referring to how people classify and categorize the world (2004, para. 3-4). He asserts that the primary benefit of free tagging, of the sort done on the photo sharing site Flickr, is that the classification makes sense to users and can reveal terms and newborn words that "experts" might have "overlooked" (Merholz, 2004, para. 6). Merholz does not see such tagging systems as a panacea and disagrees with Udell that totally flat systems are preferable. He laments that free tagging systems lack controlled vocabulary and admits that users often develop multiple terms for identical concepts. However, he does see folksonomies, with all of their shortcomings, as the solution. He ruminates that smart designers allow pedestrians to create "design paths," the worn foot-paths that appear over time in a landscape through use, and then pave the emerging walkways for maximum utility. Merholz (2004) likewise sees ethnoclassification systems as "emerging" and holds that controlled vocabularies should be developed only once preliminary systems are in place. Merholz is not the only writer to tackle the claim that folksonomies are little more than anarchies on the World Wide Web.

Journalist Jessica Dye (2006) writes that tagging is "at the core" of some of the web's most "vibrant and cohesive communities." With that said, Dye couches her argument in the safety of her insistence that "not everyone is ready to leave behind the structured comfort of a controlled taxonomy and jump on the self-tagging band wagon just yet" (p. 38). Among those least receptive to self-tagging she names librarians and information architects. She explains that folksonomies fall into two categories. They are either broad, when third-party users assign tags to the same content and those tags are then aggregated and made searchable, or narrow, those skewed toward the individual user. Dye cites Bob Doyle, a webmaster and editor, who teaches that traditional taxonomies are best for static information but folksonomies best serve information that is more dynamic, such as RSS feeds or blogs. Doyle seems indicative of those in the information science community who long for a middle path - between those who totally resist folksonomies and those who would welcome the destruction of all traditional, "top-down" taxonomical structure in favor of folksonomic practice. It is to those who seek that "middle road," those who are striving to understand, accept and enhance folksonomies and their use as part of Web 2.0 that we now turn.

Folksonomies and the Future

Web 2.0 is a controversial term. It is most commonly understood to mean web development and web design that will facilitate interoperability, user-centered design and collaboration on the World Wide Web (Web 2.0, 2009). However, there is no definition that is universally agreed upon. Tim Berners-Lee, in 2006, described it as both an "interactive space" and a "piece of jargon" whose meaning, he said, nobody even knows (Laningham, 2006). He went on to insist that the technological components of Web 2.0 have been in use since the early days of the web. Berners-Lee further argued that those who claim Web 2.0 is about connecting people versus computers do not understand that "Web 1.0 was all about connecting people" (Laningham, 2006). Nevertheless, many information professionals see real differences between Web 1.0 and Web 2.0. Tim O'Reilly (2007) argues that Web 2.0 is "the network as platform" and aims to create "rich user experiences" through a network of participation (p. 17). It is this "network of participation" that truly separates Web 2.0 from its predecessor, and it is also what makes folksonomies such a vital part of the Web 2.0 platform.

Ciro Cattuto (2006) explores the semiotic dynamics of these participatory networks as they apply to tagging. Cattuto (2006) insists that "despite the selfish nature of users' behavior" tagging systems "exhibit cooperative dynamics that eventually lead to a bottom-up categorization of [shared] resources" (p. 33). He goes on to assert that collaborative tagging recruits simple and robust user behavior that fosters "shared conventions at the system level" (p. 33). Cattuto examined a variety of tags from and discovered that (after an initial transient) the relative proportion of each tag settles towards an approximately constant value. He asserts that this robustness shows that in collaborative tagging, tag fractions stabilize rather quickly, allowing a few top-ranked tags to emerge as a sort of semiotic "fingerprint," and that this quick stabilization persists over the long term and makes the emerging categories invulnerable to noise (p. 35). It should be pointed out that Cattuto only examined tag groups that were sufficiently large. Cattuto is not the only information professional who has envisioned the positive and perdurable aspects of folksonomies.
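The notion of a tag's "relative proportion" settling toward a constant value can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions: the function name tag_fractions, the checkpoints, and the tag stream are hypothetical and invented for this example, and the code is not Cattuto's method or data. It simply computes, at a few points in a stream of tags applied to one resource, the share of all tags so far that each tag accounts for.

```python
from collections import Counter

def tag_fractions(tag_stream, checkpoints):
    """Return {checkpoint: {tag: fraction}} for tags applied, in order, to one resource.

    The fraction of a tag is its count divided by the number of tags
    seen so far (hypothetical helper, for illustration only).
    """
    counts = Counter()
    wanted = set(checkpoints)
    snapshots = {}
    for i, tag in enumerate(tag_stream, start=1):
        counts[tag] += 1
        if i in wanted:
            snapshots[i] = {t: c / i for t, c in counts.items()}
    return snapshots

# Invented data: a short, idiosyncratic start followed by a steadier pattern.
early = ["eiffel", "paris", "trip", "paris"]
later = ["paris", "paris", "travel", "photo"] * 6
stream = early + later

snapshots = tag_fractions(stream, checkpoints=[4, 12, 28])

for n, fractions in snapshots.items():
    top = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(n, [(tag, round(share, 2)) for tag, share in top])
```

In this invented stream the leading tag holds the same share at every checkpoint while one-off early tags shrink toward insignificance, which is the kind of pattern the "fingerprint" metaphor describes. Real tagging data would of course be far noisier.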

Phyllis Snipes (2007) writes that not only are folksonomies stable, but that they are actually useful in teaching students information literacy skills. Snipes examined, Flickr, and a wiki and insisted that each site, and other sites like them, are great ways to assist both students and faculty in the learning and teaching process. She argues that students often struggle through large numbers of websites they retrieved in previous searches trying to re-locate data. Sites such as, which allows their users to tag websites, according to Snipes, solve this problem. Snipes believes that students working alone or in groups could easily avoid superfluous searching, since sites could be tagged to meet their particular research needs. She points out the added benefit that students can collaborate remotely, making for more fluid group assignments. Further, she believes that faculty stand to gain, as they can also draw on tagging as they attempt to collaborate over professional (or personal) items.

Flickr is another site that Snipes (2007) sees as potentially fertile for helping students to learn. Its massive database of images makes for an ideal learning tool, as many of its photos are high quality, and she contends that students would be able to extract a virtual goldmine of free visuals for projects. Lastly, she looks at wikis for their educational potential. Snipes is thrilled at the prospect that teachers could allow students to collectively contribute to the knowledge creation process and insists that, if properly instructed, students could classify and categorize this newly created knowledge in powerful and useful ways. In short, Snipes argues that folksonomies are "here to stay" and that teachers and students alike should be prepared and willing to fully exercise their robust powers of organization and retrieval.

Like Snipes, Darlene Fichter is excited about the potential that folksonomies offer. Fichter (2006) insists that "lots of people are willing to add tags" and that this willingness to tag creates that priceless feature of all folksonomies - nimbleness. She believes that tagging increases the "discoverability" of information items and is most useful for items that do not already possess metadata. Fichter envisions public libraries that would allow the collective indexing of historical images that belong to a local community, citing the University of Pennsylvania Libraries' use of a social bookmarking service called "PennTags" as precedent.

Fichter also insists that corporations are becoming, and will become, attuned to the benefits of the collaborative creation and indexing of information. She examines IBM's use of "dogear," an enterprise-wide social bookmarking tool, as well as Lucent's use of bookmarking tools to meet the needs of digital users. Companies and information organizations of all sizes will, according to Fichter, begin to roll out test-bed projects for tagging and other knowledge categorization tools. Fichter dreams of a future in which library catalogues could display tags and perhaps even mine tags from such sites as Amazon in creating a more robust Web site. She admits that the future of folksonomies is unclear, but she is certain that all information professionals should be prepared for a period of rapid development and experimentation (Fichter, 2006).

Despite the exciting and egalitarian ends to which folksonomies may be employed, it would be unfair to portray all resistance to collective indexing as "anti-democratic." Folksonomies are imperfect and, like traditional taxonomies, contain inherent weaknesses. It is these shortcomings that have led many in the information sciences to reject a wholesale acceptance of the collective creation of knowledge.

Problems with Folksonomies

Cattuto et al. (2007) are correct when they insist that the field of folksonomies is still young, with relatively few scientific publications about the topic. Nevertheless, those who oppose tagging and collective indexing have been vociferous. These opponents are often very quick to point out that a lack of structure and controlled vocabulary makes folksonomies unreliable. Shirky (2004) sums up what many opponents of folksonomies most fear when he admits their "lack of precision" is often problematic (para. 5). Noruzi (2007) argues that tags often stumble when dealing with "plurals, polysemy, synonymy" and specificity (para. 12). Peterson (2006) points out that not only can tags be messy or imprecise, but they are often just plain wrong. She notes that an article might be tagged with both "white horse" and "black horse," as tags allow contraries to exist. According to Guy and Tonkin (2006), about 40% of Flickr tags and 28% of tags were misspelled or in some way incorrect. Inaccurate tags are not the only concern opponents of folksonomies have raised.
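Some of the difficulties named above, namely plurals, variant spellings and synonymy, are the sort of thing a system can partly smooth over before tags are pooled. The following Python sketch is offered only as an illustration: the synonym table, the naive plural rule, and the function names normalize and pooled_counts are assumptions made for this example and do not describe how Flickr or any other service actually processes tags. Polysemy and outright wrong tags are not addressed by it at all.

```python
from collections import Counter

# Hypothetical synonym table; a real one would have to be curated.
SYNONYMS = {"nyc": "new york", "newyork": "new york"}

def normalize(tag):
    """Fold case, whitespace, known synonyms and simple plurals (illustrative rules only)."""
    t = " ".join(tag.strip().lower().split())
    t = SYNONYMS.get(t, t)
    # Naive plural folding: drop a single trailing "s", but leave words like "glass" alone.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t

def pooled_counts(tags):
    """Count tags after normalization, so variants are pooled under one form."""
    return Counter(normalize(t) for t in tags)

# Invented example tags.
raw = ["Cats", "cat", " cat ", "NYC", "New York", "newyork", "glass"]
counts = pooled_counts(raw)
print(counts.most_common())
```

Here seven raw tags collapse to three pooled forms. The plural rule is deliberately crude and would mangle many real words, which is itself a reminder of why critics regard the absence of a controlled vocabulary as a genuine weakness rather than a cosmetic one.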

Condon (2006) implies that librarians who seek to incorporate folksonomies in library catalogs are not progressive and technologically astute but rather like "Tom Sawyer getting the neighborhood boys to whitewash the fence for him" (p. 45). He expresses incredulity toward the belief that the public can be counted on to index anything consistently.

Not all resistance to folksonomies springs from a concern about their accuracy. Often these collaborative efforts yield results that individuals may wish to undermine. According to Lisa Guernsey (2008) of the New York Times, de-tagging (removing the tag from one's photo in Facebook) has become "an image-saving step in the college party cycle." Guernsey quotes a Radford University student who said that once the event happens "pictures are up within 12 hours, and within another 12 hours people are de-tagging." Another student explained that a photograph of her "holding something I shouldn't be holding" was a prime candidate for tag removal (Guernsey, 2008).

While many of the doubts and anxieties surrounding collective tagging have some merit, the deepest concerns of those who oppose folksonomic practice have simply failed to materialize. Folksonomies, imperfect as they are, are increasingly becoming a vital tool in organizing the web for more efficient information retrieval. It is certainly true that the tags which make up many folksonomies are flawed, yet herein lies one of collective indexing's greatest strengths: the ability to quickly overcome sub-par agent-level indexing through a rapid emergence of stability at the global level (Cattuto, 2006). As Amy Gahran of observes, folksonomy "merges, diverges, and evolves much the way language does, through usage and interaction" (Gahran, 2005, para. 9).


Until recently, the traditional theory of categories insisted that there were "right" and "wrong" ways to think about groupings, concepts and classifications. However, the last few decades have seen a weakening of this position's validity. Yet, in the field of Library and Information Science, classification as a process still involves the "orderly and systematic assignment of each entity to one and only one class within a system of mutually exclusive and nonoverlapping classes" (Jacob, 2004, p. 8). The categories that are created by collaborative indexing certainly do not fit into this understanding of classification. Some professionals do not even consider tagging the same as categorization (Albrycht, 2006). However, to deny that collective indexing provides a unique, and potentially valuable, form of categorization seems less than constructive. Collective creation of information is here to stay. So too is the collective indexing of that information. Folksonomic practice offers a useful form of bottom-up informational organization that no longer solely privileges more traditional, top-down approaches. Yet, folksonomies should not be seen as opposed to traditional taxonomies, but rather as supplementary to them. Most information professionals welcome the merger, however unsteady, of traditional and folk taxonomies, and libraries, museums and corporations are ready to adapt and adopt the techniques of collective organization pioneered by Flickr, Facebook and Even in this early period of chaos, the emerging knowledge and organization seems more democratic - and in turn genuine - than any that can be imposed.


