Curated by: Luigi Canali De Rossi

Wednesday, January 5, 2005

Grassroots Cooperative Categorization Of Digital Content Assets: Folksonomies, What They Are, Why They Work

"A folksonomy represents simultaneously some of the best and worst in the organization of information.

Its uncontrolled nature is fundamentally chaotic, suffers from problems of imprecision and ambiguity that well developed controlled vocabularies and name authorities effectively ameliorate.

Photo credit: Justin Bird

Conversely, systems employing free-form tagging that are encouraging users to organize information in their own ways are supremely responsive to user needs and vocabularies, and involve the users of information actively in the organizational system.

Overall, transforming the creation of explicit metadata for resources from an isolated, professional activity into a shared, communicative activity by users is an important development that should be explored and considered for future systems development."

Content managers, information architects, managers and librarians, here is my synthesis-edit of this must-read paper:

The creation of metadata has generally been approached in two ways:

a) professional creation and

b) author creation.

a) Professional creation

In libraries and other organizations, creating metadata, primarily in the form of catalog records, has traditionally been the domain of dedicated professionals working with complex, detailed rule sets and vocabularies.

The primary problem with this approach is scalability and its impracticality for the vast amounts of content being produced and used, especially on the World Wide Web.

The apparatus and tools built around professional cataloging systems are generally too complicated for anyone without specialized training and knowledge.

b) Author creation

The second approach is for metadata to be created by authors.

The movement towards creator described documents was heralded by SGML, the WWW, and the Dublin Core Metadata Initiative.

There are problems with this approach as well - often due to inadequate or inaccurate description, or outright deception.

c) User-created metadata

But there is also a third, possible approach: user-created metadata, where users of the documents and media create metadata for their own individual use that is also shared throughout a community.

This approach is also coming to be known as grassroots community classification of digital assets. Del.icio.us (henceforth referred to as "Delicious") is a tool to organize web pages. A description online states it is:
"a social bookmarks manager. It allows you to easily add sites you like to your personal collection of links, to categorize those sites with keywords, and to share your collection not only between your own browsers and machines, but also with others".

The library and information science field has developed elaborate rules and schemes for cataloging, categorization and classification that include classification schemes such as the Dewey Decimal System and Library of Congress Classification Scheme, as well as large controlled vocabularies of terms for describing the subject of materials, such as the Library of Congress Subject Headings.

While professionally created metadata are often considered to be of high quality, they are costly to produce in terms of time and effort.

This makes it very difficult to scale and keep up with the vast amounts of new content being produced, especially in new mediums like the World Wide Web.

Author-created metadata may help with the scalability problems in comparison to professional metadata, but both approaches share a basic problem: the intended and unintended eventual users of the information are disconnected from the process.

Two examples: Delicious and Flickr

Delicious is not unique or pioneering in its role as bookmarks manager. What seems to be relatively new and different is the emphasis on user added keywords as a fundamental organizational construct. These keywords, which are referred to as "tags" on the site, allow users to describe and organize content with any vocabulary they choose.

To use the system, you must first join by registering an account. The system is free to join and use. Only a username, full name, and password are required. The user then adds a specialized bookmark to their web browser. When browsing a web page they would like to add to Delicious, they select the bookmark and are presented with a form that allows them to enter any tags they want to associate with the page, and then save it. These tags are optional; users can and do use the site without tagging their documents.

In addition to the automatically generated chronological ordering of bookmarks saved to the system, the tags are used to collocate bookmarks within a user's collection. These tags are also used to collocate bookmarks across the entire system, so, for example, the system-wide page for the tag "linux" will show all bookmarks that are tagged with "linux" by any user.
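The two kinds of collocation described above can be illustrated with a small sketch. It is not how Delicious is implemented; the class `TagIndex`, its method names, the user names and the example URLs are all hypothetical, chosen only to show one index keyed by user and tag and another keyed by tag alone.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory index of bookmarks (hypothetical, for illustration only)."""

    def __init__(self):
        # (user, tag) -> URLs saved by that user under that tag, in save order
        self.by_user_tag = defaultdict(list)
        # tag -> (user, URL) pairs across every user of the system
        self.by_tag = defaultdict(list)

    def save(self, user, url, tags=()):
        # Tags are optional: a bookmark saved without tags is simply not indexed.
        for tag in tags:
            self.by_user_tag[(user, tag)].append(url)
            self.by_tag[tag].append((user, url))

    def user_view(self, user, tag):
        # Collocation within one user's collection.
        return list(self.by_user_tag.get((user, tag), []))

    def global_view(self, tag):
        # Collocation across the entire system.
        return list(self.by_tag.get(tag, []))


index = TagIndex()
index.save("alice", "http://example.org/kernel", ["linux", "howto"])
index.save("bob", "http://example.org/distros", ["linux"])
index.save("bob", "http://example.org/untagged")

print(index.user_view("alice", "linux"))
print(index.global_view("linux"))
```

The first call lists only the bookmarks one user filed under "linux", while the second gathers every bookmark that any user filed under the same tag.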

Flickr, a photo management and sharing web application, has a similar system of free-form tagging for photos that was modeled after Delicious.

It too requires users to create a user account, and is free to join. There is also the option to pay for an account with more features, like more storage space for photographs. Flickr offers a similar bookmark to add photographs to the system, but also has a number of other options to upload photographs to the system through web pages and software applications. Tags can be added at the time of upload, or later in the process when the photographs are displayed by the system.

Classification vs Categorization vs Folksonomy

The organic system of organization developing in Delicious and Flickr was called a "folksonomy" by Thomas Vander Wal in a discussion on an information architecture mailing list. It is a combination of "folk" and "taxonomy."

An important aspect of a folksonomy is that it is comprised of terms in a flat namespace: that is, there is no hierarchy, and no directly specified parent-child or sibling relationships between these terms.

Overall, although the term "classification" is often used in relation to these systems, and has been used in this paper, what is going on is more like "categorization."

Categorization is generally less rigorous and boundaries are less clear.

It is based more on a synthesis of similarity than a systematic arrangement of materials. Most importantly, each document can have many terms associated with it.

By contrast, classification schemes generally focus on providing a single classification to an item, and are very hierarchical and have clear relations. In a folksonomy the set of terms is a flat namespace: there are no clearly defined relations between the terms in the vocabulary.


The problems inherent in an uncontrolled vocabulary lead to a number of limitations and weaknesses in folksonomies. Ambiguity of the tags can emerge as users apply the same tag in different ways. At the opposite end of the spectrum, the lack of synonym control can lead to different tags being used for the same concept, precluding collocation.

As an uncontrolled vocabulary that is shared across an entire system, the terms in a folksonomy have inherent ambiguity as different users apply terms to documents in different ways. There are no explicit systematic guidelines and no scope notes.

Spaces, Multiple Words
Both Delicious and Flickr seem designed primarily to deal with single words. Delicious does not allow spaces in tag names, although Flickr does. In some instances, multiple words are used together in a single tag, without spaces, e.g., 'vertigovideostillsbbc' on Flickr. At times this can reflect users trying to put a hierarchy into a single tag, or it simply reflects a category that has multiple terms, such as 'design/css' on Delicious. Both systems ignore letter case, which may collapse distinct ideas into a single tag, especially with acronyms.
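A rough sketch of how these two behaviors interact is given below. The function `parse_tags` and its `allow_spaces` flag are hypothetical, and the comma separator used for multi-word tags is an assumption made for illustration; the source does not say how either site actually parses tag input.

```python
def parse_tags(raw, allow_spaces=False):
    """Split raw tag input into lower-cased tags (illustrative only).

    allow_spaces=False: whitespace separates tags, so no tag contains a space.
    allow_spaces=True:  commas separate tags (an assumed convention), so a
                        single tag may contain several words.
    """
    parts = raw.split(",") if allow_spaces else raw.split()
    tags = []
    for part in parts:
        # Collapse inner whitespace and ignore letter case.
        tag = " ".join(part.split()).lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


print(parse_tags("design/css Web CSS"))
print(parse_tags("San Francisco, golden gate", allow_spaces=True))
# Case folding merges the acronym "IT" with the pronoun "it".
print(parse_tags("IT it"))
```

The last call shows the collapse mentioned above: once case is ignored, the acronym and the ordinary word become one tag.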

There is no synonym control in the system. This leads to tags that seemingly have similar intended meanings, like "mac," "macintosh," and "apple," all being used to describe materials related to Apple Macintosh computers. In this particular situation with these Macintosh tags, the "related tags" sidebar of Delicious interlinks all three of these categories automatically. Different word forms, plural and singular, are also often both present: in the popular tags on Flickr, for example, both "flower" and "flowers" were listed.

These sorts of problems are the reasons why controlled vocabularies are used in many settings. Generally, any of the classic problems that controlled vocabularies help deal with will be present in these systems to varying degrees. However, it is likely that a controlled vocabulary would be impossible in the context of systems like Delicious and Flickr.
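One way a "related tags" list could link "mac," "macintosh," and "apple" without any controlled vocabulary is by counting which tags appear together on the same item. The sketch below assumes simple co-occurrence counting; the source does not describe how Delicious actually computes its sidebar, and the function `related_tags` and the sample data are hypothetical.

```python
from collections import Counter


def related_tags(bookmarks, tag, top=3):
    """Rank tags that co-occur with `tag` on the same bookmark.

    `bookmarks` is a list of tag lists, one per saved item. Ties are broken
    alphabetically so the result is deterministic.
    """
    counts = Counter()
    for tags in bookmarks:
        unique = set(tags)
        if tag in unique:
            counts.update(unique - {tag})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top]]


bookmarks = [
    ["mac", "apple", "osx"],
    ["macintosh", "apple"],
    ["mac", "macintosh", "apple"],
    ["mac", "software"],
]

print(related_tags(bookmarks, "mac"))
```

Co-occurrence of this kind surfaces near-synonyms only when some users apply both terms to the same item, so it softens the synonym problem rather than solving it.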


Although a folksonomy is not a controlled vocabulary, and certainly does have limitations, it has strengths that are important to understanding the appeal and utility of such systems.

Browsing vs. Finding

The first is serendipity. While the controlled vocabulary issues discussed above may hamper findability, browsing the system and its interlinked related tag sets is wonderful for finding things unexpectedly in a general area. In researching this paper, exploring the bookmarks tagged with "folksonomy" on Delicious, there were many recent resources from a wide variety of authors and sites that I likely would never have been exposed to.

There is a fundamental difference in the activities of browsing to find interesting content, as opposed to direct searching to find relevant documents in a query.

It is similar to the difference between exploring a problem space to formulate questions, as opposed to actually looking for answers to specifically formulated questions.

Information seeking behavior varies based on context. While one could evaluate a folksonomy in a system like Delicious or Flickr by using specific queries from users, and then evaluating which documents tagged with keywords they choose are relevant to the query, that would ignore the broader set of browsing activities that the system seems to be stronger in.

Desire Lines

Perhaps the most important strength of a folksonomy is that it directly reflects the vocabulary of users. In an information retrieval system, there are at least two, and possibly many more vocabularies present. These could include that of the user of the system, the designer of the system, the author of the material, the creators of the classification scheme; translating between these vocabularies is often a difficult and defining issue in information systems.

As discussed earlier, a folksonomy represents a fundamental shift in that it is derived not from professionals or content creators, but from the users of information and documents. In this way, it directly reflects their choices in diction, terminology, and precision.

Desire lines are the foot-worn paths that sometimes appear in a landscape over time. Merholz notes, "A smart landscape designer will let wanderers create paths through use, and then pave the emerging walkways, ensuring optimal utility.

"Ethnoclassification systems can similarly 'emerge.' Once you have a preliminary system in place, you can use the most common tags to develop a controlled vocabulary that truly speaks the users' language."

Merholz recommends using a folksonomy as the start of professionally designed controlled vocabularies.

While this may not be practical or desirable in many situations, the fundamental point is that the vocabulary of users may simply be too different from that of the other parties to adequately "pave the paths" in advance. Another important point may be that the terms users want to use move too quickly, or are qualitatively different from those of authors or systems designers.
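Merholz's suggestion of starting from the most common tags can be sketched as a simple frequency count. This is only an illustration under stated assumptions: the function `candidate_vocabulary`, the `min_count` threshold, and the hand-written `preferred` mapping from variant tags to a preferred term are all hypothetical and do not come from the paper.

```python
from collections import Counter


def candidate_vocabulary(tag_assignments, min_count=2, preferred=None):
    """Return (term, count) pairs for tags used at least `min_count` times.

    `preferred` optionally maps a variant tag to the term an editor chose to
    keep, which is the kind of decision a controlled vocabulary records.
    """
    preferred = preferred or {}
    counts = Counter(preferred.get(tag, tag) for tag in tag_assignments)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, n) for term, n in ranked if n >= min_count]


assignments = ["flower", "flowers", "flowers", "mac", "macintosh", "mac", "sunset"]

# Raw usage: variants split the counts between themselves.
print(candidate_vocabulary(assignments))

# After an editor merges the variants, the same usage concentrates on two terms.
merged = {"flower": "flowers", "macintosh": "mac"}
print(candidate_vocabulary(assignments, preferred=merged))
```

The counts come from the users, but the merge table still has to be written by someone, which is where the professional design work Merholz describes would begin.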

Why Folksonomies Work

It is difficult to define a metric by which one could argue that folksonomies are a success or a failure. Still, the degree to which they seem to be effective in these systems as a way of organizing information, and the fact that a large group of people are using these systems, are due to a few important factors, including:

  • The overall costs for users of the system in terms of time and effort are far lower than in systems that rely on complex hierarchical classification and categorization schemes.
  • In addition to this structural difference, the context of the use in these systems is not just one of personal organization, but of communication and sharing.
  • The near instant feedback in these systems leads to a communicative nature of tag use.
  • The conceptual shift from professional, designed, clearly defined categorization and classification schemes to an ad-hoc set of keywords enables users -- not just professionals -- without any training or previous knowledge to participate in the system immediately.
  • Additionally, participating is far easier in terms of time, effort and cognitive costs.
  • A folksonomy lowers the barriers to cooperation. Groups of users do not have to agree on a hierarchy of tags or detailed taxonomy, they only need to agree, in a general sense, on the "meaning" of a tag enough to label similar material with terms for there to be cooperation and shared value. Although this may require a change in vocabulary for some users, it is never forced, and as Udell discussed, the tight feedback loop provides incentives for this cooperation.
  • Finally, there is the compulsion to share in general that underlies these systems. The very act of users self-selecting what to tag is important: this is not just material that users want to find themselves later, but also material they are sharing with others. Both systems have an explicit kind of social networking component built in: Flickr allows you to specify other users as contacts, friends, or family and see views of just their material; Delicious allows you to "subscribe" to other users' lists.
  • These two models, community and individual motivations, are not mutually exclusive, and it is likely both are necessary to explain a folksonomy in the context of these services.

Full original essay
Folksonomies - Cooperative Classification and Communication Through Shared Metadata
Adam Mathes
Computer Mediated Communication - LIS590CMC
Graduate School of Library and Information Science
University of Illinois Urbana-Champaign

Readers' Comments    
2005-01-19 22:55:07


Very nice summation. I just wanted to point out Feedmarker, a feed aggregator and bookmarks manager that uses open-tagging/folksonomy.

posted by Robin Good on Wednesday, January 5 2005, updated on Tuesday, May 5 2015
