All Tags were not created equal

Posted by direwolff on November 26, 2005

Recently, I had a discussion with someone from one of the major RSS/blog search engines about tags.  Specifically, I wanted to explore with this person the role that automated tagging technology would play in tagging content on the Web.  He became quite adamant that there was no role for automated tagging solutions.  He used words like “folksonomy” and explained that, as language changes, only people could capture its true nuances in their tags.

For part of this conversation, my eyes glazed over, mostly because a story about baseball from 20 years ago is still a story about baseball today, and will be 10 years from now.  Sure, language evolves, but in general it does not do so quickly, and definitely not at a pace that negates all past meanings of the ideas and concepts once conveyed.  However, this did force me to think through the roles of human versus machine tagging, in order to be clear on why each is important in its own right.

During our conversation we talked about Riya, an image search technology that will actually go through pictures, recognize faces and words in the pictures, and tag content accordingly.  He lauded their effort as a very good tagging technology because it was able to do the mundane job of tagging pictures.  Of course, the fact that tagging pictures is no different from tagging legacy text seemed lost on him.

In considering Riya, I realized that as an automated tagging solution, it was basically looking at a picture and obtaining implicit information from within the picture in order to determine the appropriate tag(s).  In other words, if there was a person in the picture that it had been trained to recognize, then it would identify them and produce a tag of their name, but it would have no way of knowing that this picture was taken during my Christmas party or in New York or any number of things that are not implicit in the content it’s reviewing.

This is also what automated text tagging solutions are doing: they’re determining the appropriate tags from implicit information found in the content.  This reduces the need for human beings to focus their attention on applying implicit content tags.  Instead, it puts humans in a position to focus their tagging on the explicit information, that which is not easily determined from the content, be it pictures or text.

For example, recently at the Web 2.0 Conference, bloggers were asked to tag their postings or pictures “web20”.  This would make the related content more easily discoverable by anyone wanting to keep up with the conference.  This is the idea behind folksonomy.  It’s not something that would have been easy to determine implicitly from the posted content (unless, in the case of text, the author used the words “Web 2.0” in the posting), and hence not something an automated tagging solution could handle well.  However, if the Web 2.0 post was about search engines or “mashups”, this information could have been implicitly deduced by an automated tagging system and tagged accordingly.
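To make the split between the two kinds of tags concrete, here is a minimal Python sketch.  It is only an illustration, not how Riya or any real tagging product works: the phrase list and the function names (TOPIC_PHRASES, implicit_tags, all_tags) are made up for this example, and a real automated tagger would use something far more capable than phrase matching.

```python
import re

# Hypothetical vocabulary (an assumption for this sketch):
# tag -> phrases that signal the tag when they appear in the text.
TOPIC_PHRASES = {
    "search": ["search engine", "search engines"],
    "mashups": ["mashup", "mashups"],
    "web20": ["web 2.0"],
}


def implicit_tags(text):
    """Return the tags a machine can derive from the content itself."""
    lowered = text.lower()
    found = set()
    for tag, phrases in TOPIC_PHRASES.items():
        # Match whole phrases only, so "mashup" does not fire inside "mashupper".
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            found.add(tag)
    return found


def all_tags(text, human_tags=()):
    """Combine machine-derived tags with tags a person supplied by hand."""
    return implicit_tags(text) | {t.lower() for t in human_tags}


post = "Notes on how search engines could index mashups built on public APIs."

print(sorted(implicit_tags(post)))                   # ['mashups', 'search']
print(sorted(all_tags(post, human_tags=["web20"])))  # ['mashups', 'search', 'web20']
```

The post never mentions the conference, so the machine cannot produce “web20” on its own; that tag only shows up because a person added it.  The topics, on the other hand, come straight out of the text without anyone lifting a finger.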

While there’s still much work to be done in the area of automated tagging solutions, the ability of these technologies to tag content remains very useful and even desirable, so that content can be found in as many appropriate ways as possible, even when not yet tagged by a person.  One could argue that, in effect, this is what Google does under the covers of its search engine and what Yahoo! has made explicit through its keyword tool (which lets you determine which keywords will be used to match ads to any Web page).

As these two modes of tagging come together, it will be interesting to see how much better search becomes over time, since humans tagging content will play such an important role in adding context that is not implicit in the information on the Internet.


