A simplistic discussion on TAGS and Context
Posted by direwolff on February 2, 2005
So after following the posts about what Technorati is doing with tags, and hearing our developers talk about the importance of tagging every component of the content on Tribe, I started thinking about Google's position in all of this. Even more, I started thinking about what's missing in all of this, and decided to share these thoughts. Because my thoughts can frequently be scattered, I'm going to do my best to contain them in something worth considering.
First, about Google. They are in an interesting position as it relates to tags. Today, most end-user searches on Google contain only one or two words. Hence, Google is in the best position to know the relationship between a specific word searched for and what people click on as the most related content. While I haven't noticed any redirects on the result links, which is where Google might track this stat, I'm told that this has indeed been done, though somewhat under the covers (I can't vouch for this). In fact, it probably wouldn't take many days of doing this before Google had a sizable collection associating words with documents/Web pages.
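To make the idea concrete, here is a minimal sketch in Python of how search clicks could be turned into word-to-page associations. This is purely an illustration and not a description of what Google actually does; the function names, the log format, and the example URLs are all assumptions of mine.

```python
from collections import Counter, defaultdict


def build_click_index(click_log):
    """Count how often each query word led to a click on each page."""
    index = defaultdict(Counter)
    for query, clicked_url in click_log:
        for word in query.lower().split():
            index[word][clicked_url] += 1
    return index


def top_pages(index, word, n=3):
    """Return the pages most often clicked for a word, most clicked first."""
    counts = index.get(word.lower(), Counter())
    return [url for url, _ in counts.most_common(n)]


# Hypothetical log entries: (query typed, URL clicked)
click_log = [
    ("jaguar", "cars.example/jaguar"),
    ("jaguar", "zoo.example/big-cats"),
    ("jaguar", "cars.example/jaguar"),
    ("jaguar speed", "zoo.example/big-cats"),
    ("jaguar dealer", "cars.example/jaguar"),
]

index = build_click_index(click_log)
print(top_pages(index, "jaguar"))
```

Even with five log entries the word "jaguar" ends up attached to two quite different pages, which is the kind of ambiguity the rest of this post is about.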
With AdSense, Google went the other way. They use their system to assign keywords to unstructured content (documents/Web pages). Hence, when I see a Web page, the AdSense ads that appear are based on the keywords that Google matched to the document (probably using the tech they acquired from Applied Semantics). Now, in looking at the quality of matches for these AdSense ads, I noticed that while the keywords were sometimes indeed found in the content, the ideas generally weren't captured by those keywords, and so the ads that appeared were inappropriate or poorly matched.
Enter tags. When Web page/document creators begin tagging their content, they will use the words that they feel are most appropriate to the ideas they are conveying, even if these words do not appear in the content itself. Hence, AdSense could get a whole lot better and more relevant by the simple application of tags by content creators. I'd argue that the clickthrough percentages AdSense is seeing today are almost as accidental as doing no targeting at all, since only a very small subset of the ads served are relevant in a meaningful way beyond a word match in the content. Those percentages will certainly get better as Google begins leveraging tags.
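Here is a rough sketch in Python of the difference between matching ads on words found in the content and matching them on tags the author chose. The page, the tags, the ads, and the `match_ads` function are all hypothetical, and the matching is deliberately naive; real ad systems are surely more sophisticated.

```python
import re


def words_in(text):
    """Lowercase a text and return its distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_ads(page_text, page_tags, ads):
    """Return (ads matched on body words, ads matched on author tags)."""
    body_words = words_in(page_text)
    tags = {tag.lower() for tag in page_tags}
    by_text = [name for name, keywords in ads.items() if keywords & body_words]
    by_tags = [name for name, keywords in ads.items() if keywords & tags]
    return by_text, by_tags


# Hypothetical page: a travel post that never uses the word "travel"
page_text = "We took the ferry at dawn and watched the apple orchards roll by."
page_tags = ["travel", "washington"]

# Hypothetical ads, each with the keywords the advertiser bought
ads = {
    "laptop sale": {"apple", "laptop"},
    "ferry schedules": {"ferry", "travel"},
    "hotel deals": {"travel", "hotel"},
}

by_text, by_tags = match_ads(page_text, page_tags, ads)
print("matched on content words:", by_text)
print("matched on author tags:  ", by_tags)
```

The word match pulls in a laptop ad because "apple" appears in the text, while the tag match surfaces the hotel ad even though neither "travel" nor "hotel" appears anywhere in the content.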
So what's missing? Well, words taken on their own will always be poor matches for ideas. The main reason is that they lack context. Not context as in word match, the way Google, Kanoodle, AdBrite, Alexa Quicklinks, and a host of other players would have you believe, but context as in talking about the same subject or idea. None of the tagging initiatives address context, and though this is part of what the Semantic Web is trying to address, reaching agreement on taxonomies has been a less than manageable task. What's really needed is another layer of technology that enables people to easily express context in a meaningful way, so that the value of the words becomes readily apparent for any application they are relevant to, not just where a keyword match occurs. It's the tools that make it easy to define, change, and manage the context of requests that will be critical in all of this. Words alone only get people part of the way to the right content. Context needs to be added to get us the rest of the way there, and tools that deal with this have to begin getting more air play if tags are to truly fulfill their promise.
One other consideration is that words will take on different connotations depending on the context that's applied to them, so being able to handle the viewing of content under various contexts will also be important.
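As one way to picture this, here is a small Python sketch in which every tag is stored together with the context it was applied in, so the same word can be looked up under different contexts. The `ContextualTagIndex` class, its methods, and the example URLs are my own invention for illustration; this is not how any existing tagging service works, and it sidesteps the hard problem of how people would agree on context names.

```python
from collections import defaultdict


class ContextualTagIndex:
    """Store tags as (context, word) pairs rather than as bare words."""

    def __init__(self):
        self._pages = defaultdict(set)

    def tag(self, url, word, context):
        """Attach a word to a page under a given context."""
        self._pages[(context.lower(), word.lower())].add(url)

    def find(self, word, context=None):
        """Find pages tagged with a word, optionally within one context."""
        word = word.lower()
        if context is not None:
            return sorted(self._pages.get((context.lower(), word), set()))
        hits = set()
        for (_, tagged_word), urls in self._pages.items():
            if tagged_word == word:
                hits |= urls
        return sorted(hits)


# Hypothetical pages tagged with the same word in different contexts
tags = ContextualTagIndex()
tags.tag("code.example/tutorial", "python", context="programming")
tags.tag("zoo.example/snakes", "python", context="reptiles")

print(tags.find("python"))
print(tags.find("python", context="reptiles"))
```

Without a context, the word alone returns both pages; with one, only the page about the same subject comes back.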
Stay tuned as there may be a company out there already able to fill this need.