OK, so I’m beginning to make some headway on a concept that brings together the benefits of a Wikipedia-like contributory model with the needs of users trying to identify and locate very specific information in large collections of documents. Whether these collections are Web pages, RSS feeds, or documents on users’ hard drives, we want to give users who need to do research or discovery the ability to do so quickly and effectively. The idea is that creating robust queries requires tools that are not generally available to most people. Classification and text analysis technologies are expensive, and search engines’ advanced querying capabilities remain relatively weak. Lexis-Nexis level querying is out of reach for most users, and corporate users who do have access to it pay a lot of money and are still bound by the content walls of such services.
The idea of letting anyone develop a robust and sophisticated query that can then be shared with others is the root of this new service. The ability not only to create and use a query, but also to index content against it and all other queries created, in a fast and scalable way, moves us to an interesting place where communities of interest can work together to share access to useful information. Of course, there will also be the ability to keep queries private, especially at higher levels of detail where specific names of entities or people come into play, but there will be a set of foundational queries that contribute to human knowledge and that anyone will be able to participate in creating. We already have over 500 such queries, which locate such nebulous ideas as any discussion about a trend or forecast, or all discussions about terrorists (and not necessarily because the term “terrorist” appears in the article). Relevant, domain-focused content identification is now more easily achievable.
This community of knowledge shifts the focus from knowledge creation by virtue of originating the content to knowledge creation by virtue of providing the roadmap for finding information and giving people the tools to follow it. This is somewhat analogous to folksonomy, where tags are used to identify content. Both the author and those who find the content can tag it using different services. Where tags tend to be inconsistently applied, given that it’s difficult to know the motives of the person tagging the content, knowledge types such as topics, issues and categories will be strong in this regard. Even if inaccurate, they will be consistent, which means that improving their accuracy will reverberate across all usages.
The community of knowledge will be primarily useful to those trying to discover content. Articles, blog posts, or research reports that talk about an increase or decrease in the price of oil are a fairly abstract thing to look for, but that’s just the kind of search that will be possible. We will provide access to the tools to enact such discovery, and the wiki to share the topics, issues, and categories (queries) created by the community for shared use. Imagine trying to track the behavior of those in charge, and how complex such a query might need to be. The community of knowledge will have a set of foundational knowledge types that already deal with such complexity, but will also allow users to tackle more if they need to.
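To make the oil price example a little more concrete, here is a rough sketch of what a shareable query might look like if it were expressed as groups of alternative patterns, where a document matches only when every group is satisfied. This is purely illustrative and is not how the platform itself works: the `KnowledgeType` class, the `classify` function, and the word lists are all hypothetical names and choices made up for this example, and real text analysis would need far more than a handful of regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeType:
    """Hypothetical shareable query: each group needs at least one matching pattern."""
    name: str
    groups: tuple  # tuple of tuples of regular-expression strings

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        # Every group must be satisfied by at least one of its patterns.
        return all(
            any(re.search(pattern, lowered) for pattern in group)
            for group in self.groups
        )


# Illustrative word lists only; a community would refine these over time.
OIL_PRICE_MOVE = KnowledgeType(
    name="oil-price-movement",
    groups=(
        (r"\boil\b", r"\bcrude\b", r"\bbrent\b"),
        (r"\bprices?\b", r"\bcosts?\b", r"\bper barrel\b"),
        (
            r"\b(rose|rises?|rising|climb\w*|surg\w*|jump\w*|increase\w*)\b",
            r"\b(fell|falls?|falling|drop\w*|slump\w*|declin\w*|decrease\w*)\b",
        ),
    ),
)


def classify(documents, knowledge_types):
    """Index each document id under every knowledge type it matches."""
    index = {kt.name: [] for kt in knowledge_types}
    for doc_id, text in documents.items():
        for kt in knowledge_types:
            if kt.matches(text):
                index[kt.name].append(doc_id)
    return index


docs = {
    "a": "Crude climbed to $70 per barrel on Tuesday.",
    "b": "Olive oil recipes for summer.",
    "c": "Oil prices fell sharply.",
}
print(classify(docs, [OIL_PRICE_MOVE]))
```

Document "a" is picked up even though the word "oil" never appears in it, while "b" is skipped despite mentioning oil. Because the definition lives in one place, tightening a word list would change the result for everyone who uses that query.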
As we elaborate the platform, I’ll discuss it further here, but note that our intent is to provide a resource that breaks through the constraints that currently limit access to precise and accurate text analysis technology, so that all can have access to it, not just the elite Fortune 500 companies that can afford starting prices of $200,000+ in annual fees.
Tags: community of knowledge, text analysis, classification, tags, topics, discovery