SEND IT!!!

“Got kitesurfing on the mind, mixed with some search & classification tech, and a dab of political ranting”

About Sphere, what am I missing?

Posted by direwolff on November 10, 2006

After reading and commenting on the Sphere news a few days ago, I had the opportunity to install the “Sphere It!” button in my browser, on the presumption that Sphere’s algorithms would unearth related content better than anything I had used before. This promise keeps coming up: search engines offer “relatedness” or sell their intuitive sense of what people are looking for, citing their years of research and the intelligence and experience of their founders. Well, I’m sorry to say this one falls by the wayside like all the others, though their marketing has served them well, given the deals they’ve been doing lately (e.g., Time.com, About.com, MarketWatch).

I’m still trying to figure out what Om Malik saw in them in October 2005 that made him write such a favorable review, but here’s my latest experience. The article I was reading was titled “AOL Acquires Relegence Corporation”. Once I finished reading it, I clicked the “Sphere It” bookmarklet in my Firefox browser, and you can see the results I got here. You be the judge. I’m willing to grant that any news about AOL is fair game, but look through these results. I previously blogged that “Without Context there is no Meaning”, and that goes for Sphere too. Not one story on this list was even remotely interesting to me or related in a meaningful way to what I was reading, and after all, isn’t this technology for people? Having said that, they seem like a decent RSS search engine along the lines of Technorati or Feedster, but as for anything deeper, I’m not impressed.

While looking around the blogosphere for more discussion of Sphere (using Google, of course ;-), I ran into a post by James Gross from May 23rd, 2006, with a very interesting comment from Tony Conrad, Sphere’s CEO, replying to some remarks by Scott Rafer. Below is Tony’s response; I’d like to focus on some of what he says in relation to the issues I’ve raised above:

Now Scott – that’s a pointed comment :)

The Sphere It! bookmarklet is a very robust piece of technology, developed over several years by my two cofounders, Martin Remy and Steve Nieker.

Sphere It takes advantage of a novel text analysis technology that analyzes entire document texts (a news article or blog post in Sphere terms, but any text will do; the technology works just as well on product descriptions, Word documents, etc.). Each text gets passed through a proprietary pipeline that extracts key concepts and themes. These key concepts and themes are encoded in a data structure we call a Document Genome.

As the name is meant to suggest, a Document Genome is unique to the text from which it was derived. It’s worth emphasizing at this point that DGs are not simply keyword extractions. There’s a complex analysis pipeline employing a number of both traditional and novel text-analysis routines to identify the concepts and themes. The tokens of the resulting Document Genome, in contrast to keywords, are machine readable abstractions that don’t mean much to the human eye.

Like the biological data sets they’re named after, Document Genomes can be compared for similarity. (I’m 98% chimp; 63% iguana!) We generate Document Genomes for every blog post we crawl. When you’re looking at a page on the Web and click the Sphere It bookmarklet, Sphere grabs the text of the page you’re viewing, generates a Document Genome for it on the fly, and compares it to the DGs of the blog posts we’ve crawled. The closest matching blog posts are presented.

In practice, the Sphere It approach to contextual matching is fundamentally different from other approaches out there. If you compare it to other text-based approaches, like Google’s Similar Pages or Yahoo’s Search Related Info, we’re consistently much more precise in our matches.

Compared to other blog-space approaches, like Technorati This, Sphere It! gives a wider breadth of results on the topic. Technorati This! limits results to only those blog posts that link directly to the page you’re viewing. How do you find the posts that were published prior to the page you’re reading, but on the same subject? How do you find posts discussing similar topics from other sources (you’re reading WSJ, they’re reading NYT)? How do you find posts discussing the topic but not linking anywhere? Sphere It makes those connections. (I’ve covered this in more detail at http://sphere.wordpress.com/2006/05/12/week-one-in-the-rearview-mirror/)

Comment by Tony Conrad — May 25, 2006 @ 10:44 am
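Sphere’s pipeline is proprietary, and Tony stresses that Document Genomes are more than keyword extractions, so what follows is only a rough, hypothetical sketch of the flow he describes: turn each text into a weighted feature vector, compare it against the vectors of crawled posts, and return the closest matches. It swaps in plain TF-IDF term weights and cosine similarity as a stand-in for the Document Genome. The names `fingerprint`, `sphere_it`, and `link_only`, the `min_score` threshold, the URL, and the toy posts are all made up for illustration. `link_only` mimics the Technorati This! behavior he contrasts against, returning only posts that link to the page.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a "Document Genome": a TF-IDF-weighted term vector.
# Sphere's actual pipeline is proprietary and, per its CEO, goes beyond keywords.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "with"}


def tokenize(text):
    """Lowercase alphanumeric tokens with common stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def idf_table(corpus):
    """Smoothed inverse document frequency for each term in the crawled corpus."""
    n = len(corpus)
    df = Counter(term for text in corpus for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def fingerprint(text, idf):
    """Map a text to {term: tf * idf}; terms unseen in the corpus get the top idf."""
    unseen = max(idf.values(), default=1.0)
    tf = Counter(tokenize(text))
    return {term: count * idf.get(term, unseen) for term, count in tf.items()}


def similarity(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sphere_it(page_text, posts, k=3, min_score=0.2):
    """Content-based lookup: rank crawled posts by similarity to the page text."""
    idf = idf_table([p["text"] for p in posts])
    query = fingerprint(page_text, idf)
    scored = sorted(
        ((similarity(query, fingerprint(p["text"], idf)), p["id"]) for p in posts),
        reverse=True,
    )
    return [post_id for score, post_id in scored[:k] if score >= min_score]


def link_only(page_url, posts):
    """Link-based lookup: only posts that link directly to the page."""
    return [p["id"] for p in posts if page_url in p["links"]]


# Toy data; the URL and post ids are hypothetical.
PAGE_URL = "http://example.com/aol-relegence"
posts = [
    {"id": "other-outlet", "links": [],
     "text": "AOL buys Relegence, a real-time news search and filtering startup"},
    {"id": "linker", "links": [PAGE_URL], "text": "Quick note: see this story"},
    {"id": "off-topic", "links": [],
     "text": "Modern hair replacement technology explained"},
]
page = "AOL acquires Relegence Corporation, maker of real-time news filtering technology"

print(sphere_it(page, posts))      # prints ['other-outlet']
print(link_only(PAGE_URL, posts))  # prints ['linker']
```

The link-only lookup finds just the post that links to the page, while the content-based lookup finds the on-topic post from another outlet that never links to it. The hair-replacement post shares only the generic word “technology” with the page; without the `min_score` threshold it would still surface as a weak match, which is the kind of result this post complains about.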

It certainly appears that Sphere is grounded in sound technological advances (“Document Genome” at least sounds authoritative), but the proof is in the pudding. Sure, Technorati and Yahoo! may be worse, but I’d say they’re probably all bad and not worth mentioning, since none of them shows results that I find related to the content where my investigation began. What seems to be missed in all of this is that everyone is still fighting the relatedness question without considering that it all starts with context. Perhaps this is why services like Clusty are actually pretty good (though Clusty doesn’t get the recognition it deserves, probably because it remains centered on keyword matching): they provide an interface that helps the user establish the context of the query. For example, when I simply cut & paste the title of the AOL/Relegence story into Clusty’s search box, I get a set of results, but in the margin I also get the following categories of relatedness:

Now that’s at least providing some direction toward more related content, rather than arbitrarily deciding that a story about modern hair-replacement technology (titled “Landfill Technology”, toward the bottom of the first page of Sphere results above) is relevant. Oh yeah, did I mention that link goes to a splog…doh! If you check out the above categories, you should see very related results, with little ambiguity about their relation to my query. Let’s get a Clusty bookmarklet going; now that could truly be useful.
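Clusty’s clustering engine isn’t described here, so the following is a deliberately simple, hypothetical stand-in for the idea of offering categories alongside results: it groups result snippets under the most frequent terms they share, excluding the query’s own terms. `label_clusters`, its thresholds, and the sample snippets are assumptions for illustration, not Clusty’s algorithm.

```python
import re
from collections import Counter

# Hypothetical stand-in for search-result clustering; not Clusty's algorithm.
STOPWORDS = {"the", "and", "for", "with", "from", "into"}


def terms(snippet):
    """Distinct lowercase terms longer than two characters, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", snippet.lower())
            if len(w) > 2 and w not in STOPWORDS}


def label_clusters(results, query, max_labels=3, min_size=2):
    """Group result indices under the most frequent shared non-query terms."""
    query_terms = terms(query)
    doc_terms = [terms(r) - query_terms for r in results]
    counts = Counter(t for ts in doc_terms for t in ts)
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    clusters = {}
    for label, size in ranked:
        if size < min_size or len(clusters) >= max_labels:
            break
        clusters[label] = [i for i, ts in enumerate(doc_terms) if label in ts]
    return clusters


results = [
    "AOL buys Relegence to filter financial news",
    "Relegence deal brings financial news filtering to AOL",
    "AOL shares rise on acquisition news",
    "Relegence founders explain the acquisition",
]
clusters = label_clusters(results, "AOL Relegence")
for label, members in clusters.items():
    print(label, members)
```

A result can land in more than one category (result 2 appears under both “news” and “acquisition”), which is closer to the margin-of-categories experience than a single flat ranking. Real clustering engines use richer phrase extraction than single shared words.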

I will say this about Sphere: like Powerset, it does appear to have good investors and a smart team, so let’s hope this means it has a product development path that will demonstrate the anticipated glory that seems to be missing from its deliverables to date.



2 Responses to “About Sphere, what am I missing?”

  1. Martin said

    Great post. Your point about context is spot-on, though I personally don’t think Clusty is a good showcase for it (it falls down on recall, which is an important ingredient in context). I looked into your example a bit and found that we weren’t extracting a good Document Genome from the article you used because of the pathological HTML markup in that story (line breaks in divs and no quotes around HTML attributes; yuck). I fixed our handling of the markup and now get results that are very much on-topic for the AOL/Relegence deal.

    So,

    1) Thanks for giving SphereIt a try, I hope you’ll keep it in your browser.

    2) Yes, we should’ve been handling the markup better in the 1st place and it would be nice if our results were 100% spam-free (they’re mostly spam-free), but hey, no one’s perfect. We’re working on that, we got a little closer just this morning while debugging the problem you found :) If all of us working on a better search/discovery experience stayed in the dark until we’re perfect, you’d have nothing to review in the mean time :-)

    Regards,

  2. p-air said

    Well done on fixing the offending markup, Martin; the results are indeed much better. That makes Sphere as customer-service-focused a start-up as I’ve ever seen.

    As for your comment:

    “If all of us working on a better search/discovery experience stayed in the dark until we’re perfect, you’d have nothing to review in the mean time :-)”

    You’re absolutely right: getting out there as soon as possible, so people can play with it and hopefully find the gaps that need to be filled, is critical. I commend you and your team for having the balls to play in the open and iterate on your service. I guess the only thing that frequently turns me negative is the amount of hype or buzz around things that are not yet ready for prime time, but having said this, it certainly appears that Sphere is trying to keep up w/any issues that are raised in a timely manner, so keep up the good work.

    And yes, I will keep the bookmarklet in my browser and continue experimenting w/it.
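Martin doesn’t describe how Sphere fixed its markup handling, so the following is only a generic illustration of tolerating the kind of HTML he mentions before any text analysis runs. It uses Python’s standard-library `html.parser.HTMLParser`, which accepts unquoted attribute values such as `class=story`. `TextExtractor` and `extract_text` are hypothetical names, and reading “line breaks in divs” as raw newlines inside a block’s text is an assumption.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from loosely written HTML (stdlib parser, no validation)."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "td"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # attrs is already parsed here, quoted or not (e.g. <div class=story>).
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Raw line breaks inside a block are folded into single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def extract_text(html):
    """Return one line of whitespace-normalized text per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sloppy = (
    "<div class=story id=main>AOL acquires\nRelegence Corporation</div>"
    "<script>var x = 1;</script>"
    "<p align=left>Real-time news &amp; filtering</p>"
)
print(extract_text(sloppy))
```

The headline split across a raw line break comes back as one sentence, the script body is dropped, and the unquoted attributes cause no trouble, so any downstream fingerprinting sees clean text.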
