“Got kitesurfing on the mind, mixed with some search & classification tech, and a dab of political ranting”

Archive for January, 2007

We have a new Winner in the Insanity Department

Posted by direwolff on January 30, 2007

Well, first I put up the video on Big Summit Speed Riding, which I thought was pretty sick. Then I ran into Jet-man, which, frankly, just took my breath away. I showed an interlude of snowkiting, which is the only one of these adrenalin sports that I’m willing to undertake at this stage of my life. Well, hold the presses: all of these (even Jet-man, I fear) can only be considered sissy sports when matched against the top of the food chain, the sick, insane, and all-around balls-to-the-wall sport of “human gliding” (best name I can come up with, as you’ll see).

Click here if video no longer accessible from YouTube.


I don’t even know how someone gets their head around trying to figure out that this is possible, much less trying it. Note that I don’t see much room for failure. Do you?


Posted in Kitesurfing & Extreme Sports | Leave a Comment »

on Convera’s Tail

Posted by direwolff on January 30, 2007

As if it weren’t enough that Google was beginning to encroach on its new turf, it now appears that Convera’s business is about to get a little bit more competitive, given Reed Business Information’s entry into the business search engine space with its new offering. There’s a good piece about what Reed is doing in Folio Magazine. I picked this up from John Battelle’s Searchblog article titled “Reed to Launch B2B Vertical Search Portal”.

I spent a little time on the site and found that while it still relied a little too much on keyword searches, they had done a good job incorporating categorization of the content (in a search on “computer market statistics”, the fourth result referred to “Embedded System Market Statistics”) in order to derive more relevant results. It’s still missing the meaning component, but this is better than the status quo. What also struck me, however, is that one of the results took me to another Reed Business Information site called EDN (Electronic Design News), where they had syndicated the search box but focused it on just electronics reference searching. In effect, given the tools in use and Reed’s call to other publishers to get involved with them and make their content accessible, what would stop them from essentially jumping into the vertical search engine service business? Certainly nothing I can think of. They’ve already set up Google AdSense to monetize the pages, so it seems like they may be just a bit more turn-key than Convera, where the publisher has to determine their own monetization scheme.

After several queries, I’d say that the results generated here were far superior to those I saw when evaluating the Northern Light public business search engine.

Posted in search & categorization | 1 Comment »

Too powerful of a story not to blog a reference to it

Posted by direwolff on January 27, 2007

While we may like to believe that the issues of race have evolved in a positive direction over the past 50 years, a high school student’s 8-minute documentary seems to prove otherwise. It’s very powerful and moving. The following is a link to a short news piece about the documentary:

Posted in Feelings | Leave a Comment »

Oh-oh NSA Data Miners in Demand

Posted by direwolff on January 25, 2007

Just caught this link courtesy of Bruce Schneier’s blog post titled “NSA Hiring Data Miners”. I’ve railed about this before, and I’ll rail about it again: predictive or user analysis systems are nice for commercial uses, where they can afford to get things wrong without adversely affecting the rest of any person’s life, but in government systems this is pure lunacy. If a contextual or behavioral or collaborative filtering system recommends the wrong thing or groups me inappropriately, I may have to deal with some mistargeted ads or other slight inconveniences online. But if a government system marks me as a terrorist or a danger to society as a result of being off by one action I may have taken at some point in my life, totally unrelated to who I am or ever was, then that’s a HUGE problem that they’ve never been prepared to fix.

Companies like Autonomy use, in part, Bayesian statistical models to determine the categorization of various documents. Basically, they get the users of the system to feed it anywhere from 10 to 20 sample documents that should belong to a particular category to train the system, and from there, *magic*, the system is able to identify new documents that should be categorized accordingly. Nice in principle, but in practice such systems struggle to reach a 70% accuracy rate. The problem gets worse because no human can see how or why a new document failed to get categorized appropriately. The only way to deal with it is to submit the new document as training material, so that the system learns to recognize documents like it as being part of that category. I’m only using this as an example of some of the statistical methods being employed for various tasks, including some of the ones that our government intelligence departments are using for their citizen/terrorist data mining activities.
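To make the idea a bit more concrete, here’s a minimal sketch of a Bayesian text categorizer in Python. To be clear, this is a textbook naive Bayes with Laplace smoothing, not Autonomy’s actual method (which I have no visibility into); the class name, the sample documents, and the category labels are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesCategorizer:
    """Hypothetical helper: multinomial naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocabulary = set()

    def train(self, category, text):
        """Feed one sample document that belongs to `category`."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score per category for `text`."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            log_prob = math.log(self.doc_counts[category] / total_docs)
            for word in tokenize(text):
                # Laplace smoothing so unseen words don't zero out the score.
                log_prob += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = log_prob
        return result

    def categorize(self, text):
        """Pick the highest-scoring category."""
        category_scores = self.scores(text)
        return max(category_scores, key=category_scores.get)


# Tiny, made-up training set (a real system would want 10 to 20 samples per category).
model = NaiveBayesCategorizer()
model.train("sports", "kite wind board snow ride")
model.train("sports", "snowboard ride mountain wind")
model.train("search", "query index search engine results")
model.train("search", "keyword search ranking index")

print(model.categorize("wind and snow ride"))
print(model.categorize("search engine index"))
```

Notice that the only output is a category and a score. Nothing in there tells a human why a document landed where it did, which is exactly the opacity I’m complaining about, and the only “fix” for a miss is another call to `train`.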

While I’m not saying that there aren’t some very good advancements in these techniques, the NSA wiretapping fiasco, the recently departed “No-Fly List”, and a host of other invasive programs from the NSA, the TSA, the CIA, and the Department of Defense (on its own citizens) put us in the awkward position of not being able to trust our own government to deploy something here with the best interest of its citizens in mind.

Posted in search & categorization, Security/Privacy | 1 Comment »

Great Big Corporate Search Engine in the Sky

Posted by direwolff on January 23, 2007

OK, a long, winding post about some half-baked thoughts I’ve been having. There’s a ring of impossibility to it. I guess that’s what makes it fun to think about.

There’s this really interesting discussion taking place on Slashdot on “The Need For A Tagging Standard”. While I’d hardly call myself qualified to engage in a technical standards discussion, because of my involvement with several companies doing innovative work in the RSS aggregation and search spaces it’s something that I’ve had to do a fair bit of thinking about. It’s my opinion right now that standardized tagging, much like the initiatives around topic maps in the Semantic Web, would only be an evolutionary change to the idea of normalized databases accessed between partners in a supply chain, or what companies like QRS (which developed products for global data synchronization and is now part of JDA Software) were providing to their customers. This need for data synchronization is also visible in the efforts to standardize microformats. It’s all about how to make it easy for applications to determine what a piece of content is. Is it a restaurant recommendation, or a purchase order, or a product listing, or event information, or a legal notice? What is this document?… so that the application can (a) know whether it’s appropriate for its uses, and (b) find the necessary data within the document to accomplish its task.

We see this manifest itself again with market intelligence applications that are trying to differentiate between commercial RSS feeds from mainstream media providers versus blogs from independent people versus professional blogs (e.g. Seeking Alpha in finance). The problem of determining what the content represents is multi-dimensional. Yikes.

It gets even nuttier when you figure that different companies may want the same information to apply in different ways. For example, one company might want a customer name as well as a contact name for their accounting and CRM systems, whereas another company is looking for a company name and a customer name, both pairs effectively meaning the same thing…and wouldn’t you know it, both companies want to communicate this information between each other for other applications. Double yikes!!!

First, a brief anecdote here. During the time of my first venture, a software development consultancy, from 1987 through 2000, our expertise was in database applications. We supported development in Oracle, Sybase, dBaseIII, R:base, Foxbase, Clipper, and Informix. I remember a client asking us what we knew about warehouse and shipping operations to determine if we were qualified to develop the custom application they were seeking. We candidly responded that we actually knew nothing about warehouse and shipping operations, but that from a systems perspective, everything gets normalized to data flows and our analysis phase would bear those out. As it related to data flows, we knew a lot (hence our company name, DataWorks ;), and we felt comfortable that we could develop any system that the client needed. Sure enough, the inventory & shipping system we developed was for L’Oreal’s main distribution warehouse, at the time located in New Jersey. The system could route packages to UPS versus U.S. Mail and interfaced to scanners similar to how this is now done for luggage routing. Indeed, it was a data flow issue, nothing more. The applications drove what happened to the data.

My point here is that applications were built on databases. The databases facilitated the logically structured storage of the data, and applications ran above these, deciding what to store, what to look up, what to change, what to remove, and when to perform these activities. I now believe that search engines have emerged as the new application platform (as I have previously mentioned here): the new storage facility, with much less structure because complexity requires a looser organization, but also with a need for more granular identification of the stored items. This identification, unlike the days of databases, where normalization was applied equally across the stored data, is now much more flexible and dynamic, enabling us to have documents that are totally unrelated in format and otherwise all be stored in the same vessel, the search engine. By indexing documents, the decision has been made that words (minus stop words, e.g. “a”, “the”, “he”, etc…) are what needs to be identified in those documents (though we also see that this is somewhat short-sighted since keyword search sucks).
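For anyone who hasn’t peeked under the hood, here’s a rough sketch in Python of what “indexing words minus stop words” amounts to. It’s a toy inverted index, not how any particular search engine does it; the stop word list, the document IDs, and the function names are my own assumptions for illustration.

```python
from collections import defaultdict

# Illustrative stop word list; real engines use longer, tuned lists.
STOP_WORDS = {"a", "an", "the", "he", "and", "of", "to", "is"}


def build_index(documents):
    """Map each non-stop word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            word = word.strip(".,!?\"'")
            if word and word not in STOP_WORDS:
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every non-stop word in the query."""
    terms = [w for w in query.lower().split() if w not in STOP_WORDS]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Documents of totally unrelated types can sit in the same "vessel".
documents = {
    "po-17": "Purchase order for the warehouse scanners",
    "review-3": "A restaurant recommendation near the warehouse",
    "notice-9": "Legal notice to the shipping department",
}
index = build_index(documents)

print(search(index, "the warehouse"))
print(search(index, "legal notice"))
```

The index happily stores a purchase order next to a restaurant review, but all it knows about either one is which words they contain. It has no idea what kind of document it’s looking at, which is the gap the rest of this post is about.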

Other systems like word processors decided that formatting elements needed to be identified as well. Today MS-Word goes even further and is able to identify dates, addresses, letter formats, etc. When we start seeing how documents and spreadsheets can be moved between applications, we start to see location of elements being identified as well. HTML, much like Word, facilitated presentation identification. XML starts to go further by enabling users and systems to further identify entities within documents.

So perhaps it’s a naive perspective I’m taking in saying that categorization and standardization of tags or topic maps or what have you is all about trying to avoid the application having to figure out what content it’s looking at by grouping the content into rigid structures. Now as I watch the efforts going on in personalization, I’m noticing something interesting. Specifically, what might be a query result for me may not be one for you. In other words, a movie is not a movie is not a movie unless all of us do the exact same things and use the exact same services at the exact same times, online. A movie that is recommended to me on Amazon may not be recommended to you. As well, it’s quite possible that I don’t even get a movie recommendation that you do, especially since Amazon sells a lot of products and doesn’t have to restrict itself to just movie recommendations. What is important is that they’re enabling a personal view of their content. The data is presented for my needs. My cookie or login to Amazon determines the behavior and the information that I will see.

Well, if it’s good enough for personalized content, why can’t that metaphor be extended to applications, where these would make customized requests and be presented with that information? Two companies might request the same data elements from a search engine but name these differently (e.g. “customer name”/“company name”). What’s important is having the tools for any application to be able to make the request for the specific data element it needs regardless of what it’s called in the respective application.
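Here’s a small Python sketch of what I mean, using the customer name/contact name versus company name/customer name example from earlier in this post. Everything in it is hypothetical: the marker names (`org_name`, `contact_person`), the sample record, and the `fetch` helper are just stand-ins to show each application keeping its own vocabulary.

```python
# One stored record, identified by markers owned by the (hypothetical) search engine.
RECORD = {"org_name": "Acme Corp", "contact_person": "Pat Example"}

# Each application maps its own field names onto those markers.
ACCOUNTING_ALIASES = {"customer name": "org_name", "contact name": "contact_person"}
CRM_ALIASES = {"company name": "org_name", "customer name": "contact_person"}


def fetch(record, aliases, requested_fields):
    """Answer a request using the application's own field names."""
    return {field: record[aliases[field]] for field in requested_fields}


print(fetch(RECORD, ACCOUNTING_ALIASES, ["customer name", "contact name"]))
print(fetch(RECORD, CRM_ALIASES, ["company name", "customer name"]))
```

Note that “customer name” resolves to the company for one application and to the person for the other. Neither side had to agree on a shared standard; each one only had to declare what its own names point at.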

This starts to talk to the idea of creating a big search engine for business consumption and use only. One where a set of markers helps provide the identification underpinnings for simple or complex requests to be addressed precisely. Because of the importance of the markers and their potential effect on applications, control of these (in terms of creation or removal) would remain with the search engine provider. But the ability to identify more sophisticated ideas, specific data elements, or document categories would remain with the application developers. These would effectively be queries against the content corpus. Already we know that companies often like to create their own taxonomies for managing information in an effective way for the business they’re in. Why not allow them to apply this to any content they want to interact with, without forcing them to build up another search engine for this content?
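As a rough illustration of “taxonomies as queries against the content corpus”, here’s a Python sketch where two companies apply their own category definitions to the same shared content. The corpus, the taxonomies, and the all-terms-must-match rule are assumptions I’m making up for the example; a real system would need far richer matching than word lookups.

```python
def matches(text, required_terms):
    """True if every required term appears as a word in the text."""
    words = set(text.lower().split())
    return all(term in words for term in required_terms)


def apply_taxonomy(corpus, taxonomy):
    """Return {category: [doc_ids]} for one company's view of the corpus."""
    return {
        category: sorted(
            doc_id for doc_id, text in corpus.items() if matches(text, terms)
        )
        for category, terms in taxonomy.items()
    }


# One shared corpus held by the search engine provider (made-up content).
CORPUS = {
    "doc-1": "quarterly forecast for embedded systems spending",
    "doc-2": "government spending on search technology",
    "doc-3": "new kite designs for snow and water",
}

# Two companies, two taxonomies, each defined as simple term queries.
FINANCE_TAXONOMY = {"forecasts": ["forecast"], "public sector": ["government", "spending"]}
RETAIL_TAXONOMY = {"products": ["kite"], "budgets": ["spending"]}

print(apply_taxonomy(CORPUS, FINANCE_TAXONOMY))
print(apply_taxonomy(CORPUS, RETAIL_TAXONOMY))
```

The same three documents get carved up differently depending on who’s asking, and nobody had to build a second search engine to make that happen.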

What I’m really starting to find myself talking about is similar to entity extraction, but going further and providing application tools for defining and locating entities that are not only as simple as names, countries, companies and so on, but as complex as trends & forecasts, controversies, and government spending. The goal is to be able to define access to content discussing the “bank of a river” and treat this as different from a discussion of “plans for the new Wells Fargo bank being built by the river”. Those of you who pay attention to the search engine world may recognize this as one of the claims being made by the eagerly anticipated (some time in Q4 of 2007) Powerset. The subtle difference is that Powerset seems focused on addressing this as a natural language query issue for humans seeking information, whereas what I’m describing is a solution for application developers, who are not likely going to write code to turn their requests into grammatically formulated English queries. More importantly, it’s about allowing applications to interact with raw content and apply their own (personalized) perspectives to derive information. This is more than simply the search engine model of spidering items at regular intervals; it also has to include an RSS search engine that keeps up with real-time information releases. All content is at once relevant and irrelevant to this system, as it’s the applications that access it that make the final judgment.

There would be fees associated with access to this search engine, which would most likely be through APIs. Even desktop or Web apps would utilize the APIs. This search engine could have a distributed architecture to address scalability, but the logical representation would be that of a single search engine containing as much data as Google or Yahoo! or MSN, except that its accesses would be programmatic versus human user interactions. Because usage would be metered, that would impose some inherent controls to prevent the most onerous abuse. Spam, like all other content, would still need to be identified, though not necessarily removed, since it may be useful to some applications.
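Metering is simple enough to sketch. Here’s a minimal Python example of a metered wrapper around a search backend, with a per-key quota acting as the “inherent control” against abuse. The class, the pricing in cents, and the fake backend are all hypothetical; no such service or API exists as far as I know.

```python
class QuotaExceeded(Exception):
    """Raised when an API key has used up its allowance."""


class MeteredSearchAPI:
    """Hypothetical wrapper that counts queries per API key."""

    def __init__(self, backend, price_cents_per_query, quota):
        self.backend = backend
        self.price_cents_per_query = price_cents_per_query
        self.quota = quota
        self.usage = {}  # api_key -> number of queries served

    def query(self, api_key, terms):
        """Serve a query if the key is under quota, and record the usage."""
        used = self.usage.get(api_key, 0)
        if used >= self.quota:
            raise QuotaExceeded(api_key)
        self.usage[api_key] = used + 1
        return self.backend(terms)

    def invoice_cents(self, api_key):
        """Amount owed so far, in whole cents to avoid float rounding."""
        return self.usage.get(api_key, 0) * self.price_cents_per_query


def fake_backend(terms):
    """Stand-in for the real (distributed) search engine."""
    return [f"result for {terms}"]


api = MeteredSearchAPI(fake_backend, price_cents_per_query=2, quota=3)
```

An application holding the key `"app-1"` could call `api.query("app-1", "snowkiting")` three times, owe six cents, and get a `QuotaExceeded` on the fourth try. The quota is what keeps a runaway or abusive client from hammering the system.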

The access-based fee structure, a la Amazon’s EC2 and S3, would work because this is the sort of premium any company would have to pay to conduct this sort of endeavor for their own proprietary uses. Effectively, it would be cheaper for companies to interact with this system for their content needs than to develop their own large-scale search engine to meet their continuously evolving data needs. When I look around today at the number of search engine applications that effectively have the same data and just sort the results in novel ways, or apply proprietary applications to these, I realize that the real need is to simply gain access to information based on what the application’s perspective requires. Some may need information sorted by most viewed, others by date, and others might need to locate much more precise information. All of these differences, in my opinion, can be addressed at the application level.
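To show how little it takes to handle those differences at the application level, here’s a short Python sketch where three “applications” each apply their own perspective to the same raw result set. The result records and function names are invented for the example.

```python
from datetime import date

# The same raw results, as the shared search engine might return them (made-up data).
RESULTS = [
    {"title": "Embedded market stats", "views": 120, "published": date(2007, 1, 12)},
    {"title": "Vertical search launch", "views": 400, "published": date(2007, 1, 30)},
    {"title": "Tagging standards debate", "views": 950, "published": date(2007, 1, 23)},
]


def most_viewed(results):
    """One application's perspective: popularity first."""
    return sorted(results, key=lambda r: r["views"], reverse=True)


def newest_first(results):
    """Another application's perspective: recency first."""
    return sorted(results, key=lambda r: r["published"], reverse=True)


def containing(results, term):
    """A third application only wants titles mentioning a specific term."""
    return [r for r in results if term.lower() in r["title"].lower()]


print([r["title"] for r in most_viewed(RESULTS)])
print([r["title"] for r in newest_first(RESULTS)])
print([r["title"] for r in containing(RESULTS, "search")])
```

None of these needed its own crawler or index. Each one is a few lines of code sitting on top of the same data.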

Now who’s gonna build this thing? More importantly, did I just describe a search engine utility?

Posted in search & categorization | Leave a Comment »

“Learn something new every day”

Posted by direwolff on January 23, 2007

So here I thought I had a good handle on the history of snowboarding, especially given that I’ve been enjoying the sport since 1990 and had tried one of the early Burton boards that a friend of mine brought to Central Park in New York after a big snow storm in 1985… “well not so fast there rock star”. Toots, a good friend from back east sent me a link to a 2 minute history of the sport which really lays it all out. Great old school footage. Check it out…



Posted in Kitesurfing & Extreme Sports | Leave a Comment »

Two weeks ’til my snowkiting weekend

Posted by direwolff on January 15, 2007

Boy, have I been jonesing for some kitesurfing. Tried to go out a couple of weeks ago, but with the temperatures dipping into the 40-degree range, and the cold water and the windchill factor playing a role, the thought of getting into a wetsuit and into the drink was more than one of my good riding buddies and I wanted to endure.

Well, now I’m on pins and needles in anticipation of what should be a great trip to Skyline Ridge, Utah for some awesome snowkiting. Jeff Kafka, a well-known local Bay Area kitesurfer & surfer, turns his attention to snowkiting in the winter season, and his school, Wind Over Water, focuses on providing easy-to-deal-with packages for snowkiting lessons and good times. He’s been one of the early kitesurfing and now snowkiting pioneers, and has put together great on-location accommodations, as well as other amenities, to make it easy for those wanting to get started in this new sport to enjoy it.

Lil’ Pinot and I are barely able to stay contained with the clock ticking towards our early departure in a couple of weeks for the mountains. Lil’ Pinot in particular is looking forward to trying her hand at snowkiting on skis given that she’s an exceedingly proficient skier already. For me, it will be on a snowboard and while I know to expect a different ride, the combination of my snowboarding and kitesurfing background prepares me well for this very cool new twist on a couple of familiar sports. Suffice it to say, we will definitely get some good photos to bring back and blog about.

Now let’s see if I can keep focusing on work so as to take my mind off this killer sport. If you want some idea of what it looks like, here’s a video that should satisfy your curiosity.

Yes, I know, it’s totally cool looking isn’t it? Now let’s get ready to SEND IT!!!

Posted in Kitesurfing & Extreme Sports | 2 Comments »

Will you iPhone?…iWon’t

Posted by direwolff on January 10, 2007

Well, it looks like Cisco has filed suit against Apple in no time flat upon their announcement of their like-named iPhone. Cisco apparently owns the trademark on iPhone, so it will be interesting to see how newly dubbed Apple, Inc. will fare in defending this intellectual property. Especially in light of Steve Jobs saying that he would vehemently protect the new iPhone’s IP. I wonder if he thought that the legal games would begin so soon?

After seeing Jobs’ keynote, I continue in my belief that he is a master showman second to none. There wasn’t a moment through the presentation where he would utter the words “isn’t that cool?” that I didn’t mouth the words “totally Steve, totally”. But amidst this daydream of a presentation I started thinking of the interactions I have with my “crackberry” 8700, the things that I’ve not liked about the Palm Treo, and then went further to think about how much I truly trust Apple. It’s with all of this in mind that I guess I was jostled back into reality and came to the conclusion that the iPhone, sadly, isn’t for me.

On the nits side of things, I really don’t like screen keyboards. Something about the lack of tactile feedback really makes them uncomfortable for me to use. The Palm Treo offers the option to use the screen, but on any of the ones I’ve tried (including my wife’s) it’s always easier to just use the keys below. As well, the iPhone’s keyboard didn’t seem thumb-able. Steve’s one-finger typing just doesn’t cut it for me, even where I can see advantages for dialing while driving. Which brings me to the other point. Apple did this with the iPod too. If I can only remember a word or two from the song or artist I like, there’s no easy way for me to find that artist on an iPod without going through the full list until I find them. Given that there’s no keyboard, I can’t search. Well, I have over 4000 contacts in my crackberry and over 1200 appointments that I sometimes need to search through. It didn’t seem like the iPhone was well suited for easy searches through this content, nor the music content.

Now, the device itself is a thing of beauty, but given how poorly Apple executed on its iPod Mini, with the easily scratched or broken screens, it’s tough to say that I’d trust them with this delicate and expensive device. How is it easily carried in one’s pocket with any sort of guarantee that it won’t snap in two, given how thin the device is? As for the Bluetooth phone ear-piece, from Steve’s presentation he seemed to imply that the Apple ear-piece would connect to the iPhone more smoothly than others…hmm, I wonder why, if others are also using Bluetooth. I like my Motorola HS-850 headset and would want that operating smoothly too. Apple has a bad reputation for closed systems, and this would be a sad mistake in this case. Already, the idea that this device only operates on Cingular was, I think, the wrong move, despite the fact that I enjoy being a Cingular customer.

The whole album and video representations and “coverflow” were sweet to see in motion, but Apple still remains a staunch user of DRM (digital rights management technology) and that simply no longer sits well with me. I’ve stopped buying music from iTunes no matter how convenient I found it in the past because I don’t like the idea of being locked into their world…no matter how cool the device operates.

There is one kick-ass feature that I feel compelled to mention, despite the fact that it alone won’t get me to buy the device: visual voicemail was a genius move, if only for its simplicity. It’s great to see that Apple did keep in mind some of the simple things that needed to be fixed with current systems and addressed them so nicely.

So all in all, despite my base desire to join the cult of Steve, I find myself unable to follow, but I do believe that Apple has done a lot with the iPhone to forever (once again) change the landscape of what it means to provide a communications device. And indeed it is amazing how they’re able to keep innovating in existing industries like this. However, I was surprised that during the initial part of Steve’s presentation, where he mentioned the products Apple had innovated, he started with the Macintosh, then the iPod, and now the iPhone, but somewhere in the mix the Apple II seems to have been forgotten. That too was an innovation during its time and I’ll never forget it.

Posted in Technology | Leave a Comment »

“Stench in NYC traced to NJ, officials say” — Home News Tribune

Posted by direwolff on January 9, 2007

I just about fell out of my chair laughing this morning when I read this headline off my Google News page. While this may be mildly humorous to most (if at all), having grown up in Manhattan, we always tended to have a low opinion of New Jersey. Mind you, with little justification, as I’ve grown to appreciate many of New Jersey’s splendors. But because of the chemical plants that sit right on the other side of the tunnels and bridges (yes, New Jersey is considered “Bridge & Tunnel” ;) from New York, and those near the Philadelphia/Delaware border, again on the New Jersey side, there’s always been a stench that I’ve associated with New Jersey. So you can imagine that seeing such a headline begs the question, “what took them so long to figure this one out?…heck, I’ve known about that since I was 16”, which was the first time I’d ever set foot in New Jersey, though I had already travelled to Europe, the Caribbean, Connecticut & Maine.

While attending college in Pittsburgh and driving home to New York on holidays, I was guaranteed a whiff of these unpleasant odors right past Newark before crossing over to the city. I recall several trips with friends where we all looked at each other to see if anyone was going to claim the fart. Well, looks like my senses have been vindicated and now it’s made the headlines.

Posted in Just Fun | 1 Comment »

If you want to stay sane, don’t read…

Posted by direwolff on January 5, 2007

…the last three posts from Bruce Schneier’s “Schneier on Security” blog. Here are the 3 posts that you shouldn’t read:

Remember, reading these posts will lead to severe depression and the feeling of hopelessness that things will ever get better in this country. Reading these could also lead to a belief that while our government structure has changed after the last election, so much damage has been done and ignorance is at such a high peak and so highly regarded, that it may be difficult to reverse this trend to one where education is seen to be important and sanity is restored. Follow those links at your own peril.

Posted in Feelings, Public Policy, Security/Privacy | Leave a Comment »