“Got kitesurfing on the mind, mixed with some search & classification tech, and a dab of political ranting”

Archive for the ‘Intellectual Property’ Category

Google Book Search Legal Challenge and Its Externalities

Posted by direwolff on March 19, 2007

Back in late 2005, several book publishing stakeholders decided to sue Google over the company’s Print Library Project. At first I thought of this as just another old industry resisting change, but it soon occurred to me that much more was at stake and worth a closer look. The implications also matter when we turn to other, seemingly unrelated issues, such as the question raised by entities like the AttentionTrust about who owns users’ clickstream exhaust, and, more recently, the complaints of several news publishers in Belgium.

So let me start with the fact that search engines have been spidering the Web pretty much since the mid-90s, as far as I can remember: companies like Excite, Lycos, AltaVista, Inktomi, Google, etc. John Battelle’s book, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, provides a good account of all of this. The role of spiders is to go out on the Net, bring back the content of Web pages (in some cases also picking up content from Office documents and PDF files), and index it in the search engine. In some cases, all the search engine saves is the indexed information; in others, it actually keeps an archived copy of the page for future reference (I may be wrong on the specifics here, but that’s the general idea). However, from a search perspective, all of the search engines stay within the ‘fair use’ doctrine: they display only a sentence or two around the words that matched the query and provide a link to the actual content for access to the complete text. Adhering to ‘fair use’ in effect keeps search engines on the right side of copyright law. From the Web site publishers’ perspective, because people use search engines to find information and so much of their visitor traffic comes from the search engines as a result, it’s a mutually beneficial arrangement. My key point, however, is that although an entire Web site is spidered in order to index its full content, the whole site is never rendered and no substitute for the actual site is provided, so search engines do not appear to be violating copyright.
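To make the index-versus-display distinction concrete, here is a toy sketch in Python. It assumes nothing about how any real engine works: `TinyIndex`, its methods, and the 40-character snippet window are all hypothetical. The point is only that the full text of a page goes into the index, while a search returns just a URL and a short excerpt around the first matched term.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyIndex:
    """Hypothetical toy index: stores full text, but search shows only a snippet."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> URLs containing it
        self.pages = {}                   # URL -> full text (the "archived copy")

    def add_page(self, url, text):
        # The whole page is read and indexed...
        self.pages[url] = text
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query, window=40):
        # ...but a result is only the URL plus a short excerpt.
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: keep pages that contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        results = []
        for url in sorted(hits):
            text = self.pages[url]
            pos = text.lower().find(terms[0])
            start = max(0, pos - window)
            end = pos + len(terms[0]) + window
            results.append((url, "..." + text[start:end].strip() + "..."))
        return results


idx = TinyIndex()
idx.add_page("http://example.com/a", "filler " * 30 + "fair use copyright doctrine " + "filler " * 30)
for url, snippet in idx.search("copyright"):
    print(url, snippet)
```

The searcher sees a fragment of roughly a line, never the whole page, even though the whole page was read to build the index.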

On the Net, spidering is an automated process, but physical books require a manual or semi-automated approach. There are now robotic machines that handle the scanning and digitizing of bound books. These act in a manner somewhat analogous to spidering, except that a human being has to feed in the book(s) to be scanned. The purpose of doing this is to help people identify and find books containing the content that interests them. However, there are externalities to this.

For one thing, search engines can do more with the content than simply make it findable. They can analyze it and determine its semantic relationship or relevance to other content. They aggregate analytics about each page: How many people doing a search clicked on the result for that content? How many times did a particular piece of content come up in the search results? How many links point to that page from other pages (can you say PageRank?)? And so on. These externalities could be regarded as the exhaust off of the content. Everything else you can do with and learn about the content by aggregating and analyzing it can bring a tremendous amount of value, all derived from the use of copyrighted content, yet none of that value goes to the copyright owner. Today, that value resides within search engines like Google, Yahoo!, and MSN Live. It’s almost as if the search engines were parasites on the aggregated content: they live by the will of the content, which is itself only found online by the will of the search engines. Talk about a conundrum.
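The “how many links point to this page” signal is the raw input to link analysis. Below is a minimal sketch of the textbook PageRank power iteration as published, not Google’s production ranking; the damping factor of 0.85, the iteration count, and the three-page link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page graph: a and b both link to c; c links back to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
inbound = {p: sum(p in targets for targets in graph.values()) for p in graph}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, inbound[page], round(scores[page], 3))
```

With these links, `c` comes out on top because two pages point to it. The score is derived entirely from how other people’s content links together, which is the kind of exhaust described above.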

Well, one important capability provided online is that a site can publish a robots.txt file stating its spidering policy, including not allowing the site to be spidered at all. It is this simple capability for which an off-line, book-world equivalent must be found: effectively, a way for book publishers or copyright owners to “opt out”. While opting out may not be a smart business decision for them, the capability should exist, in the same way that it exists for Web pages.
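A well-behaved crawler checks that policy before fetching anything. The sketch below uses Python’s standard `urllib.robotparser` module to show how a robots.txt opt-out is interpreted; the crawler name `ExampleSearchBot`, the example.com URLs, and the paths are hypothetical. Nothing in the code forces a crawler to obey the answer, which is why compliance is voluntary today.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named crawler may index everything except /drafts/;
# every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /drafts/

User-agent: *
Disallow: /
"""


def build_policy(text):
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
for agent, url in [
    ("ExampleSearchBot", "http://example.com/books/1"),
    ("ExampleSearchBot", "http://example.com/drafts/2"),
    ("OtherBot", "http://example.com/books/1"),
]:
    # can_fetch() only reports what the policy asks; obeying it is up to the crawler.
    print(agent, url, policy.can_fetch(agent, url))
```

A book-world equivalent would need some comparable machine-readable declaration that scanners and search engines agree to consult.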

The exhaust I refer to from the aggregated content seems similar to what is generated by users’ clicking activity and aggregated by advertising networks. Hence, where the AttentionTrust is promoting the idea of users being able to opt out of being tracked (or cookied), this is functionally equivalent to users publishing their own robots.txt file saying they don’t want to be tracked. Note that this would give the ad networks an added incentive to offer users real value in exchange for allowing the tracking to continue. The quality of the offers made to tracked users should then be commensurately higher, and conversions for advertisers should increase, which in turn increases the fees earned by the ad networks. Seems like a win-win-win all around.

Not being an attorney myself, nor having any authority to assess how the book publishers’ lawsuit might go, I’d say that if the courts rule indiscriminately for the plaintiffs (in this case the Authors Guild) without considering how these issues are addressed online today, they’d in effect also be ruling that search engines could no longer spider content without the explicit consent of the copyright owners. I believe this would be a bad precedent, as requiring an opt-in process could be very impractical unless it were automated the way the robots.txt file is. Instead, what should be facilitated are tools that let copyright owners put all of their works online (even if not visible to browsers), leaving them the option of having these included in search engines, with a legally enforced robots.txt file. Where compliance with that file is voluntary today, it could be given stronger legal standing.

Just thinking aloud here, but it seems like these issues are gaining momentum and have to be addressed sooner rather than later.



Posted in Intellectual Property, search & categorization, Technology

Open Data 2007 Summaries, Pictures and Discussion

Posted by direwolff on March 15, 2007

The Open Data 2007 Conference really brought out some worthwhile issues that will need to keep being discussed and debated, as they are fundamental to the continued development of the Web’s underpinnings as well as to the business models being developed by many early-stage and existing companies in the space. These issues also need to be resolved in some fashion soon, before a swell of public opinion from the uninitiated pushes policy makers and politicians to impose more naive rules and regulations that suppress important developments and preserve the status quo of onerous laws serving only the interests of existing business stakeholders.

For those interested in the comments, discussions, and pictures that came out of the conference, go check out the Open Data 2007 Conference Wiki. This event was graciously hosted by Reuters in partnership with the AttentionTrust, two organizations grappling with these issues today. Gerry Campbell from Reuters and Seth Goldstein from the AttentionTrust put together a wonderful event, laying out great topics for discussion and setting an agenda that was both intellectually stimulating and well aligned with the issues we need to contend with immediately.


Posted in Intellectual Property, Online Community, Public Policy, search & categorization, Security/Privacy, Technology

Good stuff being discussed at Open Data 2007

Posted by direwolff on March 13, 2007

I’ve been attending the Open Data Conference in New York at the Reuters office. Good stuff. There’s a really good post about last night’s dinner discussion on Roger Ehrenberg’s blog, Information Arbitrage.

Posted in Intellectual Property, Public Policy, reviews, search & categorization, Security/Privacy

Apple & DRM, More Than Meets The Eye

Posted by direwolff on March 7, 2007

As with most things, taking what’s being said at face value can lead to an inappropriate reaction when the underlying issues aren’t fully understood. I’m plenty guilty of these leaps myself, so I’m not throwing stones here. I’m pointing it out because I just read an excellent write-up on the issues surrounding Apple’s digital rights management (DRM) strategy that I was not previously aware of, a strategy that has drawn some misplaced criticism given the landscape Apple is operating in. Bruce Schneier’s “Schneier on Security” blog links to the post, on the Roughly Drafted blog, titled “Apple’s iTunes DRM Dilemma”.

The post goes into understandable detail on how the iTunes DRM technology works. For those not so interested in the technology aspects (still worth reading though), but more curious about the policy issues around this, skip down to the section titled “Why Apple Cares About DRM” (which is quickly followed by “Why Apple Doesn’t Care About DRM”) and read down through the end of the post. It’s well worth the read and provides some great insights into the issues surrounding Apple’s need to maintain the platform, Jobs’ recent comments on doing away with DRM, the regulatory environment in the E.U., the competitive aspects, and the RIAA’s iron fist in all of this.

For anyone interested in the debate surrounding DRM and the role of the various constituents in this ecosystem, this blog post paints a very lucid picture.


Posted in Intellectual Property, Public Policy, Security/Privacy, Technology

Looks Like “Fair Use” are Two Very Lonely Words in Belgium

Posted by direwolff on February 13, 2007

I’ve never read a news article on Google News, mainly because that doesn’t seem possible: all it provides is an excerpt and a headline I can click to go to the full story. Through this process, I’ve discovered newspapers I didn’t know existed that offered interesting new perspectives I hadn’t been exposed to before. As a follow-up to a previous post, with the latest ruling out of the Belgian courts, it looks like one perspective I won’t be getting any time soon is the Belgian one.

To say that this is the most asinine lawsuit I’ve ever seen is probably going too far, since there was the one about the woman who sued McDonald’s over the hot coffee she spilled on herself several years ago. But this one is pretty close. I wonder if these Belgian publishers know how to read their traffic logs to see how much traffic comes their way from Google. Well, traffic be darned (and god knows how much it costs to get it these days); this is a previously monopolistic industry clinging to false principles and fighting a no-win battle. Winning the lawsuit ensures lower traffic to their properties; losing it makes them look dumb for ever having brought it. If I were Google, I’d remove these publications’ content not just from Google News but from the main search index as well. Yahoo!, MSN and Ask should follow suit and do the exact same thing.

What’s even crazier is that the French publishers, at least Agence France-Presse, have it in mind to pursue a similar tack. And they wonder why their Web businesses aren’t successful. Don’t forget ACAP (the Automated Content Access Protocol), either. Just as the music industry is reconsidering its DRM decisions, the news publishing industry is heading in the opposite direction…hmm…between Microsoft and the news publishers, there seems to be very little care for customers. Must be what happens when you get big, fat and lazy at your customers’ expense.

This may all be me playing armchair quarterback when there are deeper issues at stake, but if so, I’m totally missing them…wouldn’t be the first time though :)

*** 2/13/07 UPDATE (12:45pm PT): Just had a chance to see Danny Sullivan’s write-up about this, and it’s worth checking out if you’re interested in this story, as he has more information and updates on the situation directly from Google’s European legal counsel.


Posted in Intellectual Property

Microsoft, Out to Prove that the U.S. Patent System is Broken

Posted by direwolff on December 22, 2006

At least that’s my take, because having the audacity to waste their own staff’s time, as well as the patent office’s, creating and reviewing applications to be granted a patent on RSS has to be their way of exposing the insanity we today call our patent system. The blogosphere is all a-buzz about this, and rightly so. Here are the patent applications in question: 20060288011 and 20060288329. It would be interesting to see what prior art, if any, is cited in these applications, given that such a list would no doubt have to mention Netscape and UserLand (Dave Winer), not to mention many others, since RSS was worked out mostly in the open.

If Microsoft even gets close to getting this patent, I think it’s time to throw in the towel and admit that our system doesn’t just suck, but isn’t a patent system at all. If you ever get a chance to read the book Information Feudalism: Who Owns the Knowledge Economy, you’ll find that the assessment I just made contingent on these applications succeeding already describes the state of affairs; it simply affects too many people for anyone to be willing to call a spade a spade.


Posted in Intellectual Property

When business interests want to lead patent reform, look out

Posted by direwolff on October 22, 2006

I’m all for patent reform, as there are lots of really bad things going on and lots of patents being granted that should never have been issued. A dear friend who is also a dedicated patent buster and patent system dissident, Greg Aharonian, could provide a few days’ worth of data on why the patent system is already broken. He has a newsletter called Patent News that offers regular updates on patent-related stories ranging from the absurd to the insane.

Between Greg’s influence and my more recent reading of Information Feudalism: Who Owns the Knowledge Economy (by Peter Drahos & John Braithwaite), I’m convinced that IBM’s desire to lead patent reform efforts, as discussed in this article, is worth being wary of and should not be taken lightly or allowed to go unchecked. No doubt IBM is right that the system needs reform; the question is whether the reform they have in mind is the right kind, or what’s truly needed at this time. Given their strong interests in the patent world, especially the number of patents they regularly file and currently hold, their involvement is important, but they should not lead any reform effort in this area.

*** Update 10/23/06 – The following article was too appropriate not to include and link to:

IBM sues Amazon for violating patents

IBM filed two lawsuits against Amazon on Monday, claiming key aspects of the internet retailer’s websites violate patents held by Big Blue.

Amazon is accused of infringing on five IBM patents, including technologies that govern how the site handles customer recommendations, advertising and data storage.


Posted in Intellectual Property

The Belgian Newspapers and Agence France-Presse just don’t get it

Posted by direwolff on September 20, 2006

We’ve seen the future and it will be…guess some newspaper executives haven’t seen it yet.

According to this New York Times article, the Belgian courts have found that Google is violating the copyrights of several Belgian newspapers that have banded together to form a consortium. There are further comments from a representative of the Agence France-Presse (AFP), the French equivalent to the Associated Press, agreeing with the ruling. So here are a few points about this.

First off, anyone who has ever bothered to use Google News knows that it’s a practical way of seeing news headlines, plus perhaps an incomplete sentence that is just enough to tease you into clicking through to the full story. Next, there’s this little issue called “fair use”. I don’t know how it’s handled in the European Union, but to think that you couldn’t reprint the headline and the first sentence or less of a news story from a specific newspaper seems ludicrous by even the sternest standards. Finally, what are these people missing? Who else can drive the kind of traffic that Google can (well, perhaps Baidu in China) at no cost? Why would these newspapers even consider walling themselves off, hoping that end-users will start coming to them directly? Walling off works against a publication’s popularity. Over time, that publication becomes irrelevant, because it’s so much easier to get news from those who offer it online through channels we already use for many purposes.

Right now I’d be licking my chops if I were an entrepreneur in Belgium wanting to enter the news reporting business, since opportunity is clearly knocking with this narrow-minded court ruling. Imagine: all of a sudden, you could be the only Belgian news site that Google News links to on the Web. Wow!!! What’s that worth to an entrepreneur? It’s better than cash, that’s for sure.

Of course, the French arrogance did come shining through with the AFP representative’s response to Google’s comments:

“Google has a clear policy of respecting the wishes of content owners,” he said. “If a newspaper does not want to be part of Google News we remove their content from our index; all they have to do is ask. There is no need for legal action and all the associated costs.”

Mr. Louette of Agence France-Presse said that stance missed the point. “Effectively,” he said, “they are offering us an opt-out from appearing on Google, but this doesn’t address the real problem, which is that they attach no value to the headlines, pictures and text from around the world that we spend a lot of money producing.”

You can tell the irony of his comment is lost on him. He claims that Google attaches no value to their content, which is precisely why people who see a story on Google News click through to it, knowing that the value they’ll get from the news site is superior to the headline and the one-line cryptic excerpt they see on Google News. What’s even more telling, and really shows what this is all about, is his comment that Google should have to ask for permission rather than the publishers having to opt out. It’s this sort of attitude that has done damage to copyright holders in this country, and it’s obviously about to do the same to those in the E.U.

Haven’t these guys been watching what’s happening with YouTube, and why Warner Music just did a deal with them allowing Warner’s music to be included in people’s videos? Haven’t they been following the Google Books initiatives and Google’s negotiations with book publishers on these matters? It’s about being found. Oh well, “you can walk a mule to water but you can’t make it drink” ;-)

News providing entrepreneurs, REJOICE!…the future is yours if this ruling holds up after Google’s appeals.

Posted in Intellectual Property, Public Policy, search & categorization