A Pirate Lifes: Why the Technology Sector Should Care About Google Books

Antitrust lawyer and Open Book Alliance leader Gary Reback has been called the “antitrust champion” and the “protector of the marketplace” by the National Law Journal, and has been at the forefront of many of the most important antitrust cases of the last three decades. He is one of the most vocal opponents of the Google Books settlement. I interviewed Reback a few months ago, and Google Books was one of the topics we discussed. In the column below, Reback discusses Google Books and its ties to Google search.

This Thursday leaders of the international publishing industry will watch with bated breath as a federal judge in New York hears arguments over whether to approve the Google Book Settlement.

More a complicated joint venture among Google and five big New York publishers than the resolution of pending litigation, the proposed settlement once promised unprecedented access to millions of out-of-print books through digital sales to consumers and online research subscriptions for libraries. But with the passage of time and the ability to examine the deal more closely, the promises proved illusory. The big publishers, as it turns out, have reserved the right to negotiate secret deals with Google for the books they claim through the settlement.

Meanwhile, torrents of outrage rained down on the New York court – from authors whose ownership rights will be appropriated through the settlement’s procedures, from librarians fearful of price exploitation by Google, from privacy advocates worried that Google will monitor the reading habits of library patrons, from libertarians incensed over the use of a legal procedure to effect the widespread appropriation of property, from digital booksellers concerned about Google’s unfair advantage in the marketplace.

Actually, those in the tech community should be watching the settlement proceedings more closely than anyone else. We have the most to lose if the deal is approved in its present form because, at bottom, the Google Book Settlement is not really about books. It’s really about search, the most important technology in the new economy.

According to the Department of Justice, Google dominates the market for search advertising and search syndication on the Web, with greater than a 70% share in both markets. These markets are difficult to enter because of powerful network effects and scale characteristics. Recent entry has been all but futile; indeed, the company with the second largest share, Yahoo, is leaving the market.

The search markets are special and different – even from other web markets. Google’s dominant share in these markets means that substantial numbers of web-based enterprises secure much of their business through “referrals” from Google’s search engine or advertisements placed by Google’s ad platform. The dominant market share makes Google the arbiter of each web business (books or medical supplies, as examples). In each case, Google decides which company succeeds and which company fails by its placement in search results and ad listings on the Google site.

The industry’s fear of Google has grown exponentially, right along with the company’s influence on web commerce. Not six months ago a prominent executive from a top web site – who withheld his name for fear of retribution – made an astounding proposal in a TechCrunch post. Noting from his own experience the potential for abuse inherent in Google’s power, the executive called for government regulation of the search markets to prevent manipulation of search results and ad listings.

The last six months have confirmed the anonymous executive’s worst fears. Once upon a time, Google claimed it employed neutral, mathematically-based algorithms to prioritize search and ad listings. But last November Google admitted to the Washington Post that only search results from Google’s content competitors are listed according to neutral algorithms. Search results from Google’s own properties, like maps, news and books, are now listed first, the algorithm notwithstanding. Even more recently Google admitted that it changes the rank ordering of paid search ads to prioritize its own company messages.

Whatever the advisability of government regulation, few would dispute that we need more and better competition in search to curb Google’s power. But Google is doing its best to keep that from ever happening. That’s where the Book Settlement comes in. Google intends to use the settlement to disadvantage its competitors and to bolster its own position in search.

Google announced its project to scan and digitize books in December 2004. Both commercial and not-for-profit entities started scanning books before Google did. Several other rivals started scanning books shortly after Google announced its project. All of these competitors scanned only books in the public domain or for which they secured the rightsholder’s permission. Google, on the other hand, scanned all books in the collections of some of the nation’s leading research libraries, including those still under copyright, without securing permission from the rightsholders.

In the fall of 2005, five New York publishers along with the Authors Guild sued Google for copyright infringement. After three years of secret negotiations, and without taking a single deposition in the case, the parties announced a settlement on October 28, 2008. Through a legal ploy known as a “class certification” (which must be approved by the court), the plaintiffs who brought the suit now claim to speak for all holders of U.S. copyrights. Their proposed settlement gives Google (among other things) the right, in response to search queries, to display lengthy textual excerpts from just about every out-of-print book with a U.S. copyright (unless the rightsholder affirmatively objects) – tens of millions of books, in all.

Very recent results from scientific studies of web searching explain why Google has spent enormous amounts of money to acquire the digital rights to vast numbers of old, dusty books. Most search queries are directed to popular subjects – shopping, travel, medical information, etc. Some queries, though, are directed to more obscure subject matter. These are known as “rare,” “obscure,” “esoteric,” or, sometimes, “tail” queries, in reference to the “tailing off” portion of a graph showing the frequency distribution of a population (search queries, in this case) exhibiting the Pareto principle, known to everyone who sells products as the 80-20 rule. Most queries are directed to a few (relatively speaking) popular subjects and therefore show up in the “fat” part of the frequency curve. The frequency of increasingly obscure queries “tails off” asymptotically, providing a “long tail” to the right of the “fat” part of the curve.
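As a rough illustration of the frequency curve described above, the sketch below assumes query popularity follows a Zipf-style distribution, in which the query at popularity rank r occurs in proportion to 1/r. That distributional choice, the 10,000-query vocabulary, the exponent of 1.0 and the function names are all illustrative assumptions, not figures taken from the studies mentioned here.

```python
# Minimal sketch of a "long tail" of search queries.
# ASSUMPTION: query frequency is proportional to 1 / rank**exponent (Zipf-style).
# The vocabulary size and exponent are hypothetical, chosen only for illustration.

def zipf_frequencies(num_queries: int, exponent: float = 1.0) -> list[float]:
    """Relative frequency of each distinct query, ordered from most to least popular."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_share(frequencies: list[float], head_fraction: float) -> float:
    """Share of all search traffic captured by the most popular fraction of queries."""
    head_size = max(1, int(len(frequencies) * head_fraction))
    return sum(frequencies[:head_size])


freqs = zipf_frequencies(10_000)
share = head_share(freqs, 0.20)

print(f"Top 20% of distinct queries: {share:.1%} of traffic")
print(f"Remaining 80% (the tail):    {1 - share:.1%} of traffic")
```

Under these assumptions the most popular fifth of queries accounts for roughly four fifths of all traffic, which is the 80-20 pattern, while the remaining distinct queries each occur rarely and together form the long tail.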

For a time, computer scientists thought that most obscure queries were generated by only a few users (again, speaking relatively), and, hence, search engines could ignore obscure tail queries and still serve the great bulk of the user population. But research has shown that just about everyone makes a rare query from time to time. And, people decide which engine to use for their everyday search needs based on the engine’s ability to satisfy these rare queries, just as one would expect in a world that values “one-stop shopping.” Stated more formally, satisfying demand in the tail increases consumption in the “head” or fat part of the distribution curve.

Google will get an enormous advantage over its search competitors if it can support (i.e., respond satisfactorily to) tail queries that its competitors cannot. Scientific research shows that supporting tail queries produces a disproportionately large increase in overall user satisfaction – i.e., disproportionately increases the size of the user population highly satisfied with the engine’s performance. In fact, according to the most recent study, satisfying an additional 1% of tail queries increases overall user satisfaction with the engine by more than 5% – this, in a market in which companies battle fiercely to wrest even a tenth of a point in market share away from Google’s control.

Digital rights to virtually all out-of-print books will provide Google with a decisive advantage in responding to tail queries. Google created its book database by scanning the collections of the nation’s leading research libraries. These libraries consist largely of academic works on a wide variety of obscure subjects. The books contain information relevant to all kinds of rare queries. Much of the older information in the books might not be available from other sources, at least on the public web. Whatever the publication value of these books, they provide an enormous advantage in search. Indeed, presentations by Google within the last couple of months confirm that the company expects to use text from digital books to satisfy many of its users’ tail queries. If Google can stretch its advantage even further and deny its search rivals the ability to integrate the same corpus of books, Google’s lead in search will become insurmountable.

The proposed settlement does just that, leaving Google’s search competitors out in the cold. The settlement provides no means at all for competitors to get rights to so-called “orphan works” – in-copyright books whose rightsholders cannot be located. According to the parties’ court filings made just last week, ownership has been claimed for only about one million books out of the more than 12 million books scanned and the 170 million unique works identified by Google, leaving the company with exclusive digital rights to well over 90% of U.S. books. In addition, the settlement sets up procedures that make it easy for Google to clear rights to all other out-of-print works where rightsholders can be located, but leaves rivals without a mechanism to easily resolve disputes over ownership and copyright status that preclude competitive distribution. If approved in its current form, then, the settlement will solidify Google’s hold on the search market by giving the company exclusive rights to millions upon millions of books.
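The "well over 90%" figure can be checked with simple arithmetic on the numbers quoted above. The sketch below assumes, for illustration only, that every book not yet claimed counts as one over which Google would hold effectively exclusive digital rights. The variable names are hypothetical and the counts are the rounded figures from the paragraph, not exact values from the court filings.

```python
# Back-of-the-envelope check of the unclaimed share.
# ASSUMPTION: any book without an ownership claim is treated as unclaimed;
# the counts are the rounded figures quoted in the text.

claimed_books = 1_000_000        # "about one million books" claimed
scanned_books = 12_000_000       # "more than 12 million books scanned"
identified_works = 170_000_000   # "170 million unique works identified"

unclaimed_of_scanned = 1 - claimed_books / scanned_books
unclaimed_of_identified = 1 - claimed_books / identified_works

print(f"Unclaimed share of scanned books:    {unclaimed_of_scanned:.1%}")
print(f"Unclaimed share of identified works: {unclaimed_of_identified:.1%}")
```

Against either denominator the unclaimed share exceeds 90%, which is consistent with the claim in the text.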

Under some circumstances, Google might be entitled to a competitive advantage that it secured through superior foresight. But that’s not what happened here. The publisher plaintiffs demanded that Google’s competitors respect claims of copyright in their scanning, even as they secretly negotiated with Google to give that company the settlement deal the plaintiffs never offered to Google’s competitors. The Department of Justice made the point most clearly in its brief. Google’s search dominance, DoJ said, may be further entrenched by its “exclusive access to content” through the settlement.

This outcome has not been achieved by a technological advance in search or by operation of normal market forces; rather, it is the direct product of scanning millions of books without the copyright holders’ consent and then using [class action procedures] to achieve results not otherwise obtainable in the market.

Permitting a company to solidify its dominance over all of web commerce through controversial legal stratagems rather than open market competition invites economic disaster. Likely, the judge will see Google’s ploy in that light, just as the Justice Department did. If not, government regulation might well be our only recourse.

Wednesday, February 17, 2010
