To create its search index, Google crawls websites. On each website, the Google crawler copies the entire content and processes it. Google stores the full content in its database, extracts links to other websites to help rank them, uses the content to train its artificial intelligence, and so on.
But most websites’ content is protected by copyright. What gives Google the right to copy this content and to benefit from it without permission or compensation? Google CEO Eric Schmidt’s response – “This is how Google works” – is not good enough. This is not fair use. A trillion-dollar company cannot simply take the labor of millions of individuals or small companies and claim fair use.
The only possible justification is an implicit contract between the website owner and Google. The website owner allows Google to crawl the website and copy its content, in exchange for a fair chance that Google will display links to the website to its search users, thereby sending traffic to the website. A website owner might rely on Google’s promise to Google Search users that it will return results that are useful, helpful, and authoritative for them (not for Google). Google Search is a service, and it does not have to be perfect or error-free. Google has a lot of leeway.
Starting around 2016, Google has been blacklisting or “greylisting” many conservative and dissenting sites, including Breitbart, American Thinker, PJ Media, and WattsUpWithThat. It has been taking copyrighted content from them, exploiting it far more than the site owners knew or permitted, but not showing links to them, as would be expected under such an implicit agreement.
This is copyright infringement. In my opinion, Breitbart and other blacklisted website owners can sue Google and ask the court for an injunction to enjoin Google from continuing the infringement. Google cannot stop infringing simply by removing websites from its index – their content and derivatives of it have been used to build the very heart of Google’s index, databases, and algorithms.
A website owner can exclude Google from crawling its website by adding appropriate lines to /robots.txt, and the Google crawler respects such requests. Nevertheless, Google retains the content it has already appropriated from that website, along with its derivatives, and continues to represent that it indexes the entire web.
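As a sketch of what such an exclusion looks like: Googlebot is Google’s documented crawler user-agent token, and a minimal /robots.txt that blocks it from the entire site might read:

```
# Block Google's crawler from the whole site
User-agent: Googlebot
Disallow: /
```

This blocks only Google’s crawler; replacing `Googlebot` with `*` would ask all well-behaved crawlers to stay away. Note that robots.txt is a voluntary convention, not an enforcement mechanism.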
I am not familiar with the precedents on this issue. I believe some litigation happened more than a decade ago and was resolved in the circuit courts in Google’s favor, based on the situation at that time and without much technical expertise on the part of the courts. The circumstances have drastically changed since then.
Using one’s copyrighted content for blacklisting also violates another fair use requirement: that the use not decrease the demand for the copyrighted content. For example, blacklisting Breitbart and substituting links to it with links to other news outlets is obviously unfair use.
Remark: Eric Schmidt is quoted from memory.