Get Latest Internet Marketing News, Views and Methods All in One Place: Thomson Reuters Win AI Copyright Case, Spelling Trouble for AI Firms

Thomson Reuters has won its case against Ross Intelligence, setting a legal precedent for how AI firms collect and use the vast quantities of data their models rely on.

The vast majority of AI companies have engaged in legally questionable behavior, hoovering up large volumes of copyrighted data to use for training purposes. The firms have argued that fair use covers their activity, but that hasn’t stopped multiple companies and media outlets from suing various AI firms.

Thomson Reuters sued Ross Intelligence, a startup that has since shut down because of the cost of the legal battle, alleging copyright infringement. Specifically, Ross Intelligence was accused of using Thomson Reuters’ legal database as the basis for some of its AI-generated materials.

Notably, in his ruling, U.S. Circuit Judge Stephanos Bibas reversed his 2023 decision, in which he had ruled that a jury would need to decide the fair use aspect of the case.

A smart man knows when he is right; a wise man knows when he is wrong. Wisdom does not always find me, so I try to embrace it when it does––even if it comes late, as it did here.

I thus revise my 2023 summary judgment opinion and order in this case. See Fed. R. Civ. P. 54(b); D.I. 547, 548; Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., 694 F. Supp. 3d 467 (D. Del. 2023). Now I (1) grant most of Thomson Reuters’s motion for partial summary judgment on direct copyright infringement and related defenses, D.I. 674; (2) grant Thomson Reuters’s motion for partial summary judgment on fair use, D.I. 672; (3) deny Ross’s motion for summary judgment on fair use, D.I. 676; and (4) deny Ross’s motion for summary judgment on Thomson Reuters’s copyright claims, D.I. 683.

Case Background

Judge Bibas then goes on to summarize the case, acknowledging that Thomson Reuters’ Westlaw database is one of the largest legal databases in the U.S., with the company licensing its contents to users. In an effort to build a competing database, Ross asked to license Westlaw content. Because Ross’ stated goal was to build a competitor to Westlaw, Thomson Reuters understandably declined to license its content to the firm.

As has become common among AI firms that can’t legally access the data they want for their models, Ross moved ahead anyway.

So to train its AI, Ross made a deal with LegalEase to get training data in the form of “Bulk Memos.” Id. at 5. Bulk Memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions. D.I. 678-36 at 5–9. LegalEase sold Ross roughly 25,000 Bulk Memos, which Ross used to train its AI search tool. See D.I. 752-1 at 5; D.I. 769 at 30 (10:48:35). In other words, Ross built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes. When Thomson Reuters found out, it sued Ross for copyright infringement.

The Headnotes and Key Number System Questions

At the heart of the case was whether Westlaw’s headnotes are original enough to be copyrightable, and therefore whether Ross infringed copyright by copying them.

The headnotes are original. A headnote is a short, key point of law chiseled out of a lengthy judicial opinion. The text of judicial opinions is not copyrightable. Banks v. Manchester, 128 U.S. 244, 253–54 (1888). And even if it were, Thomson Reuters would not get that copyright because it did not write the opinions. But a headnote can introduce creativity by distilling, synthesizing, or explaining part of an opinion, and thus be copyrightable. That is why I have changed my mind.

First, the headnotes are a compilation. “Factual compilations” are original if the compiler makes “choices as to selection and arrangement” using “a minimal degree of creativity.” Feist, 499 U.S. at 348. Thomson Reuters’s selection and arrangement of its headnotes easily clears that low bar.

More than that, each headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer’s editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. 17 U.S.C. § 102(a)(5). So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor’s idea about what the important point of law from the opinion is. That editorial expression has enough “creative spark” to be original. Feist, 499 U.S. at 345. So all headnotes, even any that quote judicial opinions verbatim, have original value as individual works. That belated insight explains my change of heart. In my 2023 opinion, I wrongly viewed the degree of overlap between the headnote text and the case opinion text as dispositive of originality. 694 F. Supp. 3d at 478. I no longer think that is so. But I am still not granting summary judgment on any headnotes that are verbatim copies of the case opinion (for reasons that I explain below).

Similarly, Ross Intelligence copied Westlaw’s Key Number System, although it did not present the Key Number System to customers.

The Key Number System is original too. There is no genuine issue of material fact about the Key Number System’s originality. Recall that Westlaw uses this taxonomy to organize its materials. Even if “most of the organization decisions are made by a rote computer program and the high-level topics largely track common doctrinal topics taught as law school courses,” it still has the minimum “spark” of originality. Id. at 477 (internal quotation marks omitted); Feist, 499 U.S. at 345. The question is whether the system is original, not how hard Thomson Reuters worked to create it. Feist, 499 U.S. at 359–60. So whether a rote computer program did the work is not dispositive. And it does not matter if the Key Number System categorizes opinions into legal buckets that any first-year law student would recognize. To be original, a compilation need not be “novel,” just “independently created by” Thomson Reuters. Id. at 345–46. There are many possible, logical ways to organize legal topics by level of granularity. It is enough that Thomson Reuters chose a particular one.

The Fair Use Issue

The biggest issue of all, however, was whether Ross’ actions fell under fair use, a legal doctrine that allows copyrighted material to be used without permission under specific circumstances. In his ruling, Judge Bibas reiterated that he was reversing his initial ruling, including on the fair use question, before outlining the four specific factors that must be considered.

I must consider at least four fair-use factors: (1) the use’s purpose and character, including whether it is commercial or nonprofit; (2) the copyrighted work’s nature; (3) how much of the work was used and how substantial a part it was relative to the copyrighted work’s whole; and (4) how Ross’s use affected the copyrighted work’s value or potential market. 17 U.S.C. § 107(1)–(4). The first and fourth factors weigh most heavily in the analysis. Authors Guild v. Google, Inc., 804 F.3d 202, 220 (2d Cir. 2015) (Leval, J.).

Factor One – The Purpose and Character of Ross’ Use

Judge Bibas found that the first factor favored Thomson Reuters, ruling that the commercial nature of Ross’ use of Westlaw data, and its lack of any transformative purpose, weighed against fair use.

Ross’s use is not transformative. Transformativeness is about the purpose of the use. “If an original work and a secondary use share the same or highly similar purposes, and the second use is of a commercial nature, the first factor is likely to weigh against fair use, absent some other justification for copying.” Warhol, 598 U.S. at 532–33. It weighs against fair use here. Ross’s use is not transformative because it does not have a “further purpose or different character” from Thomson Reuters’s. Id. at 529.

Judge Bibas also declined to rule on whether Ross had acted in bad faith:

But because Ross’s use was commercial and not transformative, I need not consider this possible element. Even if I found no bad faith, that finding would not outweigh the other two considerations.

Factor Two – The Nature of the Original Work

The second factor went in favor of Ross. This factor turned on how creative Westlaw’s headnotes and Key Number System are, since less creative works get weaker protection against a fair use defense.

Westlaw’s material has more than the minimal spark of originality required for copyright validity. But the material is not that creative. Though the headnotes required editorial creativity and judgment, that creativity is less than that of a novelist or artist drafting a work from scratch. And the Key Number System is a factual compilation, so its creativity is limited.

So factor two goes for Ross. Note, though, that this factor “has rarely played a significant role in the determination of a fair use dispute.”

Factor Three – How Much of the Work Was Used Relative to the Whole

The third factor also went in favor of Ross.

My prior opinion did not decide factor three but suggested that it leaned towards Ross. The opinion focused on Ross’s claim that its output to an end user is a judicial opinion, not a West headnote, so it “communicates little sense of the original.” 694 F. Supp. 3d at 485 (quoting Authors Guild, 804 F.3d at 223).

I stand by that reasoning, but now go a step further and decide factor three for Ross. There is no factual dispute: Ross’s output to an end user does not include a West headnote. What matters is not “the amount and substantiality of the portion used in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public for which it may serve as a competing substitute.” Authors Guild, 804 F.3d at 222 (internal quotation marks omitted). Because Ross did not make West headnotes available to the public, Ross benefits from factor three.

Factor Four – The Effect of Ross Copying Westlaw

Judge Bibas cites Harper & Row in saying this fourth factor “is undoubtedly the single most important element of fair use.”

My prior opinion left this factor for the jury. I thought that “Ross’s use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw.” 694 F. Supp. 3d at 486. If that were true, then Ross would not be a market substitute for Westlaw. Plus, I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data. Id. And I thought a jury ought to sort out “whether the public’s interest is better served by protecting a creator or a copier.” Id.

In hindsight, those concerns are unpersuasive. Even taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute. D.I. 752-1 at 4. And it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough. Ross bears the burden of proof. It has not put forward enough facts to show that these markets do not exist and would not be affected.

The Decision

Ultimately, after weighing the above factors, Judge Bibas rejected Ross’ fair-use defense.

Factors one and four favor Thomson Reuters. Factors two and three favor Ross. Factor two matters less than the others, and factor four matters more. Weighing them all together, I grant summary judgment for Thomson Reuters on fair use.

I grant partial summary judgment to Thomson Reuters on direct copyright infringement for the headnotes in Appendix A. For those headnotes, the only remaining factual issue on liability is that some of those copyrights may have expired or been untimely created. This factual question underlying copyright validity is for the jury. I also grant summary judgment to Thomson Reuters against Ross’s defenses of innocent infringement, copyright misuse, merger, scenes à faire, and fair use. I deny Ross’s motions for summary judgment on direct copyright infringement and fair use. I revise all parts of my prior opinions that conflict with this one. I leave undisturbed the parts of my prior opinion not addressed in this one, such as my rulings on contributory liability, vicarious liability, and tortious interference with contract.

“We are pleased that the court granted summary judgment in our favor and concluded that Westlaw’s editorial content, created and maintained by our attorney editors, is protected by copyright and cannot be used without our consent. The copying of our content was not ‘fair use,'” the company said in a statement.

The Implications of the Decision

The implications of Judge Bibas’ decision will reach far and wide within the AI industry, and should serve as a warning to AI companies that have engaged in similar practices.

Meta, for example, is involved in a court case in which its own internal emails detail the questions and concerns staff had about pirating more than 80 TB of data comprising tens of millions of books. Those same emails implicate OpenAI for allegedly engaging in the same behavior, including pirating books from the same sources.

AI firms have maintained that fair use covers their activities, making it legal to hoover up any and all data, regardless of copyright status. Judge Bibas’ decision, on the other hand, raises major questions about that argument.

If Judge Bibas’ ruling is used as a legal precedent in the many other AI copyright cases being litigated, it could spell disaster for the AI industry, leaving firms and their executives liable for untold sums in damages and even facing potential criminal charges.

from WebProNews https://ift.tt/YXOHkpM

Wednesday, 12 February 2025