Reprinted with permission from the October 2017 issue of ALI CLE’s The Practical Lawyer.


“In God we trust. All others must bring data.”

— Professor William Edwards Deming

All of us who often speak and write about the ongoing revolution in data analytics for litigation have heard it from at least some of our fellow lawyers: “Interesting, but so what?”

Here’s the answer in a nutshell. One often hears that business hates litigation because it’s enormously expensive and risky. There’s a degree of truth to that, but it’s far from the whole truth. Business doesn’t dislike expense or risk per se. Business dislikes unquantified expense and risk. As the maxim often (incorrectly) attributed to Peter Drucker goes, “You can’t manage what you can’t measure.”

Don’t believe me? If your client offers to sell an investment bank a two billion dollar package of mortgages, the bank gets nervous. But tell the bank that based on the past ten years of data, 65.78 percent of the mortgages will be paid off early, 24.41 percent will be paid off on time, and 9.81 percent will default, and they know how to deal with that.

It’s the same thing in litigation. For generations, most facts that would help a business person understand the risks involved have been solely anecdotal: this judge is somewhat pro-plaintiff or pro-defendant; the opposing counsel has a reputation for being aggressive or smart (or not); juries in this jurisdiction often make runaway damage awards or are notoriously parsimonious. But every one of those anecdotal impressions and bits of conventional wisdom can be approached from a data-driven perspective, quantified and proven (or disproven). Do that, and we’ve taken a giant step towards approaching litigation the way a business person approaches business—by quantifying and managing every aspect of the risk.

I hear lawyers talking about “early adopters” of data analytics tools in litigation, but the truth is, we’re not early adopters by a long shot. The business world has been investing billions in data analytics tools for a generation in order to understand and manage its risks.

Tech companies use algorithms to choose among job applicants and assign “flight risk” scores to employees according to how likely each is thought to be to leave. Billions of dollars in stock are traded every day by algorithms designed to predict gains and reduce risk. The Netflix and Amazon websites (among many others) track what you look at and buy or rent in order to recommend additional choices you’ll be interested in. In 2009, Google developed a model using search data which predicted the spread of a flu epidemic virtually in real time. UPS has saved millions by placing monitors in its trucks to predict mechanical failures and schedule preventive maintenance. The company’s algorithm for planning drivers’ optimal routes shaved 30 million miles off drivers’ routes in a single year. Early in his term as New York Mayor, Michael Bloomberg created an analytics task force that crunched massive amounts of data gathered from all over the city to determine which illegal conversions (structures cut up into many smaller units without the appropriate inspections and licensing) were most likely to be fire hazards. Political campaigns now routinely use mountains of data not only to identify persuadable voters, but also to determine the method most likely to work with each one.

The application of data analytic techniques to the study of judicial decision making arguably begins with a 1922 article for the Illinois Law Review by political scientist Charles Grove Haines. Haines reviewed over 15,000 cases of defendants convicted of public intoxication in the New York magistrate courts. He showed that one judge discharged only one of 566 cases, another discharged 18 percent of his cases, and still another fully 54 percent. Haines argued that his data showed that case results reflected to some degree the “temperament . . . personality . . . education, environment, and personal traits of the magistrates.”

In 1948, political scientist C. Herman Pritchett published The Roosevelt Court: A Study in Judicial Politics and Values, 1937-1947. Pritchett presented a series of charts showing how often various combinations of Justices had voted together in different types of cases. He contended that the sharp increase in the dissent rate at the U.S. Supreme Court in the late 1930s undercut the “formalist” philosophy that law was an objective reality which judges merely found and declared.

Another landmark in the judicial analytics literature, the U.S. Supreme Court Database, traces its beginnings to the work of Professor Harold Spaeth about three decades ago. Professor Spaeth created a database which classified every vote by a Supreme Court Justice in every argued case for the past five decades. Today, thanks to the work of Spaeth and his colleagues Professors Jeffrey Segal, Lee Epstein and Sarah Benesh, the database has been expanded to encompass more than two hundred data points from every case the Supreme Court has decided since 1791. The Supreme Court Database is the foundation of most data analytic studies of the Supreme Court’s work.

Professors Spaeth and Segal also wrote another classic, The Supreme Court and the Attitudinal Model, in which they proposed a model arguing that a judge’s personal characteristics (ideology, background, gender, and so on) and so-called “panel effects” (the impact of having judges of divergent backgrounds deciding cases together as a single, institutional decision maker) could reliably predict case outcomes.

The data analytic approach began to attract attention in the appellate bar in 2013, with the publication of The Behavior of Federal Judges: A Theoretical & Empirical Study of Rational Choice. Judge Richard Posner and Professors Lee Epstein and William Landes applied various regression techniques to a theory of judicial decision making with its roots in microeconomic theory, discussing a wide variety of issues from the academic literature.

Although the litigation analytics industry is changing rapidly, the four principal vendors are Lex Machina, Ravel Law, Bloomberg Litigation Analytics and Premonition Analytics. Lex Machina and Ravel Law began as startups (indeed, both began at Stanford Law School), but LexisNexis has now purchased both companies. Lex Machina is fully integrated with the Lexis platform, and Ravel will be integrated in the coming months. Although there are certain areas of overlap, each of the four analytics vendors has taken a somewhat different approach and offers unique advantages. For example, Premonition’s database not only covers most state and all federal courts, but also offers data on courts in the United Kingdom, Ireland, Australia, the Netherlands and the Virgin Islands.

The role of analytics in litigation begins with the earliest moments of a lawsuit. If you’re representing the defendant, Bloomberg and Lex Machina both offer useful tools for evaluating the plaintiff. How often does the plaintiff file litigation, and in what areas of the law? Were earlier lawsuits filed in different jurisdictions from your new case, and if so, why? Scanning your opponent’s filings in cases in other jurisdictions can sometimes reveal useful admissions or contradictory positions. If your case is a putative class action, these searches can help determine at the earliest moment whether the named plaintiff has filed other actions, perhaps against other members of your client’s industry. Have the plaintiff’s earlier actions ended in trials, settlements or dismissals? This can give counsel an early indication of just how aggressive the plaintiff is likely to be.

All four major vendors have useful tools for researching the judge assigned to a new case. Ravel Law has analytics for every federal judge and magistrate in the country, as well as all state appellate judges. State court analytics research is always a challenge because of the number of states whose dockets are not yet available in electronic form, but Premonition Analytics claims to have as large a state-court database as Lexis, Westlaw and Bloomberg combined. How much experience does your judge have in the area of law your case involves compared to other judges in the jurisdiction? How often does the judge grant partial or complete dismissals or summary judgments early on? How often does the judge preside over jury trials? Were there jury awards in any of those trials, and how do they compare to other judges’ trials? What is defendants’ winning percentage in recent years before your judge? Ravel Law and Bloomberg can provide data on how often your trial judge’s opinions are cited by other courts (an indicator of how well respected the judge is by his or her peers), as well as how often the judge is appealed, and how many of those appeals have been partially or completely successful. The data can be narrowed by date in order to focus on the most recent decisions, as well as by area of law. Say your assigned judge appears to be more frequently appealed and reversed than his or her colleagues in the jurisdiction. Are the reversals evenly distributed across time, or concentrated in any particular area of law? If your judge’s previous decisions in the area of law where your case arises have been reversed unusually often, it can influence how you conduct the litigation. Counsel can keep all this data current through Premonition’s Vigil court alert system, which patrols Premonition’s immense litigation database and can give counsel hourly alerts and updates, keyed to party name, judge, attorney or case type, from federal, state and county courts.

Many jurisdictions give parties one opportunity, before any substantive ruling is made, to seek recusal of the assigned judge as a matter of right, without proof of prejudice. Data-driven judge research can help inform your decision as to whether to exercise that right.

Lex Machina’s analytics platform focuses on several specific areas of law, giving counsel a wealth of information for researching a jurisdiction (additional databases on more areas of law will be coming soon). For example, in antitrust, cases are tagged to distinguish among class actions, government enforcement, Robinson-Patman Act cases and others. The platform is integrated with the MDL database, linking procedurally connected cases. The database reflects both damages (whether through a jury award or a settlement) and additional remedies, such as divestiture and injunction. Cases are also tagged by the specific antitrust issue, such as Sherman Act Section 1, Clayton Act Section 7, the rule of reason or antitrust exemptions. The commercial litigation data includes the nature of the resolution, any compensatory or punitive damages, and the legal finding: contract breach, rescission, unjust enrichment, trade secret misappropriation, and many more. The copyright database similarly tracks damages, findings and remedies, and allows users to exclude “copyright troll” filings from their data. Lex Machina’s federal employment law database includes tags for the type of damages (backpay, liquidated damages, punitive damages and emotional distress), the nature of any finding, and the remedy given. The patent litigation database includes many similar fields, but also a patent portfolio evaluator, isolating which patents have been litigated, and a patent similarity engine, which finds new patents and tracks their litigation history. The securities litigation database enables users to focus on the type of alleged violation, tracking the most relevant outcomes, and the trademark litigation database contains data for the legal issues and findings, damages and remedies in each case.

Analytics research is important for the plaintiffs’ bar as well. Bloomberg’s Legal Analytics platform is integrated with its enormous library of corporate data covering 70,000 publicly held and 3.5 million private companies. Counsel can survey a company’s litigation history, and the information is keyed to the underlying dockets. The data can be focused by jurisdiction or date, as well as to include or exclude subsidiaries. Lex Machina’s Comparator app can compare not only the length of time particular judges’ cases tend to take to reach key milestones but also previous outcomes, including damages awards and attorneys’ fees awards. A plaintiffs’ firm can use such data in cases where there are multiple possible venues to select the jurisdiction likely to deliver the most favorable result in the shortest time.

One bit of conventional wisdom that is commonly heard in the defense bar is that defendants should generally remove cases to federal court when they have the right to do so because federal juries are less prone to extreme verdicts and federal judges are more favorable to defendants. Although comprehensive data on state court trial judges is still less common than data on federal judges, all four major analytic platforms can help evaluate courts and compare judges, giving a client a data-driven basis for making the removal decision.

Researching your opposing counsel is important for both defendants and plaintiffs. How aggressive is opposing counsel likely to be? Bloomberg Analytics covers more than 7,000 law firms, and enables users to focus results by clients, date and jurisdiction. Is your opposing counsel in front of your judge all the time? If so, that can inform decisions like whether to seek of-right substitution of the judge or remove the case. What were the results of those earlier lawsuits? Reviewing opposing counsel’s client list can suggest how experienced opposing counsel is in the area of law where your case arises. Lex Machina’s Law Firms Comparator also enables the user to compare opposing counsel to their peers, and get an idea of what opposing counsel’s approach to the lawsuit is likely to be. The app enables counsel to compare opposing counsel’s previous cases by open and terminated cases, days elapsed to key events in the case, case resolutions and case results. In preparing this article, I reviewed a report generated by Lex Machina’s Law Firms Comparator and learned several things I didn’t know about my own firm’s practice. Ravel Law’s Firm Analytics enables counsel to study similar data about one’s opponent, focused by practice area, court, judge, time or proceeding, or all of the above. Firm Analytics also compares opposing counsel to other law firms in the jurisdiction, showing whether counsel appears before the trial judge frequently, and whether they tend to win (or lose) more often than comparable firms. All this information gives counsel a tremendous leg up in estimating how expensive the litigation is likely to be.

As you begin to develop the facts of a case, motions begin to suggest themselves. Is your client’s connection to the jurisdiction sufficiently tenuous to support a motion to dismiss for lack of personal jurisdiction, or for change of venue? Has the plaintiff failed to state a plausible claim under the Twombly/Iqbal standard? Discovery motions to compel and for protective orders are commonplace, and inevitably defense counsel will face the question of whether to file a motion for summary judgment.

Ravel Law’s platform has extensive resources for motions research. For every federal judge, the system can show you how likely the judge is to grant, partially grant or deny 90+ types of motions: not just the easy ones like motions for summary judgment or to dismiss, but motions to stay proceedings or remand to state court, motions to certify for interlocutory appeal, motions for attorneys’ fees, motions to compel or for an injunction and motions in limine. This can be an enormous savings in both time and money for your clients. Even where examining the facts suggests that a motion for summary judgment might be in order, that calculus might look very different when one learns that the trial judge has granted only 18 percent of the summary judgment motions brought before him or her since 2010.
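
The arithmetic behind a figure like that 18 percent can be sketched in a few lines of Python. This is only an illustration: the rulings, the outcome labels and the function name are invented for the example and are not drawn from Ravel Law or any other vendor's platform.

```python
from collections import Counter

# Hypothetical rulings by one judge: (motion type, outcome).
# Invented data for illustration; not taken from any vendor.
rulings = [
    ("summary judgment", "granted"),
    ("summary judgment", "denied"),
    ("summary judgment", "denied"),
    ("summary judgment", "partial"),
    ("summary judgment", "denied"),
    ("dismiss", "granted"),
    ("dismiss", "granted"),
    ("dismiss", "denied"),
    ("remand", "denied"),
]

def outcome_rates(rulings, motion_type):
    """Return {outcome: share of rulings} for one motion type."""
    counts = Counter(outcome for kind, outcome in rulings if kind == motion_type)
    total = sum(counts.values())
    if total == 0:
        return {}  # the judge has no recorded rulings on this motion type
    return {outcome: n / total for outcome, n in counts.items()}

sj_rates = outcome_rates(rulings, "summary judgment")
for outcome, share in sorted(sj_rates.items()):
    print(f"{outcome}: {share:.0%}")
```

With so few rulings the percentages are unstable, which is one reason the date range and the number of underlying motions matter as much as the headline rate.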

On Lex Machina’s platform, counsel can use the “motion kickstarter” to survey recent motions before the assigned trial judge. The “motion chain” links together the briefing and the eventual order for each motion, so counsel can identify the arguments which have succeeded in recent cases, and review both the parties’ briefs and the judge’s order.

Ravel Law offers extensive resources to help counsel in crafting their arguments. As counsel does her research, Ravel Law shows visualizations demonstrating how different passages of a case have been cited, and by which judges, enabling counsel to quickly zero in on the passages which judges have found most persuasive. Or the research can be approached from the other direction, by identifying the cases and passages most often cited by your judge for particular principles. How does the judge typically explain the standards for granting a motion to dismiss, or for summary judgment? Does the judge tend to frequently cite Latin legal maxims, or even sports analogies? How does your federal judge handle the state law of his or her home jurisdiction? How has your judge ruled in rapidly evolving areas of the law, such as class certification, arbitration and personal jurisdiction? Now it’s easy to find out.

And when the case finally goes to trial, there’s still a role for judicial analytics. How often do the judge’s cases go to trial? What kinds of cases have tended to go to trial before your trial judge? What were the results? The data you pulled at the outset on the length of the judge’s previous trials might suggest just how liberal or strict the judge tends to be with the parties in trial. Did either party waive a jury, and if so, what happened? How has your trial judge handled jury instructions in recent trials where the parties didn’t waive the jury? What were the awards of damages, plus any awards of attorneys’ fees or punitive damages?

Post-trial is an often overlooked opportunity to cut litigation short by limiting or entirely wiping out an adverse verdict through new trial motions and motions for judgment notwithstanding the verdict. Counsel can determine on Lex Machina’s motion comparator, Ravel Law’s motions database or Bloomberg’s Litigation Analytics how likely judges are to either overturn or modify a jury verdict. A close look at the data and recent orders and motions will help inform a decision as to whether to file a motion for judgment notwithstanding the verdict or a motion for new trial. If your client has been hit with a punitive damages award, you’ll need not only to review the judge’s record on post-trial review of punitives, but also to drill down from there to the order and the briefing on the motion to evaluate what approaches worked (or didn’t).

Analytics have tremendous potential in appellate work too. All of the major vendors have enormous collections of data on state and federal appellate courts and judges. But for my firm’s appellate practice, I was interested in tracking a number of different variables which would be difficult to extract through computer searches, so rather than relying on any of the vendors, I built two databases in-house. Our California and Illinois Supreme Court databases are modeled after Professors Spaeth and Segal’s Supreme Court database, tracking many of the same variables. My California Supreme Court database encompasses every case the court has decided since January 1, 1994 – 1,004 civil and 1,293 criminal, quasi-criminal and attorney disciplinary. My Illinois Supreme Court database is even bigger, including every case that court has decided since January 1, 1990 – 1,352 civil and 1,529 criminal. For each of these 5,000+ cases, I’ve extracted roughly one hundred different data points. Was the plaintiff or the defendant the appellant in the Supreme Court? Is there a government entity on either side? Where did the case originate, and who was the trial judge? Before the intermediate appellate court, we track dissents, publication, the disposition and the ideological direction of the result. We track three dates for each case: the date review was granted, the date of the argument and the date of the decision. Before the Supreme Court, we note both the specific issue and the area of the law involved, the prevailing party and the vote, the writers and length of all opinions, the number of amicus curiae briefs and who each amicus supported, and of course each Justice’s vote. In addition, our database includes data from every oral argument at the Illinois Supreme Court since 2008, and arguments at the California Supreme Court since May 2016, when the Court first started posting video and audio tapes of its sessions.

Conventional wisdom in most jurisdictions holds that unless the intermediate appellate court’s decision was published with a dissent, it’s not worth seeking Supreme Court review. We’ve demonstrated that in fact, a significant fraction of both the California and Illinois Supreme Courts’ civil dockets arises from unpublished unanimous decisions. We not only track aggregate reversal rates for intermediate appellate courts, but also break the data down into reversal rates by area of law.

Lag times are particularly interesting in California, since the Supreme Court is generally required to decide cases within ninety days of oral argument. As a result, the vast majority of the lag between grant of review and final decision in California falls between grant and argument, rather than argument and decision. We have not only tracked the average time to resolution for civil and criminal cases, but also demonstrated that there’s a correlation between the Supreme Court’s decision and the lag time from grant to argument. We’ve tracked the individual Justices’ voting records, not just overall, but one area of law at a time.
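
Splitting total time to resolution into its two segments is simple date arithmetic once the three dates are recorded for each case. The sketch below shows one way to do it in Python. The dates are invented, and the tuple layout is an assumption made for the example rather than the structure of our actual database.

```python
from datetime import date
from statistics import mean

# Hypothetical cases: (review granted, oral argument, decision).
# All dates are invented for illustration.
cases = [
    (date(2015, 1, 14), date(2016, 3, 2), date(2016, 5, 23)),
    (date(2015, 6, 10), date(2016, 9, 7), date(2016, 11, 28)),
    (date(2016, 2, 17), date(2017, 1, 4), date(2017, 3, 30)),
]

def lag_days(cases):
    """Return (grant-to-argument days, argument-to-decision days) per case."""
    return [
        ((argued - granted).days, (decided - argued).days)
        for granted, argued, decided in cases
    ]

lags = lag_days(cases)
avg_grant_to_argument = mean(pre for pre, _ in lags)
avg_argument_to_decision = mean(post for _, post in lags)

print(f"Average grant to argument: {avg_grant_to_argument:.0f} days")
print(f"Average argument to decision: {avg_argument_to_decision:.0f} days")
```

In this invented sample, as in the California pattern described above, nearly all of the delay sits between grant and argument.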

Only in the past few years have data analysts begun to take a serious look at appellate oral arguments. The earliest study appears to be Sarah Levien Shullman’s 2004 article for the Journal of Appellate Practice and Process. Shullman analyzed oral arguments in ten cases at the United States Supreme Court, noting each question asked by the Justices and assigning a score from one to five to each depending on how helpful or hostile she considered the question to be. Based upon her data, she made predictions as to the ultimate result in the three remaining cases. Comparing her predictions to the ultimate results, Shullman concluded that it was possible to predict the result in most cases by a simple measure: the party being asked the most questions generally lost.

John Roberts addressed the issue of oral argument the year after Shullman’s study appeared. Then-Judge Roberts noted the number of questions asked in the first and last cases of each of the seven argument sessions in the Supreme Court’s 1980 Term and the first and last cases in each of the seven argument sessions in the 2003 Term. Like Shullman, Roberts found that the losing side was almost always asked more questions.

Timothy Johnson and three other professors published their analysis in 2009. Johnson and his colleagues examined transcripts from every Supreme Court case decided between 1979 and 1995, more than 2,000 hours of argument in all and nearly 340,000 questions from the Justices. The study concluded that, after controlling for a number of other factors that might explain case outcomes, the party asked more questions generally wound up losing the case.

Professors Lee Epstein and William M. Landes and Judge Richard A. Posner published their study in 2010. Epstein, Landes and Posner used Professor Johnson’s database, tracking the number of questions and average words used by each Justice. Like Professor Johnson and his colleagues, they concluded that the more questions a Justice asks a party, all else being equal, the more likely the Justice is to vote against that party, and the greater the difference between the total questions asked of each side, the more likely a lopsided result is. Our study of every oral argument at the Illinois Supreme Court from 2008 through 2016 came to the same conclusion: the more the Court’s questions to you outnumber its questions to your opponent, the lower your chance of winning.
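
The rule of thumb these studies share (the side asked more questions tends to lose) is easy to express and to score against known outcomes. The Python sketch below is a bare illustration with invented question counts and results. It is not the method of any of the studies above, each of which controlled for other factors.

```python
# Hypothetical tallies of questions from the bench; all numbers are invented.
arguments = [
    {"case": "A", "appellant_q": 31, "appellee_q": 12, "winner": "appellee"},
    {"case": "B", "appellant_q": 9, "appellee_q": 27, "winner": "appellant"},
    {"case": "C", "appellant_q": 18, "appellee_q": 15, "winner": "appellant"},
    {"case": "D", "appellant_q": 20, "appellee_q": 20, "winner": "appellee"},
]

def predict_winner(appellant_q, appellee_q):
    """Predict that the side asked fewer questions wins; None on a tie."""
    if appellant_q == appellee_q:
        return None
    return "appellee" if appellant_q > appellee_q else "appellant"

def accuracy(arguments):
    """Share of correct calls among arguments where the rule makes a prediction."""
    calls = [
        (predict_winner(a["appellant_q"], a["appellee_q"]), a["winner"])
        for a in arguments
    ]
    decided = [(pred, actual) for pred, actual in calls if pred is not None]
    return sum(pred == actual for pred, actual in decided) / len(decided)

print(f"Rule was right in {accuracy(arguments):.0%} of non-tied arguments")
```

Case C in the sample is the reminder that a narrow margin in questions is weak evidence, and that the more heavily questioned side does sometimes win.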

Litigation analytics can uncover useful insights outside of courtrooms as well. Corporate legal departments are increasingly using analytics to track and manage their outside counsel. Does the company have more or less litigation than its competitors? Do the lawsuits last a comparable length of time, and is the company’s win rate comparable to its peers? What are the trends over time? When the company is selecting counsel for a particular lawsuit, depending on where the case is venued, it should be possible by consulting Premonition, Lex Machina or Bloomberg to compare each candidate counsel’s winning percentage in the jurisdiction and before the particular judge, as well as to develop far more background information than was ever possible before. From the viewpoint of the law firms competing for business, analytics offers an invaluable insight into the nature of your target client’s business. All the same questions which the legal department will likely be interested in are valuable to the outside attorneys as well. Is your target’s current counsel not winning cases as often as other companies are? What’s the nature of the company’s litigation? And if candidate counsel can discover the names of the other firms competing for the business, analytics databases can provide detailed information about those lawyers’ experience and relevant background. Premonition’s Vigil court alerts system can get lawyers word of a new filing or case development involving a client or potential client only an hour or two after it happened, not a few days later.

So how does the future look? We’re still in the early days of the revolution in litigation analytics. As the federal PACER system is upgraded and more and more states put some or all dockets in electronic form, more litigation data will become available to analytics vendors. Analytics scholars will develop new methods to turn additional aspects of litigation into usable data. Upgrades in artificial intelligence systems will result in analytics learning to gather more subtle data from court records, the kind of variables that require understanding and interpretation, rather than simply looking for text strings. More analytics vendors will inevitably enter the market.

Lawyers will have to become comfortable working with analytics data in situations where decisions were once made based upon intuition and experience, both in courtrooms and in clients’ counsel searches. More law firms will likely develop in-house analytics databases similar to mine in other large states.

We’ve barely scratched the surface in terms of statistical and theoretical techniques which can uncover new insights about litigation and judicial decision making. Several academics have proposed algorithms for predicting case outcomes based on information such as the composition of an appellate panel and the ideology, gender and background of the judges, and these algorithms have generally performed better than law professors’ predictions based on the legal issues involved. Regression modeling is a natural next step not just to predict case results, but to estimate the real impact of various variables, such as how much (if at all) amicus support increases one’s odds of winning. Several vendors have touted their data on winning percentages for lawyers, but regression modeling could isolate how much impact a particular counsel really has upon a party’s chances, or whether the jurisdiction or the nature of a lawyer’s clients explains his or her record. As Judge Posner and Professors Epstein and Landes suggested in The Behavior of Federal Judges, computerized sentiment analysis of the content of judicial opinions could produce more nuanced insights about particular judges’ attitudes and ideology. Game theory is another well-developed academic discipline with a largely untapped potential for understanding how appellate courts work.
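
To make the regression idea concrete, here is a minimal Python sketch of a logistic regression that estimates the effect of amicus support on the odds of winning while holding a second variable (whether the jurisdiction is a favorable one) constant. Everything in it is an assumption made for illustration: the data are invented, the two predictors are hypothetical, and the model is fitted with a simple hand-written gradient ascent loop rather than the statistical software one would use in practice.

```python
import math

# Hypothetical appeals: (amicus_support, friendly_jurisdiction, won).
# Invented data; each tuple is repeated to form a small sample of 40 cases.
rows = (
    [(1, 1, 1)] * 8 + [(1, 1, 0)] * 2
    + [(0, 1, 1)] * 6 + [(0, 1, 0)] * 4
    + [(1, 0, 1)] * 5 + [(1, 0, 0)] * 5
    + [(0, 0, 1)] * 3 + [(0, 0, 0)] * 7
)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, steps=5000, rate=0.1):
    """Fit an intercept and two coefficients by batch gradient ascent
    on the average log-likelihood."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for amicus, friendly, won in rows:
            x = (1.0, amicus, friendly)
            err = won - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi + rate * g / n for wi, g in zip(w, grad)]
    return w

intercept, amicus_coef, jurisdiction_coef = fit_logistic(rows)
odds_ratio = math.exp(amicus_coef)
print(f"Estimated odds ratio for amicus support: {odds_ratio:.2f}")
```

An odds ratio above one suggests that, in this invented sample, amicus support is associated with better odds even after accounting for the jurisdiction. The same structure could in principle be used to ask whether a lawyer's winning percentage survives once jurisdiction and client mix are added as predictors.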

We end with the question every analytics scholar (and vendor) is asked sooner or later: will litigation analytics replace lawyers?

The answer is no, for two reasons.

The first is what I think of as the orange used car problem.

A few years ago, a company which conducts data mining competitions for corporate clients ran a contest in hopes of building an algorithm to determine which among used cars available at auction was likely to have mechanical problems. They collected the data, ran the correlations, and it turned out the strongest correlation to “few or no mechanical problems” was, you guessed it, that the vehicle was orange.

A few people facetiously proposed theories as to why orange used cars might be more trouble-free (maybe car fanciers with better maintenance habits are drawn to them?), but this is an example of one of the most fundamental rules in data analytics: correlation does not necessarily indicate causation. Saying two variables are highly correlated doesn’t necessarily mean one is causing the other; both could be caused by a third, unidentified variable, or it could be a random correlation, or your dataset could be biased or simply too small. Much of litigation analytics, at least short of the more sophisticated logistic regression modeling, currently consists of identifying correlations. It takes an experienced lawyer intermediary to review the data and understand which findings are valuable, actionable insights and which are just orange used cars.

The second reason is even more fundamental: all litigation analytics require interpretation, and one must keep constantly in mind (and remind clients early and often) that nothing in analytics is a guarantee of any particular result. The more heavily questioned party does win at times in the appellate courts. Just because Justices A and B have voted together in 75 percent of the tort cases in the past five years is no guarantee they won’t disagree about the next one. The academic algorithms which have been developed for predicting results at the Supreme Court are wrong anywhere from twenty percent to a third of the time. Some often-quoted statistics can mislead through over-aggregation. For example, perhaps an intermediate court’s overall reversal rate on all cases is two-thirds, but on further analysis, it turns out that the reversals are all in tort cases, while the court is generally affirmed in other areas of the law.
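
The over-aggregation problem in that last example is easy to see once the same data are grouped by area of law. The Python sketch below uses invented appeals, chosen only so that the overall rate comes out at two-thirds. The area labels and counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical appeals from one intermediate court: (area of law, reversed?).
# Invented data chosen so the overall reversal rate is two-thirds.
appeals = (
    [("tort", True)] * 6
    + [("contract", False)] * 2
    + [("insurance", False)] * 1
)

def reversal_rate(appeals):
    """Share of appeals that ended in reversal."""
    return sum(was_reversed for _, was_reversed in appeals) / len(appeals)

def reversal_rate_by_area(appeals):
    """Reversal rate computed separately for each area of law."""
    grouped = defaultdict(list)
    for area, was_reversed in appeals:
        grouped[area].append((area, was_reversed))
    return {area: reversal_rate(group) for area, group in grouped.items()}

overall = reversal_rate(appeals)
by_area = reversal_rate_by_area(appeals)

print(f"Overall: {overall:.0%}")
for area, rate in by_area.items():
    print(f"{area}: {rate:.0%}")
```

A client with a contract appeal who saw only the overall figure would badly misjudge the odds, which is why the breakdown by area of law matters.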

Does this mean that litigation analytics are irrelevant? No, no more so than the bank would find the experiential data on the hypothetical mortgage bundle we discussed at the outset irrelevant. Attorneys have been predicting what courts are likely to do for generations based on intuition, experience and anecdote. The business world began moving away from that a generation ago, and now that revolution has struck the law full force. Today, there’s data for most aspects of litigation, and that trend builds every year. The advent of litigation analytics and data-driven decision making is a game-changer in terms of intelligent management of litigation risk.

Image courtesy of Flickr by Barcelona Supercomputing Center – National Supercomputing Center (BSC-CNS)