Future of legal big data

The party line in the ediscovery and information-lifecycle-governance industries is that businesses need tools to reduce the amount of data they retain and the amount of data they produce in connection with investigations and litigation. This is wrong.

The party line is a product of history. It was adopted because of what it has cost, in the past, to collect, process, review, and produce electronically stored information. When collection services cost $300 per hour, processing costs $250 per gigabyte, and review involves thousands of hours of lawyer time plus hosting fees, user fees, and the like, the natural reaction is to reduce, by any means possible, the amount of data that has to be collected, processed, reviewed, and produced. This thinking resonates nicely with the desire of businesses to keep their information confidential and to produce as little as they defensibly can.

What do you see when you set aside the party line and consider the problem of big data from a business perspective, in light of the technology available today and the technology we expect to be available soon?

The first and most important realization is that retention, storage, and access to business data should be driven by operational concerns: businesses should decide what data to keep, how to store it, and how to make it accessible based on how that data can help them make money. The prior approach, where retention, storage, and access are driven by compliance, regulatory, or litigation concerns, is the tail wagging the dog. We should think first about how to keep data and use it to advance business goals and only secondarily about how to satisfy compliance, regulatory, and litigation requirements at reasonable cost.

Email is a great example. Companies, driven by the party line, are looking for more and more aggressive tools to reduce the email they keep and to reduce further the email that is subject to review in investigations and litigation. But email, in almost all companies, is the primary, natural repository of operational information: it’s where business people live. The operational value of tools that can extract actionable business information from email is tremendous. The holy grail is technology that can extract this information with no change in the way people create and use email in day-to-day business. Employees generate valuable information in email as always, and technology surfaces that email when there is an operational need for it.

What problems can we expect to face as we open a new office in New York? Well, let’s extract that information from the email (and other documents and electronically stored information) that we created when we opened a new office in Boston last year. This kind of operational thinking — how can we generate actionable business information from the data we create in the ordinary course of business — is what should drive decisions about how data is retained and stored.

The concern that motivates the party line, the cost and intrusiveness of discovery, should be addressed not by reducing the amount of data we keep, but by reducing the cost per unit of data of storing, collecting, accessing, and reviewing that data. This is what we’re out to do at Disco. Already, we’ve brought to market ediscovery software that delivers a 10x speed improvement in search and review, at perfectly predictable flat-fee pricing that is half or less of the prices charged by others. Speed will only increase; cost will only fall.

The operational usefulness of data is the carrot for keeping it. The stick is that the arguments for why it isn’t available — why it wasn’t retained, why it can’t be collected or searched or reviewed — are going to get weaker over time. As technology like Disco makes it easier to collect, search, and review data, businesses and lawyers will be harder and harder pressed to argue that it can’t or shouldn’t be done.

The right approach is to retain lots of data, use technology to extract actionable business information from it, and use technology to reduce, dramatically, the costs of handling that data in investigations and litigation. Destroying data isn’t the answer.

