ICAIL 2015 DESI VI Workshop

ICAIL 2015 Workshop on Using Machine Learning and Other Advanced Techniques to Address Legal Problems in E-Discovery and Information Governance
(DESI VI Workshop)

June 8, 2015, University of San Diego, San Diego, California



The sixth workshop on Discovery of Electronically Stored Information (DESI VI) aims to bring together researchers and practitioners to explore innovation and the development of best practices for the application of search, classification, language processing, data management, visualization, and related techniques to institutional and organizational records in e-discovery, information governance, public records access, and other legal settings. Questions addressed include:
  1. What combinations of machine learning and other techniques can best categorize information in accordance with existing records management policies?
  2. Do effective methods exist for performing sentiment analysis and personal information identification in a legally useful way in e-mail and other records of interpersonal communication?
  3. How well can we estimate the end-to-end costs of workflows that combine artificial intelligence and human coding to accomplish legal tasks on a broad range of content types?
  4. Can proactive insider threat detection leverage information already being collected for records management purposes, and what would be the ethical and legal fallout of such approaches?
  5. Are approaches available to reduce the perceived conflicts between privilege and transparency in labeling data for Technology-Assisted Review (TAR) in e-discovery and public records access applications?
  6. What technical, procedural, and legal issues arise from recent proposals to shift the focus of e-discovery from relevance to materiality?
  7. Where do recent legal cases point to the need for new research to better inform the decisions of courts and the practices of parties?
  8. What lessons can we draw from recent shared-task evaluations such as TREC and EDI, and how can future shared-task evaluations best be structured?
  9. How can current techniques for issue coding be applied to compliance tasks (e.g., in regulatory, enforcement, and investigations settings), and what capability gaps exist that call for new research?
  10. What implications do emerging technologies such as deep learning and fine-grained access to behavioral traces have for e-discovery, business intelligence, and records and information management purposes?
Participation is invited from all interested parties, including those with interests in:


A report from the Workshop is now available.


A final agenda (last updated June 5) is now available.

Papers and Abstracts

Keynote Speakers | Refereed Papers | Position Papers

Important Dates


With data continuing to double worldwide every 18 months (EMC/IDC Study 2014), institutions of all kinds (public and private) are turning to powerful analytics to provide greater visibility into data sets for multiple purposes. Against this backdrop, lawyers around the world continue to face the problem of how to search effectively and efficiently for relevant documents across increasingly complex, enterprise-wide collections within corporate and institutional settings. Lawyers are interested in doing so both to respond to litigation demands and to solve a variety of other legal issues arising in the workplace. As noted by The Sedona Conference®, a leading legal think tank, "the legal profession has passed a crossroads" in terms of its willingness to embrace automated, analytical, and statistical approaches that may help solve the twin problems of volume and complexity of data (The Sedona Conference, 2013). Spurred by a growing body of case law acknowledging the legitimacy of using predictive coding and other forms of advanced search in e-discovery, legal professionals are increasingly interested in applying the tools and techniques developed in litigation settings to other types of legal issues, including but not limited to providing legal advice on employment issues, mergers and acquisitions, and whistleblower allegations after searching large collections of email, text messages, and other forms of electronic communication, including social media.

The year 2012 marked a watershed in the development of legal practice in the United States, with published opinions in three courts, and a multi-day evidentiary proceeding in a fourth, all concerning the propriety of lawyers’ use of advanced search techniques in civil litigation to find relevant documents in large evidentiary data sets. In particular, the judicial ruling in the federal case of da Silva Moore v. Publicis Groupe (U.S. District Court, New York) has advanced the law in this area, establishing that lawyers need no longer consider themselves "guinea pigs" coming into court to justify the use of advanced search techniques such as predictive coding; the court gave its judicial blessing to the use of such techniques in the name of greater efficiency and efficacy in results. The Moore court specifically took judicial notice of past research in the information retrieval area, including published reports arising out of the TREC Legal Track and similar endeavors. Other rulings have followed, including recent cases in which courts on their own initiative have ordered the use of predictive coding without a prior motion from either party.

Notwithstanding these early judicial blessings, as well as the widespread attention presently being given to the use of advanced search techniques during document review, there is no widely agreed-upon set of standards or best practices for how lawyers actually go about using these new techniques, in e-discovery or otherwise. Indeed, the two "joint protocols" entered into in the Moore case and in another case, In re Actos (U.S. District Court, Louisiana), have proven controversial because they require transparency in the sharing of information between opposing parties to carry out quality control sampling and assessment of what constitutes relevant or non-relevant evidence. More recently, a lively dialogue has arisen among proponents of different machine learning techniques, particularly with respect to variants of active learning and the most efficient method for "seeding" or "training" the software (E-DiscoveryTeam Blog 2014). Past DESI workshops, as well as leading organizations such as The Sedona Conference®, have recognized the lack of best practice standards in this area (DESI IV, 2011; DESI V, 2013; The Sedona Conference, 2013b).

DESI History

DESI VI follows five successful prior DESI (Discovery of Electronically Stored Information) Workshops: at ICAIL 2007 (DESI I, Palo Alto), ICAIL 2009 (DESI III, Barcelona), ICAIL 2011 (DESI IV, Pittsburgh), and ICAIL 2013 (DESI V, Rome), and an intermediate workshop (DESI II) at University College London in 2008. In DESI I, a wide array of individuals came together for perhaps the first time to foster engagement between e-discovery practitioners and a broad range of research communities who might contribute to the development of new technologies to support the e-discovery process. The DESI II and III workshops broadened the scope of this discussion to include comparisons of requirements between differing national settings and legal environments. DESI IV built on these efforts with a first-of-its-kind general discussion of standard-setting for the legal profession through contemplation of ISO 9001 frameworks as well as capability maturity models. Most recently, DESI V extended the discussion of standards to include the question of what standards could and should be made applicable to the use of predictive coding and other advanced techniques that were at the time beginning to be cited in U.S. case law. The DESI VI workshop in San Diego will benefit from all of the past discussions, but will aim to broaden the scope of legal issues to which advanced data analysis and classification technologies might credibly be applied, beyond e-discovery to a fuller range of information governance applications.

The aim of the DESI workshop series has been to foster a continuing dialogue leading to the adoption of further best practice guidelines or standards in using machine learning, most notably in the e-discovery space. Past DESI research papers have contributed to thought leadership, including being cited in the academic literature (e.g., RAND 2012), as well as informing a current effort to craft an ISO standard on e-discovery. The DESI VI workshop is intended to have an expanded focus, to be of interest to actors in the technology sector interested in further adopting machine learning and other advanced techniques not only for e-discovery but also in a wider variety of legal settings where working with and categorizing data in large data sets is increasingly important.

Submissions (archived; deadlines have passed)

We invite refereed papers describing research or practice. After peer review, accepted papers will be posted on the DESI VI website and distributed to workshop participants. Authors of accepted refereed papers will be invited to present their work either as an oral or a poster presentation. Refereed papers should be 4 to 10 pages; longer papers may be returned without review.

We also invite unrefereed position papers describing individual interests for inclusion (without review) on the DESI VI Web site and distribution to workshop participants. Position papers should typically be on the order of 2-3 pages.

Participation in the DESI VI workshop is open. Submission of papers is encouraged, but not required.

Submissions should be sent by email to Doug Oard (oard@umd.edu) with the subject line DESI VI POSITION PAPER or DESI VI RESEARCH PAPER. All submissions received will be acknowledged within 3 days.

The Call for Submissions is also available as a PDF document.


Much has been published on E-Discovery generally, so no list of references could hope to be complete. Here are a few papers and cases that we know of that we believe would be useful as background reading for the focus of this workshop. Please send recommended additions for this list to oard@umd.edu.


  1. Ashley, Kevin D., “Can AI & Law Contribute to Managing Electronically Stored Information in Discovery Proceedings? Some Points of Tangency,” paper presented at DESI Workshop,
  2. Ashley, Kevin D. & W. Bridewell, “Emerging AI & Law approaches to automating analysis and retrieval of electronically stored information in discovery proceedings,” 18 Artificial Intelligence and Law 311 (2011)
  3. Baron, Jason R., “Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search,” 17 Richmond J. of Law & Tech. 3 (2011), http://jolt.richmond.edu/v17i3/article9.pdf
  4. Baron, Jason R., “Toward A Federal Benchmarking Standard for Evaluating Information Retrieval Products Used in E-Discovery,” 6 Sedona Conference Journal 237-246 (2005) (available on Westlaw, Lexis)
  5. Borden, Bennett B. & J.R. Baron, “Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice,” 20 Richmond J. of Law & Tech. 7 (2014), http://jolt.richmond.edu/v20i2/article7.pdf
  6. Conrad, Jack G., “E-Discovery revisited: the need for artificial intelligence beyond information retrieval,” 18 Artificial Intelligence and Law 4 (2010).
  7. Conrad, Jack G., “E-Discovery Revisited: A Broader Perspective for AI Researchers,” paper presented at DESI Workshop, Workshop on Supporting Search and Sensemaking For Electronically Stored Information in Discovery Proceedings Eleventh International Conference on Artificial Intelligence and Law, Palo Alto, June 4, 2007, http://www.umiacs.umd.edu/~oard/desiws/
  8. Cormack, Gordon V., M.R. Grossman, “Evaluation of Machine-Learning Protocols for Technology Assisted Review in E-Discovery,” SIGIR 2014, http://plg2.cs.uwaterloo.ca/~gvcormac/calstudy/study/sigir2014-cormackgrossman.pdf
  9. E-discoveryteam Blog, “Talking Turkey,” guest blog by M. Grossman & G. Cormack (Sept. 7, 2014), http://e-discoveryteam.com/2014/09/07/guest-blog-talking-turkey/
  10. EMC/IDC Digital Universe Study 2014, http://www.emc.com/leadership/digitaluniverse/index.htm
  11. Grossman, M. and G. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” 17 Richmond J. of Law and Technology 3 (2011)
  12. Lewis, David D., “Afterword: data, knowledge and e-discovery,” 18 Artificial Intelligence and Law 481 (2010).
  13. TREC Legal Track web page, http://trec-legal.umiacs.umd.edu (containing TREC 2006 through TREC 2011 overview papers)
  14. Oard, Douglas W., J.R. Baron, B. Hedin, D.D. Lewis, S. Tomlinson, “Evaluation of information retrieval for E-Discovery,” 18 Artificial Intelligence and Law 347 (2010).
  15. Oard, Douglas W., W. Webber, “Information Retrieval and E-Discovery,” 6 Foundations and Trends in Information Retrieval http://ediscovery.umiacs.umd.edu/pub/ow12fntir.pdf
  16. Pace, Nicholas M., L. Zakaras, “Where The Money Goes: Understanding Litigant Expenditures for Producing E-Discovery,” RAND Publication (2012) http://www.rand.org/pubs/monographs/MG1208.html
  17. Paul, George L. and J.R. Baron, “Information Inflation: Can The Legal System Cope?,” 13 Richmond Journal of Law and Technology (2007), http://law.richmond.edu/jolt/v13i2/article10.pdf.
  18. The Sedona Conference, The Sedona Best Practices Commentary on the Use of Search and Information Retrieval in E-Discovery (2013 revised ed.)(2013a), http://www.thesedonaconference.org/content/miscFiles/publications_html
  19. The Sedona Conference, Commentary on Achieving Quality in the E-Discovery Process (2013 revised ed.) (2013b), http://www.thesedonaconference.org/content/miscFiles/publications_html
  20. Webber, William, “Re-examining the Effectiveness of Manual Review,” SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop, Beijing, China (2011), http://www.umiacs.umd.edu/~oard/sire11/#Papers

Cases

  1. da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012), approved and adopted, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012)
  2. Digicel (St. Lucia) Limited v. Cable & Wireless PLC, England and Wales High Court (Chancery Division), [2008] EWHC 2522 (Ch) (23 October 2008).
  3. EORHB v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012)
  4. Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al., 2012 WL 1431215 (Va. Cir. Ct. Apr. 23, 2012).
  5. In re Actos (Pioglitazone) Products, 2012 WL 3899669 (W.D. La. July 27, 2012)
  6. Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.)
  7. United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008)
  8. Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008)

Organizing Committee

Doug Oard
Last modified: Wed Sep 9 12:03:18 2015