ICAIL 2017
DESI VII Workshop on
Using Advanced Data Analysis in eDiscovery & Related Disciplines
to Identify and Protect Sensitive Information in Large Collections

June 12, 2017, Strand Campus, King's College London, UK


The DESI VII workshop will provide a platform for discussion of best practices and innovations in the use of advanced search technology, text classification, language processing, data organization, visualization and related techniques for the purposes of accessing and managing electronically stored information. One focus of the DESI VII workshop will be on emerging protocols and novel techniques for identifying and protecting sensitive information in large collections. The workshop will also welcome contributions on other topics that are within the workshop’s broader scope. We expect the refined focus on protecting sensitive content this year to be directly relevant to at least four application contexts. For these contexts, we expect to address the following open questions:

In eDiscovery: What techniques are currently being used to classify information found in email or other data sources as privileged, confidential, or otherwise protected by law? How widespread is the use of technology for this type of information identification? How well do current technologies perform with respect to the classification of sensitive information?

In EU privacy policies: To what degree can current algorithmic techniques adequately characterize content that individuals might wish to have blocked from certain types of access in adherence with “right to be forgotten” laws? To what extent can the process of adjudicating such requests reasonably be automated? How well do algorithmic techniques perform in identifying sensitive data that may need to be blocked from cross-border transfers? To what extent can these capabilities satisfy requirements for algorithmic accountability?

In audits and investigations: What tools and techniques are available to find and protect well-defined categories of sensitive content? Examples from the US and Canada might include protected health information, student education records, customer record information, card holder data, or proprietary or confidential information (e.g., trade secrets). To what extent can taxonomies be constructed for information that is routinely the focus of internal audits to facilitate automatic detection of those categories of information? To what extent can technical support for investigations be designed to protect sensitive content that is not material to the investigation?

In public access requests: How well can current procedures and automated techniques identify and protect personal, political, proprietary or otherwise confidential content? To what extent can automated techniques reliably detect specific types of personally identifiable information which, if released, would constitute an unwarranted invasion of privacy?

The workshop discussion will be grounded in the results of original research of the kind reported in interdisciplinary venues such as ICAIL, in law reviews, at technical conferences in specific disciplines (e.g., KDD, ICWSM, ACL, SIGIR), and in shared task evaluations (e.g., TREC, CLEF, NTCIR).

Participation is invited from all interested parties.

Important Dates


Two types of written contributions are invited: research or operational practice papers, and position papers. Please note that because of the workshop’s focus on research interchange, we are not able to accept commercial white papers or similar corporate materials.

Submissions should be sent by email to Jack Conrad (jack.g.conrad (put at sign here with no spaces on either side) tr.com) with the subject line DESI VII RESEARCH/OPERATIONALPRACTICE PAPER or DESI VII POSITION PAPER. All submissions received will be acknowledged within 3 days.

A PDF of the second Call for Submissions is also available.

DESI History

DESI VII follows five successful prior DESI (Discovery of Electronically Stored Information) workshops held at ICAIL: ICAIL 2007 (DESI I, Palo Alto), ICAIL 2009 (DESI III, Barcelona), ICAIL 2011 (DESI IV, Pittsburgh), ICAIL 2013 (DESI V, Rome), and ICAIL 2015 (DESI VI, San Diego). It also follows an intermediate workshop (DESI II) held at University College London in 2008. In DESI I, a wide array of individuals came together for perhaps the first time to foster engagement between e-discovery practitioners and a broad range of research communities who might contribute to the development of new technologies to support the e-discovery process. The DESI II and III workshops broadened the scope of this discussion to include comparisons of requirements between differing national settings and legal environments. DESI IV built on these efforts with a first-of-its-kind general discussion of standard-setting for the legal profession, considering ISO 9001 frameworks as well as capability maturity models. DESI V then extended the discussion of standards to include the question of what standards could and should be made applicable to the use of predictive coding and other advanced techniques, which were at the time beginning to be cited in U.S. case law. Most recently, the DESI VI workshop in San Diego aimed to broaden the scope of legal issues to which advanced data analysis and classification technologies might credibly be applied, beyond e-discovery to a fuller range of information governance applications.


  • Peer-Reviewed Research Papers (will be posted by May 26)
  • Position Papers (More to come!)


    An updated draft program (updated on May 25) is now available.


    DESI is a part of the International Conference on Artificial Intelligence and Law, and DESI participants must therefore register using the ICAIL Registration Site.

    Organizing Committee

    Program Committee

    Doug Oard
    Last modified: Thu May 25 14:27:34 2017