SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop
July 28, 2011, Beijing, China
Purpose
Electronic discovery ("e-discovery") is the use of Information
Retrieval (IR) technology to find evidence requested by a party in a
legal matter. This application of IR has grown explosively in recent
years. The SIRE workshop will provide a forum for discussion of IR
techniques that have been or could be applied to e-discovery, as
well as methods for evaluating the effectiveness and cost of such
approaches.
Topics of interest include, but are not limited to:
- Distributed search of large heterogeneous enterprise information
systems, including corporate intranets, archival and backup
repositories, cloud-based storage, etc.
- High-recall search of large collections, including those with
high densities of relevant documents
- Supervised learning of classifiers for responsiveness, privilege
and other factors of interest (sometimes referred to in e-discovery as
predictive coding)
- IR techniques that leverage the characteristics of specific types
of business records (email, instant messages, voice mail, file
systems, etc.)
- Clustering, link analysis, and other methods for discovering
structure in large collections, including detection of duplicate
documents
- Process design for human-in-the-loop review and exploitation of
large data sets, including measurement of inter-reviewer consistency,
active learning, etc.
- Evaluation design, including sampling strategies, estimation of
confidence intervals, and reusability of large test collections
Background
Information retrieval is a key technology for e-discovery, but there
are few opportunities for IR researchers and e-discovery practitioners
to meet. The TREC Legal Track is limited to groups participating in
those evaluations. The four DESI (Discovery of Electronically Stored
Information) workshops have brought together people from the law and
the e-discovery industry with researchers, but IR research is only one
part of the issues addressed there. SIGIR is, therefore, an ideal
place to focus on the intersection between IR research and
e-discovery. The vast majority of e-discovery spending occurs in the
USA. However, interest in IR for e-discovery is worldwide.
Multinational companies in any country may become involved in US court
cases, and the laws on discovery are evolving in many countries in
ways informed by the US experience. Moreover, some IR issues that are
central to e-discovery are also important in other domains, including
enterprise search, law enforcement and historical research.
Interest on both the legal and IR sides has climbed steadily for
years, but only recently have the key IR research questions come into
clear focus. The TREC evaluations, along with high profile studies
from the Electronic Discovery Institute and others, have shown the
great potential for text retrieval and categorization in e-discovery,
and have highlighted the challenges that arise when demands for high
recall collide with document diversity and poor inter-reviewer
agreement.
Agenda
The final schedule is now available.
The first session will lay out the nature of e-discovery, thus making
the workshop accessible to SIGIR attendees with no prior e-discovery
experience. A keynote talk will be followed by a panel discussion
including senior e-discovery experts. The second session will then
focus on IR techniques. The third session, after lunch, will focus on
evaluation. In both cases, the goal will be to dive deeply into
specific issues. The fourth session will feature a moderated panel
discussion focused on crafting a research agenda. The goal will be to
identify key issues, venues for action, communities to engage with,
and support to seek.
Papers by Workshop Participants
- New Research Papers (peer reviewed)
- New Position Papers (not peer reviewed)
- Previously Published Papers
- Jason R. Baron, Law in the Age of
Exabytes: Some Further Thoughts on 'Information Inflation' and Current
Issues in E-Discovery Search, Richmond Journal of Law and
Technology, 17(3), Spring (2011)
- Conor R. Crowley, Defending
the Use of Analytical Software in Civil Discovery, Digital
Discovery and E-Evidence, 10(16), September 16 (2010)
- David van Dijk, Hans Henseler and Maarten de Rijke, Semantic Search in E-Discovery, DESI IV
Position Paper, Pittsburgh, PA (2011) [research team represented at
SIRE by Wouter Weerkamp]
- David D. Lewis, Afterword:
Data, Knowledge, and E-Discovery, Artificial Intelligence and
Law, 18(4)481-486 (2011)
- Douglas W. Oard, Jason R. Baron, Bruce Hedin, David
D. Lewis and Stephen Tomlinson, Evaluation
of Information Retrieval for E-Discovery, Artificial
Intelligence and Law, 18(4)347-386 (2011)
- Patrick Oot, Anne Kershaw and Herbert L. Roitblat, Mandating
Reasonableness in a Reasonable Inquiry, Denver University Law
Review, 87(2)522-559 (2010)
- John M. Facciola and Jonathan M. Redgrave, Asserting
and Challenging Privilege Claims in Modern Litigation: The
Facciola-Redgrave Framework, The Federal Courts Law Review,
4(1)19-54 (2009)
Additional References
Much has been published on e-discovery generally, so no list of
references could hope to be complete. Here are a few papers that we
know of that we believe would be useful as background reading for the
focus of this workshop. Please send recommended additions for this
list to oard@umd.edu. If you don't have online access to a document,
the author may be able to send you a copy.
- R. Bauer, D. Brasil, C. Hogan, G. Taranto and J. Brown, Impedance Matching
of Humans and Machines in High-Q Information Retrieval Systems,
IEEE Conference on Systems, Man and Cybernetics (2009)
- M. Grossman and G. Cormack, Technology-Assisted
Review in E-Discovery Can Be More Effective and More Efficient Than
Exhaustive Manual Review, Richmond Journal of Law and Technology,
17(3), Spring (2011)
- B. Hedin, S. Tomlinson, J. Baron and D. Oard, TREC
2009 Legal Track Overview (2010)
- J. Krause, Human-Computer
Assisted Search in EDD, Law Technology News, December 20 (2010)
- J. Markoff, Armies
of Expensive Lawyers, Replaced by Cheaper Software, New York
Times, March 4 (2011)
- H. Roitblat, A. Kershaw and P. Oot, Document
Categorization in Legal Electronic Discovery: Computer Classification
vs. Manual Review, Journal of the American Society for Information
Science and Technology, 61(1)70-80 (2010)
- The Sedona Conference®, Commentary
on Achieving Quality in E-Discovery (2009)
Related Events
- EDRM 2011-2012 Kickoff Meeting, St Paul, MN, USA, May 11-12, 2011
- ICAIL 2011 DESI IV Workshop, Pittsburgh, PA, USA, June 6, 2011
- TREC 2011 Legal Track, Gaithersburg, MD, USA, November, 2011
Important Dates
- Research papers due: May 13
- Position papers due: June 3
- Notification for research papers: June 3
- Preliminary agenda posted: June 13
- Workshop: July 28, 2011
Organizers
Jason R. Baron, National Archives and Records Administration, USA
Dave Lewis, David D. Lewis Consulting, USA
Maura R. Grossman, Wachtell, Lipton, Rosen and Katz, USA
Douglas W. Oard, University of Maryland, USA
Program Committee
- Avi Arampatzis, Democritus University of Thrace (Greece)
- Gordon Cormack, University of Waterloo (Canada)
- Jack G. Conrad, Thomson Reuters (USA)
- Aron Culotta, Southeastern University (USA)
- Bruce Fein, Backstop LLP (USA)
- Bruce Hedin, H5 (USA)
- April Kontostathis, Ursinus College (USA)
- Eli Nelson, Cleary, Gottlieb, Steen and Hamilton (USA)
- Venkat Rangan, Clearwell Systems (USA)
- Herb Roitblat, OrcaTec (USA)
- Mark Sanderson, RMIT (Australia)
- Howard Turtle, Syracuse University (USA)
- Ellen Voorhees, NIST (USA)
- Jianqiang Wang, University at Buffalo, SUNY (USA)
- John Wang (USA)
- William Webber, University of Melbourne (Australia)
Archived Materials
Two types of written contributions were invited:
- Original research papers (4-10 pages). After peer review,
accepted papers will be posted on the SIRE website and made available
in hard-copy to workshop participants. Authors of accepted research
papers will be invited to present their work as either an oral or a
poster presentation. Research papers are due on May 13, 2011;
decisions will be returned by June 3, 2011.
- Brief (typically 1-2 page) position papers describing individual
interests, for inclusion (without review) on the SIRE website and
distribution to workshop participants. Brief descriptions of this
type are particularly valuable when bringing together diverse research
communities. Additionally, these papers can help with our selection
of discussion leaders, discussants, and panelists. Position papers
are requested by June 3, 2011. Participation in the workshop is open,
so prior submission of position papers is strongly encouraged, but not
strictly required.
Last Update: June 30, 2011