University of Maryland Parallel Corpus Project: Bible

This project is no longer active, though I am always happy to receive feedback or pointers to useful resources.

Investigators

Summary

We are engaged in a project to acquire and annotate texts in order to create multilingual corpora for linguistic research, particularly computational linguistics. Religious texts such as the Bible are widely available, carefully translated, and appear in a huge variety of languages. We provide versions of the Bible consistently annotated according to the Corpus Encoding Standard (CES). Resnik et al. (1999) discuss the project in detail, including a study of the vocabulary coverage of Biblical text with respect to dictionary and corpus resources, which demonstrates the surprising extent to which such text is relevant for research on everyday language.

Publications

Contact

Philip Resnik (resnik@umiacs.umd.edu)

Available Versions of the Bible

Biblical text is available for the following languages, annotated in conformance with the Corpus Encoding Standard. Note that as of this writing, the seg tag has not yet been added to the official CES DTD; the CES coordinator has told us that this will happen by August 1999. See the header information in each file for pointers to the source of that version. Encoded versions for other languages will be added as they become available to us in forms that can be redistributed without violating copyright. Please write to us about any errors you discover, and send us pointers to online Biblical text in other languages that is available for redistribution.

Here is a key to the book codes we use (e.g., 1KI for "1 Kings").
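For readers who want to work with the encoded files programmatically, here is a minimal Python sketch that extracts verses and pairs them across two language versions. It assumes, without confirmation from this page, that each verse is a seg element with type="verse" and an id built from a "b." prefix, a book code such as 1KI, a chapter, and a verse number (e.g. b.1KI.1.1). The XML excerpts and the helper names verses, parse_id, and align are hypothetical, and the verse strings are placeholders rather than real Bible text, so check the header and markup of an actual file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpts. We ASSUME verses are <seg type="verse"> elements whose
# ids join a "b." prefix, a book code, a chapter, and a verse number.
# The verse texts are placeholders, not real Bible text.
EN_XML = """<cesDoc><text><body>
<div type="book" id="b.1KI">
<seg type="verse" id="b.1KI.1.1">English 1 Kings 1:1</seg>
<seg type="verse" id="b.1KI.1.2">English 1 Kings 1:2</seg>
</div>
</body></text></cesDoc>"""

FR_XML = """<cesDoc><text><body>
<div type="book" id="b.1KI">
<seg type="verse" id="b.1KI.1.1">French 1 Kings 1:1</seg>
</div>
</body></text></cesDoc>"""


def verses(xml_text):
    """Map each verse id to its text, in document order."""
    root = ET.fromstring(xml_text)
    return {
        seg.get("id"): "".join(seg.itertext()).strip()
        for seg in root.iter("seg")
        if seg.get("type") == "verse"
    }


def parse_id(verse_id):
    """Split an assumed id such as 'b.1KI.1.2' into (book code, chapter, verse)."""
    _prefix, book, chapter, verse = verse_id.split(".")
    return book, int(chapter), int(verse)


def align(src, tgt):
    """Pair verses present in both versions; verses missing from either side are skipped."""
    return [(vid, src[vid], tgt[vid]) for vid in src if vid in tgt]


if __name__ == "__main__":
    for vid, en, fr in align(verses(EN_XML), verses(FR_XML)):
        print(parse_id(vid), en, "|", fr)
```

Keying the alignment on verse ids rather than on position in the file means that a verse missing from one translation does not shift the pairing of every verse after it.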

Some files below may be temporarily unavailable.

Pointers to Related Projects

