TER-Plus (TERp) Version 0.1 Documentation

Translation Edit Rate Plus (pronounced "terp")
Matthew Snover, Nitin Madnani, Bonnie Dorr, and Rich Schwartz.

This page documents some of the functionality of TER-Plus (TERp) and is very much a work in progress. If you find any errors or need any further information, you can contact the author: Matthew Snover <snover@cs.umd.edu>.

This page was last updated: March 16th, 2009 at 5:00pm EDT.



Getting Started

Software Requirements

Installing TERp

These instructions are for use on a UNIX-like operating system.

  1. TERp requires Java version 1.5.0 or higher.
  2. Download and install WordNet version 3.0 from http://wordnet.princeton.edu/obtain
  3. Download the TERp code from: http://www.umiacs.umd.edu/~snover/terp/downloads/terp.v1.tgz
  4. Download the TERp phrase table from: http://www.umiacs.umd.edu/~snover/terp/downloads/terp-pt.v1.tgz
    This file is 122 MB and so is distributed separately from the main TERp code
  5. Unzip and untar the TERp code and TERp phrase table.
    tar xvfz terp.v1.tgz
    tar xvfz terp-pt.v1.tgz
  6. Several shell scripts are provided to simplify the process of running TERp. To set up these scripts, run:
    bin/setup_bin.sh <PATH_TO_TERP> <PATH_TO_JAVA> <PATH_TO_WORDNET>
    where:
    This step will create the following scripts:
    and create the parameter file data/data_loc.param
  7. Generate the phrase table database from the paraphrase phrase table text file downloaded in step 4 and untarred in step 5.

    Run the command:
    bin/create_phrasedb <PHRASE_TABLE_TEXT> data/phrases.db

    IMPORTANT
    This step could take a while and will require several gigabytes of disk space, as the text version of the phrase table is converted to a Berkeley-style database. The conversion tool also expects to have 1-3 GB of memory available. This requirement can be reduced if necessary in the bin/create_phrasedb script.

Running TER using the TERp Code

If you wish to run TER using the TERp codebase, you do not need to install WordNet or the phrase table. Results from running TER with TERp may differ slightly from results from the TERcom Java software, due to changes in the search order of shifts.

Running:
bin/terp_ter -r <reference-file> -h <hypothesis-file-to-score>
will run TER against the specified reference and hypothesis files. The -a flag can be used if another file is to be used for the reference length (as is done when scoring HTER). This generates lowercased TER scores after tokenizing the two files. The reference file and hypothesis file should be in trans or sgml format (both in the same format).
Alternatively, a parameter file specifying these and other options can be passed to the bin/terp_ter script, as discussed below.
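
As a rough illustration, the invocation above can be assembled from a script. The following is a minimal Python sketch, assuming the script lives at bin/terp_ter relative to the working directory. The function name and the file names are hypothetical; only the -r, -h and -a flags come from the description above.

```python
from typing import List, Optional


def build_terp_ter_command(reference: str, hypothesis: str,
                           alter_ref: Optional[str] = None,
                           script: str = "bin/terp_ter") -> List[str]:
    """Assemble an argument list for bin/terp_ter (hypothetical helper)."""
    command = [script, "-r", reference, "-h", hypothesis]
    if alter_ref is not None:
        # -a names a separate file used only for the reference length,
        # as is done when scoring HTER.
        command += ["-a", alter_ref]
    return command


# Placeholder file names; both files must be in the same format (trans or sgml).
example_command = build_terp_ter_command("refs.trans", "hyps.trans")
```

The resulting list can be handed to subprocess.run from the Python standard library to launch the script.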


The bin/tercom script runs the original (non-TERp) version of TER, which is also bundled with this code. Input and output options for bin/tercom differ from those used for TERp. The copy of TERcom bundled with TERp is not supported; TERcom as a standalone package is still supported, however.

Running TERp-A

TERp-A is the version of TERp that was submitted to both the NIST Metrics MATR 2008 Challenge and to the Workshop on Statistical Machine Translation 2009. This version of TERp has been tuned to maximize correlation with human judgments of Adequacy (from the NIST Metrics MATR 2008 Challenge development data) at the segment level. TERp-A requires the installation of WordNet 3.0 and the use of the phrase table for phrasal substitutions.

Running:
bin/terpa -r <reference-file> -h <hypothesis-file-to-score>
will run TERp-A against the specified reference and hypothesis files. The reference file and hypothesis file should be in trans or sgml format (both in the same format). Alternatively, options can be passed in a parameter file, as discussed below.


More Detailed Control

Command-Line Usage Statement for TERp

java -jar terp.jar [-p parameter_file] -r <ref_file> -h <hyp_file> [-Nstcvm] 
 [-a alter_ref] [-b beam_width] [-o out_format -n out_prefix] [-w word_cost_file]
 [-P phrase_table_db] [-W weight_file] [-d WordNet_dict_dir] [-S shift_stopword_list]
 [ parameter_file1 parameter_file2 ... ]
 ---------------------------------------------------------------------------------
  -r <ref-file> (required field if not specified in parameter file)
    reference file in either TRANS or SGML format
  -h <hyp-file> (required field if not specified in parameter file)
    hypothesis file to score in either TRANS or SGML format
  -p <parameter_file>
    specifies parameters.  Command line arguments after the -p will override values
    in the parameter file
    Command line arguments before the -p will be overridden by values in parameter file
    parameter file for this run can be output by specifying 'param' as an output format
    many parameters can only be set using a parameter file
    Any additional arguments to TERp will be treated as parameter files and evaluated
    after other command line arguments.
  -N
    Normalize and Tokenize ref and hyp
  -s
    use case sensitivity
  -c
    cap ter at 100%
  -t
    use porter stemmer to determine shift equivalence
  -v
    use verbose output when running
  -m
    ignore missing hypothesis segments (useful when doing parallelization)
  -a <alter-ref>
    reference file, in either TRANS or SGML format, to use for calculating number of words in reference
  -b <beam-width>
    beam width to use for min edit distance calculations
  -o <out_format>
    set output formats:  all,sum,pra,xml,ter,sum_nbest,nist,html,param,weights,counts
  -n <out_prefix>
    set prefix for output files
  -P <phrase_table_db>
    directory that contains phrase table database
  -W <weight_file>
    file that contains edit weights.
  -d <WordNet_dictionary_dir>
    set the path to the WordNet Dictionary Directory (of the form /opt/WordNet-3.0/dict/)
  -S <shift-stop-word-list>
    specify a file that contains a list of words that cannot be shifted without a non-stop word

Usage for phrase table adjustment functions:
Valid phrasetable adjustment functions are:
  NONE adjust function needs 0 parameters.
    NONE function is: NEWCOST = %COST%
  STD adjust function needs 4 parameters.
    STD function is: NEWCOST = a + (b * %NED% * log_10(%COST%)) + (c * %NED% * %COST%) + (d * %NED%)
  STDINV adjust function needs 4 parameters.
    STDINV function is: NEWCOST = a + (b * %NED% * log_10(1.0 - %COST%)) + (c * %NED% * (1.0 - %COST%)) + (d * %NED%)
  STDINVNONED adjust function needs 3 parameters.
    STDINVNONED function is: NEWCOST = a + (b * log_10(1.0-%COST%)) + (c * (1.0 - %COST%))
  STDNONED adjust function needs 3 parameters.
    STDNONED function is: NEWCOST = a + (b * log_10(%COST%)) + (c * %COST%)

Within phrasetable adjustment functions:
  %COST% is the original cost
  %NED% is the TER cost between the two phrases
  By default log refers to the natural log (base e).
    log_10 refers to log base 10 in these equations.
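
The ordering rules given for -p in the usage statement above amount to "settings are applied in order, and the last one wins", with any trailing parameter files applied after everything else. The following Python sketch models only that rule; it is not TERp's own argument parser. The function name and the example values are hypothetical, and the assumption that a later trailing file overrides an earlier one is not stated in the usage text.

```python
from typing import Dict, List, Tuple, Union

Setting = Tuple[str, str]    # one command-line setting: (parameter name, value)
ParamFile = Dict[str, str]   # the contents of one parameter file given with -p


def resolve_settings(ordered_steps: List[Union[Setting, ParamFile]],
                     trailing_param_files: List[ParamFile]) -> Dict[str, str]:
    """Apply settings left to right; later values replace earlier ones."""
    settings: Dict[str, str] = {}
    for step in ordered_steps:
        if isinstance(step, dict):
            # A -p parameter file overrides anything set before it.
            settings.update(step)
        else:
            # A single flag overrides any parameter file that came before it.
            name, value = step
            settings[name] = value
    # Extra arguments are treated as parameter files and evaluated last
    # (assumed here: in the order given).
    for param_file in trailing_param_files:
        settings.update(param_file)
    return settings


steps = [
    ("Beam Width", "20"),                              # flag before -p
    {"Beam Width": "10", "Case Sensitive": "FALSE"},   # -p parameter file
    ("Case Sensitive", "TRUE"),                        # flag after -p
]
resolved = resolve_settings(steps, [])
```

In this example the parameter file replaces the earlier beam width, while the later case-sensitivity flag replaces the value from the parameter file.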

Parameter Descriptions For TERp

All of the options to TERp can be specified in a parameter file. The parameter file is made up of lines of the form:
<PARAMETER NAME> : <PARAMETER VALUE>
The parameters used when running TERp can be output to a ".param" file and used to run TERp again with the same parameters. This is also a useful way of modifying a run of TERp to use different parameters. All parameters and their meanings are described below. If a parameter can also be specified on the command line, its flag is given as well. Default values are those used when the jar file is run without any parameter files; they do not correspond to the TERp-A measure used in the NIST Metrics MATR 2008 Challenge, or to the TER measure.
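
To make the line format concrete, here is a minimal Python sketch that reads and writes text in the "<PARAMETER NAME> : <PARAMETER VALUE>" form. It is an illustration, not TERp's own parser: splitting on the first colon, skipping blank lines, and keeping every value as a string are assumptions, as is the spelling of the example values.

```python
from typing import Dict


def parse_parameter_text(text: str) -> Dict[str, str]:
    """Parse '<PARAMETER NAME> : <PARAMETER VALUE>' lines into a dict."""
    params: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        # Assumption: the first colon separates the name from the value.
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError("not a parameter line: " + repr(raw_line))
        params[name.strip()] = value.strip()
    return params


def format_parameters(params: Dict[str, str]) -> str:
    """Write parameters back out, one 'NAME : VALUE' pair per line."""
    return "\n".join(name + " : " + value for name, value in params.items())


# Parameter names are taken from the list in this section; values are examples.
sample_text = "Beam Width : 20\nCase Sensitive : FALSE\n\nUse Porter Stemming : TRUE\n"
sample_params = parse_parameter_text(sample_text)
```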
Each entry below gives the parameter name, its type, its default value, its command-line flag (where one is given), and a description.

Adjust Phrase Table Func
  Type: string; Default: "NONE"
  Sets the function used to transform the probabilities or costs in the phrase table. Valid options are "NONE", "STD", "STDINV", "STDINVNONED", and "STDNONED". Both "STDINV" and "STDINVNONED" are appropriate when using probabilities, so that a high-probability phrase results in a lower-cost edit. The other functions are appropriate when using costs instead of probabilities, as they give high-cost edits to phrases with high costs. The use of "STDINV" is standard in TERp, although "NONE" is used by default. For the full equations, see the Adjustment Functions section.

Adjust Phrase Table Params
  Type: list of floats; Default: ""
  DESC

Adjust Phrase Table Min
  Type: float; Default: Negative Infinity
  DESC

Adjust Phrase Table Max
  Type: float; Default: Negative Infinity
  DESC

Beam Width
  Type: integer; Default: ???; Command line: -b <NUM> (sets beam width equal to <NUM>)
  Sets the beam width used in the MinEditDistance computation in the TERp calculation. If this is set high enough, the algorithm takes N^2 time and computes the optimal MinEditDistance. Lower values give large speed increases, though the resulting scores may be higher.

Cap Maximum TER
  Type: boolean; Default: FALSE; Command line: -c (sets to TRUE)
  If the number of edits exceeds the number of words, report the TER score as 1.0 rather than a number above 1.0.

Case Sensitive
  Type: boolean; Default: FALSE; Command line: -s (sets to TRUE)
  Runs TERp in case-sensitive mode so that words are not automatically downcased. If enabled, uppercase words may not match their lowercase forms in WordNet, stemming, or paraphrasing. Its use is not recommended unless running TERp as TER.

Default Deletion Cost
  Type: float; Default: 1.0
  Sets the default cost of a deletion edit. This can be overwritten by a weight file.

Default Insertion Cost
  Type: float; Default: 1.0
  Sets the default cost of an insertion edit. This can be overwritten by a weight file.

Default Match Cost
  Type: float; Default: 0.0
  Sets the default cost of a match edit (when the words match exactly). This can be overwritten by a weight file.

Default Shift Cost
  Type: float; Default: 1.0
  Sets the default cost of a shift edit. This can be overwritten by a weight file.

Default Stem Cost
  Type: float; Default: 1.0
  Sets the default cost of a stem match edit (when two words share the same stem). This can be overwritten by a weight file.

Default Substitution Cost
  Type: float; Default: 1.0
  Sets the default cost of a substitution edit (when two words do not match exactly). This can be overwritten by a weight file.

Default Synonym Cost
  Type: float; Default: 1.0
  Sets the default cost of a synonym match edit (when two words are synonyms according to WordNet). This can be overwritten by a weight file.

Filter Phrase Table
  Type: boolean; Default: ???
  Obsolete parameter.

Generalize Numbers
  Type: boolean; Default: FALSE
  Replaces all numbers with a generic number symbol. This feature is in beta, and its use is not recommended.

Normalize
  Type: boolean; Default: FALSE; Command line: -N (sets to TRUE)
  Enables tokenization of the reference and hypothesis input files. Recommended unless your input is pre-tokenized. This tokenization is meant to parallel the tokenization done in the mteval-0.6b.pl program, although slight differences exist.

Shift Distance
  Type: integer; Default: 50; Command line: ???
  Sets the maximum distance (in words) over which words may be shifted. If set to 0, shifting is disabled.

Shift Size
  Type: integer; Default: 10; Command line: ???
  Sets the maximum number of words that may be moved in a single shift. If set to 0, shifting is disabled.

Strip Punctuation
  Type: boolean; Default: FALSE; Command line: ???
  Removes all punctuation from the hypothesis and reference before running TERp.

Use Porter Stemming
  Type: boolean; Default: FALSE; Command line: -t (sets to TRUE)
  Enables Porter stemming. If this is set to false, no stem matches are considered. Porter stemming for languages other than English is not currently included.

Use WordNet Synonymy
  Type: boolean; Default: FALSE
  Enables WordNet synonyms. If this is set to false, no synonym matches are considered. The path to WordNet must also be set for this to work properly.

Word Class File
  Type: filename; Default: ""
  If set, loads the listed file to use for word-specific edit costs. This allows separate edit costs to be specified for different word classes, such as function words, names, numbers, and punctuation. All members of a word class must be specified. This feature is currently in beta and its use is not recommended or supported. If set to the empty string, no word classes are used.
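
The effect of the Cap Maximum TER parameter can be shown with a small calculation. The following Python sketch assumes the usual TER form, the number (or total cost) of edits divided by the number of reference words. The function and argument names are hypothetical and this is not TERp's own scoring code.

```python
def ter_score(num_edits: float, num_ref_words: float, cap: bool = False) -> float:
    """Edits divided by reference length, optionally capped at 1.0."""
    if num_ref_words <= 0:
        raise ValueError("the reference must contain at least one word")
    score = num_edits / num_ref_words
    if cap:
        # Mirrors Cap Maximum TER: never report a value above 1.0.
        return min(score, 1.0)
    return score


uncapped = ter_score(12, 10)           # more edits than reference words
capped = ter_score(12, 10, cap=True)   # same inputs with the cap enabled
```

Scores at or below 1.0 are unaffected by the cap, so it only changes segments with more edits than reference words.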

Adjustment Functions

The costs associated with the on-disk phrase table, which is used to compute phrase substitutions, can be transformed at run time according to one of several functions, listed below. The number of parameters varies by function. The minimum and maximum values of these functions can be set using additional parameters, so that the edit cost of a phrase substitution can be kept above a certain value (typically 0, as negative values can be problematic) or capped at a certain upper bound. Each function can be linearly decomposed to aid in optimization.

The valid adjustment functions are 'NONE', 'STD', 'STDINV', 'STDNONED', and 'STDINVNONED', with 'NONE' being the default function.

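Within phrasetable adjustment functions:
  %COST% is the original cost
  %NED% is the TER cost between the two phrases
  log_10 refers to log base 10

The definitions of the adjustment functions are:
  NONE (0 parameters): NEWCOST = %COST%
  STD (4 parameters): NEWCOST = a + (b * %NED% * log_10(%COST%)) + (c * %NED% * %COST%) + (d * %NED%)
  STDINV (4 parameters): NEWCOST = a + (b * %NED% * log_10(1.0 - %COST%)) + (c * %NED% * (1.0 - %COST%)) + (d * %NED%)
  STDNONED (3 parameters): NEWCOST = a + (b * log_10(%COST%)) + (c * %COST%)
  STDINVNONED (3 parameters): NEWCOST = a + (b * log_10(1.0 - %COST%)) + (c * (1.0 - %COST%))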
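
The formulas printed in the usage statement can be sketched in Python as follows. Each function is written as a weighted sum of fixed features, which is one way to read the remark that the functions can be linearly decomposed. This is an illustration under stated assumptions, not TERp's implementation: the function names are hypothetical, and the optional min_cost and max_cost arguments stand in for the minimum and maximum parameters without claiming their real default values.

```python
import math
from typing import List, Optional, Sequence


def adjustment_features(func: str, cost: float, ned: float) -> List[float]:
    """Return the terms that the parameters a, b, c (and d) are multiplied by."""
    # The INV variants work on (1.0 - cost), which suits probabilities.
    x = 1.0 - cost if func in ("STDINV", "STDINVNONED") else cost
    # math.log10 raises ValueError unless x > 0.
    if func in ("STD", "STDINV"):
        return [1.0, ned * math.log10(x), ned * x, ned]
    if func in ("STDNONED", "STDINVNONED"):
        return [1.0, math.log10(x), x]
    raise ValueError("unknown adjustment function: " + func)


def adjust_cost(func: str, params: Sequence[float], cost: float, ned: float,
                min_cost: Optional[float] = None,
                max_cost: Optional[float] = None) -> float:
    """Compute NEWCOST, then clamp it to the optional bounds."""
    if func == "NONE":
        if len(params) != 0:
            raise ValueError("NONE takes 0 parameters")
        new_cost = cost
    else:
        features = adjustment_features(func, cost, ned)
        if len(params) != len(features):
            raise ValueError(func + " takes " + str(len(features)) + " parameters")
        # Linear in the parameters: NEWCOST = sum of parameter * feature.
        new_cost = sum(p * f for p, f in zip(params, features))
    if min_cost is not None:
        new_cost = max(new_cost, min_cost)
    if max_cost is not None:
        new_cost = min(new_cost, max_cost)
    return new_cost


# Example: a phrase probability of 0.75 with %NED% = 0.5 under STDINV,
# using only the c term (a = b = d = 0, c = 1).
example_cost = adjust_cost("STDINV", [0.0, 0.0, 1.0, 0.0], cost=0.75, ned=0.5)
```

Because NEWCOST is linear in a, b, c and d, the feature values can be computed once per phrase pair and reused while the parameters are tuned.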

Optimization of Edit Costs

This section is still under development.