This page documents some of the functionality of TER-Plus (TERp) and is very much a work in progress. If you find any errors, or need information that is not covered here, you can contact the author: Matthew Snover <snover@cs.umd.edu>.
This page was last updated: March 16th, 2009 at 5:00pm EDT.
These instructions are for use on a UNIX-like operating system.
To convert the phrase table for use with TERp, run the command:
bin/create_phrasedb <PHRASE_TABLE_TEXT> data/phrases.db
IMPORTANT
This step could take a while and will require several gigabytes of
disk space, as the text version of the phrase table is converted to
a Berkeley DB-style database. The conversion tool also expects
1-3 GB of memory to be available. If necessary, this requirement can
be reduced by editing the bin/create_phrasedb script.
Running TER using the TERp Code
If you wish to run TER using the TERp codebase, you do not need to install WordNet or the phrase table. Results from running TER with TERp may differ slightly from results from the TERcom Java software because the search order of shifts has changed.
Running:
bin/terp_ter -r <reference-file> -h <hypothesis-file-to-score>
will run TER against the reference file and hypothesis file
specified. The -a flag can be used if another file is to be used for
the reference length (as is done when scoring HTER). This will
generate lowercased TER scores after tokenizing the two files. The
reference-file and hypothesis-file should be in trans or sgml format
(both should be in the same format).
Alternatively, a parameter file specifying these and other options
can be passed to the bin/terp_ter script, as discussed below.
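To make the scoring step concrete, the Python sketch below computes a simplified TER. It is the word-level edit distance (insertions, deletions, substitutions) divided by the number of reference words, after lowercasing. This is only an illustration under stated assumptions. Whitespace splitting stands in for TERp's tokenizer, shifts are not modeled at all, and the function names are hypothetical. The optional `length_ref` argument mimics the -a flag by taking the word count from a different reference.

```python
from typing import List, Optional


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning hyp into ref."""
    # prev[j] = distance between the hyp prefix processed so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete the hyp word h
                cur[j - 1] + 1,          # insert the ref word r
                prev[j - 1] + (h != r),  # exact match costs 0, substitution 1
            )
        prev = cur
    return prev[-1]


def ter_without_shifts(ref: str, hyp: str, length_ref: Optional[str] = None) -> float:
    """Edits divided by reference length; shifts are NOT modeled (hypothetical helper)."""
    # Lowercasing + whitespace split stands in for TERp's real tokenizer.
    ref_words = ref.lower().split()
    hyp_words = hyp.lower().split()
    edits = word_edit_distance(ref_words, hyp_words)
    # Like -a: optionally count reference words from a different reference.
    n_ref_words = len((ref if length_ref is None else length_ref).split())
    return edits / n_ref_words
```

Because shifts are omitted, a hypothesis with two words swapped costs two substitutions here (a score of 1.0 for "a b" against "b a"), whereas TER could account for the reordering with a shift.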
The bin/tercom script runs the original (non-TERp) version of TER, which is also bundled with this code. Input and output options for bin/tercom differ from those used for TERp. The copy of TERcom bundled with TERp is not supported, although TERcom as a standalone package is still supported.
TERp-A is the version of TERp that was submitted to both the NIST Metrics MATR 2008 Challenge and to the Workshop on Statistical Machine Translation 2009. This version of TERp has been tuned to maximize correlation with human judgments of Adequacy (from the NIST Metrics MATR 2008 Challenge development data) at the segment level. TERp-A requires the installation of WordNet 3.0 and the use of the phrase table for phrasal substitutions.
Running:
bin/terpa -r <reference-file> -h <hypothesis-file-to-score>
will run TERp-A against the reference file and hypothesis file
specified. The reference-file and hypothesis-file should be in trans
or sgml format (both should be in the same format). Alternatively, options can be passed as a parameter file, as discussed below.
Command-Line Usage Statement for TERp
java -jar terp.jar [-p parameter_file] -r <ref_file> -h <hyp_file> [-Nstcvm]
[-a alter_ref] [-b beam_width] [-o out_format -n out_prefix] [-w word_cost_file]
[-P phrase_table_db] [-W weight_file] [-d WordNet_dict_dir] [-S shift_stopword_list]
[ parameter_file1 parameter_file2 ... ]
---------------------------------------------------------------------------------
-r <ref-file> (required field if not specified in parameter file)
reference file in either TRANS or SGML format
-h <hyp-file> (required field if not specified in parameter file)
hypothesis file to score in either TRANS or SGML format
-p <parameter_file>
specifies parameters. Command line arguments after the -p will override values
in the parameter file.
Command line arguments before the -p will be overridden by values in the parameter file.
The parameter file for this run can be output by specifying 'param' as an output format.
Many parameters can only be set using a parameter file.
Any additional arguments to TERp will be treated as parameter files and evaluated
after other command line arguments.
-N
Normalize and Tokenize ref and hyp
-s
use case sensitivity
-c
cap TER at 100%
-t
use Porter stemmer to determine shift equivalence
-v
use verbose output when running
-m
ignore missing hypothesis segments (useful when doing parallelization)
-a <alter-ref>
reference file, in either TRANS or SGML format, to use for calculating number of words in reference
-b <beam-width>
beam width to use for min edit distance calculations
-o <out_format>
set output formats: all,sum,pra,xml,ter,sum_nbest,nist,html,param,weights,counts
-n <out_prefix>
set prefix for output files
-P <phrase_table_db>
directory that contains phrase table database
-W <weight_file>
file that contains edit weights.
-d <WordNet_dictionary_dir>
set the path to the WordNet Dictionary Directory (of the form /opt/WordNet-3.0/dict/)
-S <shift-stop-word-list>
specify a file that contains a list of words that cannot be shifted without a non-stop word
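One way to read the -S rule is that a block of words made up only of stop words may not be shifted on its own. A minimal Python sketch of that reading follows. The file format (one word per line), the lowercasing, and the names `load_shift_stop_words` and `shift_allowed` are assumptions, not TERp's actual implementation.

```python
from typing import Iterable, Sequence, Set


def load_shift_stop_words(lines: Iterable[str]) -> Set[str]:
    """Read a stop word list; assumes one word per line, blank lines ignored."""
    return {line.strip().lower() for line in lines if line.strip()}


def shift_allowed(phrase: Sequence[str], stop_words: Set[str]) -> bool:
    """Assumed rule: a block of words may be shifted only if it contains a non-stop word."""
    return any(word.lower() not in stop_words for word in phrase)
```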
Usage for phrase table adjustment functions:
Valid phrasetable adjustment functions are:
NONE adjust function needs 0 parameters.
NONE function is: NEWCOST = %COST%
STD adjust function needs 4 parameters.
STD function is: NEWCOST = a + (b * %NED% * log_10(%COST%)) + (c * %NED% * %COST%) + (d * %NED%)
STDINV adjust function needs 4 parameters.
STDINV function is: NEWCOST = a + (b * %NED% * log_10(1.0 - %COST%)) + (c * %NED% * (1.0 - %COST%)) + (d * %NED%)
STDINVNONED adjust function needs 3 parameters.
STDINVNONED function is: NEWCOST = a + (b * log_10(1.0-%COST%)) + (c * (1.0 - %COST%))
STDNONED adjust function needs 3 parameters.
STDNONED function is: NEWCOST = a + (b * log_10(%COST%)) + (c * %COST%)
Within phrasetable adjustment functions:
%COST% is the original cost
%NED% is the TER cost between the two phrases
By default log refers to the natural log (base e).
log_10 refers to log base 10 in these equations.
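The adjustment functions above translate directly into code. The following Python sketch evaluates them for a single phrase table entry, with `cost` standing for %COST% and `ned` for %NED%. The function name and argument order are hypothetical. Python's `math.log10` raises `ValueError` for non-positive input (for example, a probability of exactly 1.0 under STDINV). How TERp itself handles that case is not described here.

```python
import math
from typing import Sequence

# Number of parameters each function needs, as listed in the usage statement.
N_PARAMS = {"NONE": 0, "STD": 4, "STDINV": 4, "STDINVNONED": 3, "STDNONED": 3}


def adjust_cost(func: str, params: Sequence[float], cost: float, ned: float) -> float:
    """Apply one phrase table adjustment function (hypothetical helper)."""
    if func not in N_PARAMS:
        raise ValueError(f"unknown adjustment function: {func}")
    if len(params) != N_PARAMS[func]:
        raise ValueError(f"{func} needs {N_PARAMS[func]} parameters, got {len(params)}")
    if func == "NONE":
        return cost
    if func == "STD":
        a, b, c, d = params
        return a + b * ned * math.log10(cost) + c * ned * cost + d * ned
    if func == "STDINV":
        a, b, c, d = params
        return a + b * ned * math.log10(1.0 - cost) + c * ned * (1.0 - cost) + d * ned
    if func == "STDINVNONED":
        a, b, c = params
        return a + b * math.log10(1.0 - cost) + c * (1.0 - cost)
    # Remaining case: STDNONED
    a, b, c = params
    return a + b * math.log10(cost) + c * cost
```

For example, `adjust_cost("STDINV", [0.0, 0.0, 1.0, 0.0], 0.25, 2.0)` evaluates to 1.5, since only the `c * %NED% * (1.0 - %COST%)` term is non-zero.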
Parameter Descriptions For TERp
All of the options to TERp can be specified in a parameter file. The parameter file is made up of lines of the form:
<PARAMETER NAME> : <PARAMETER VALUE>
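A minimal Python sketch of reading lines in this format into a dictionary follows. It assumes that the name is everything before the first colon and the value everything after it. It also assumes that blank lines are skipped and that a repeated name keeps its last value. These are assumptions about the format, and `parse_param_lines` is a hypothetical helper, not part of TERp. The parameter names used with it should be the ones TERp writes to its ".param" output.

```python
from typing import Dict, Iterable


def parse_param_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse '<PARAMETER NAME> : <PARAMETER VALUE>' lines (assumed format details)."""
    params: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected '<NAME> : <VALUE>', got {raw!r}")
        params[name.strip()] = value.strip()  # a later line with the same name wins
    return params
```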
The parameters used when running TERp can be output to a ".param" file and
used to run TERp again with the same parameters. This is also a
useful way of modifying a run of TERp to use different parameters. A
list of all parameters and their meanings is given below. If a
parameter can also be specified on the command line, its flag
is also given. Default values are those used when the jar file is run
without any parameter files; they do not correspond to the TERp-A
measure used in the NIST Metrics MATR 2008 Challenge, or to the TER
measure.
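The -p description in the usage statement gives the precedence rules. Flags before -p are overridden by the parameter file, and flags after -p override it. Extra parameter files given as trailing arguments are applied after everything else. The toy Python model below reproduces those rules by applying settings left to right so that later writes win. It is a hypothetical sketch, not TERp's argument parser. Every flag is assumed to take one value (boolean flags such as -N are left out), and parameter files are represented as dictionaries keyed by the same flag names.

```python
from typing import Dict, List


def resolve_params(argv: List[str], param_files: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Toy model of the documented precedence rules (later settings win)."""
    settings: Dict[str, str] = {}
    trailing: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-p":
            # Values in the -p file override flags seen so far...
            settings.update(param_files[argv[i + 1]])
            i += 2
        elif arg.startswith("-"):
            # ...and flags after -p override the -p file.
            settings[arg] = argv[i + 1]
            i += 2
        else:
            # Bare arguments are extra parameter files, applied last.
            trailing.append(arg)
            i += 1
    for name in trailing:
        settings.update(param_files[name])
    return settings
```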
| Name | Type | Default Value | Command Line | Description |
| | STRING | "NONE" | | Sets the function used to transform the probabilities or costs in the phrase table. Valid options are "NONE", "STD", "STDINV", "STDINVNONED", and "STDNONED". Both "STDINV" and "STDINVNONED" are appropriate when using probabilities, so that a high-probability phrase results in a lower-cost edit. The other functions are appropriate when using costs instead of probabilities, as they give high-cost edits to phrases with high costs. "STDINV" is standard in TERp, although "NONE" is used by default. For the full equations, see the adjustment function section. |
| | LIST OF FLOATS | "" | | DESC |
| | FLOAT | Negative Infinity | | DESC |
| | FLOAT | Negative Infinity | | DESC |
| | INTEGER | ??? | -b <NUM> (sets the beam width to <NUM>) | Sets the beam width used in the MinEditDistance computation in the TERp calculation. If this is set high enough, the algorithm takes N^2 time and computes the optimal MinEditDistance. Lower values give large speed increases, although the resulting scores may be higher. |
| | BOOLEAN | FALSE | -c (sets to TRUE) | If the number of edits exceeds the number of words, reports the TER score as 1.0 rather than a number above 1.0. |
| | BOOLEAN | FALSE | -s (sets to TRUE) | Runs TERp in case-sensitive mode, so that words are not automatically downcased. If enabled, uppercase words may not match their lowercase forms in WordNet, stemming, or paraphrasing. Its use is not recommended unless running TERp as TER. |
| | FLOAT | 1.0 | | Sets the default cost of a deletion edit. This can be overridden by a weight file. |
| | FLOAT | 1.0 | | Sets the default cost of an insertion edit. This can be overridden by a weight file. |
| | FLOAT | 0.0 | | Sets the default cost of a match edit (when the words match exactly). This can be overridden by a weight file. |
| | FLOAT | 1.0 | | Sets the default cost of a shift edit. This can be overridden by a weight file. |
| | FLOAT | 1.0 | | Sets the default cost of a stem match edit (when two words share the same stem). This can be overridden by a weight file. |
| | FLOAT | 1.0 | | Sets the default cost of a substitution edit (when two words do not match exactly). This can be overridden by a weight file. |
| | FLOAT | 1.0 | | Sets the default cost of a synonym match edit (when two words are synonyms according to WordNet). This can be overridden by a weight file. |
| | BOOLEAN | ??? | | Obsolete parameter. |
| | BOOLEAN | FALSE | | Replaces all numbers with a generic number symbol. This feature is in beta, and its use is not recommended. |
| | BOOLEAN | FALSE | -N (sets to TRUE) | Enables tokenization of the reference and hypothesis input files; recommended unless your input is pre-tokenized. This tokenization is meant to parallel the tokenization done in the mteval-0.6b.pl program, although slight differences exist. |
| | INTEGER | 50 | ??? | Sets the maximum distance (in words) that a shift may move words. If set to 0, shifting is disabled. |
| | INTEGER | 10 | ??? | Sets the maximum number of words that may be moved in a single shift. If set to 0, shifting is disabled. |
| | BOOLEAN | FALSE | ??? | Removes all punctuation from the hypothesis and reference before running TERp. |
| | BOOLEAN | FALSE | -t (sets to TRUE) | Enables Porter stemming. If this is set to FALSE, no stem matches are considered. Porter stemming for languages other than English is not currently included. |
| | BOOLEAN | FALSE | | Enables WordNet synonyms. If this is set to FALSE, no synonym matches are considered. The path to WordNet must also be set for this to work properly. |
| | FILENAME | "" | | If set, loads the listed file to use for word-specific edit costs. This allows separate edit costs to be specified for different word classes, such as function words, names, numbers, and punctuation. All members of a word class must be specified. This feature is currently in beta, and its use is not recommended or supported. If set to the empty string, no word classes are used. |
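As a rough illustration of how the default edit costs and the -c cap combine, the Python sketch below weights each edit count by its cost and divides by the number of reference words. It assumes that normalization, which is consistent with the -c description comparing the number of edits with the number of words. It leaves out phrase substitutions, whose costs come from the phrase table, and the edit-type names are hypothetical. The `costs` argument stands in for per-edit overrides such as a weight file would supply.

```python
from typing import Dict, Mapping, Optional

# Default per-edit costs from the parameter table (edit-type keys are hypothetical).
DEFAULT_COSTS: Dict[str, float] = {
    "match": 0.0,
    "insertion": 1.0,
    "deletion": 1.0,
    "substitution": 1.0,
    "stem": 1.0,
    "synonym": 1.0,
    "shift": 1.0,
}


def weighted_score(counts: Mapping[str, int], ref_words: int,
                   costs: Optional[Mapping[str, float]] = None,
                   cap: bool = False) -> float:
    """Total weighted edit cost per reference word, optionally capped at 1.0 (like -c)."""
    table = dict(DEFAULT_COSTS)
    if costs:
        table.update(costs)  # per-edit overrides, e.g. from a weight file
    total = sum(table[edit] * n for edit, n in counts.items())
    score = total / ref_words
    return min(score, 1.0) if cap else score
```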
The costs in the on-disk phrase table, which is used to compute phrase substitutions, can be transformed at run time by one of several functions, listed below. The number of parameters varies by function. The minimum and maximum values of these functions can be set using additional parameters. This keeps phrase substitution costs above a given floor (typically 0, since negative costs can be problematic) or caps them at a given upper bound. Each function can be linearly decomposed to aid in optimization.
The valid adjustment functions are 'NONE', 'STD', 'STDINV', 'STDNONED', and 'STDINVNONED', with 'NONE' being the default function.
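Because each adjustment function is linear in its parameters, it can be written as a dot product between the parameter vector and a vector of features computed from %COST% and %NED%. This is what makes the functions convenient to optimize. The Python sketch below shows this for STD and then clamps the result to a minimum and maximum, as described above. The feature function, the clamp helper and their names are hypothetical. The unbounded defaults are chosen here only for illustration.

```python
import math
from typing import List, Sequence


def std_features(cost: float, ned: float) -> List[float]:
    """Features for STD, so that NEWCOST = a*f0 + b*f1 + c*f2 + d*f3."""
    return [1.0, ned * math.log10(cost), ned * cost, ned]


def clamped_cost(params: Sequence[float], features: Sequence[float],
                 min_cost: float = -math.inf, max_cost: float = math.inf) -> float:
    """Dot product of parameters and features, clamped to [min_cost, max_cost]."""
    value = sum(p * f for p, f in zip(params, features))
    return max(min_cost, min(max_cost, value))
```

With `min_cost=0.0`, a parameter setting that would produce a negative phrase substitution cost yields 0.0 instead.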
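Within the phrase table adjustment functions, %COST% is the original cost and %NED% is the TER cost between the two phrases.
The definitions of the adjustment functions are given in the usage statement for phrase table adjustment functions above.
This section is still under development.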