Lexical and Positional Differences (LePoD)

The Lexical and Positional Differences (LePoD) score quantifies the surface differences between paraphrases.

A LePoD example
Comparing S1 and S2 with LePoD: hollow circles represent tokens without an exact match, which determine the LeD score. The PoD score is computed from the alignment illustrated in the figure.

We first compute the pairwise Lexical Difference (LeD) based on the percentage of tokens in each output that are not found in the other. Formally,

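$$\mathrm{LeD} = \frac{1}{2}\left(\frac{|S_1 \setminus S_2|}{|S_1|} + \frac{|S_2 \setminus S_1|}{|S_2|}\right)$$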

where S1 and S2 are a pair of sequences and S1\S2 denotes the tokens appearing in S1 but not in S2.
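The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped in this repository: the function name `led`, the use of pre-tokenized input, exact string matching, and averaging the two directional fractions are all assumptions made for the example.

```python
def led(s1_tokens, s2_tokens):
    """Lexical Difference between two token lists (illustrative sketch).

    Assumption: a token counts as unmatched when its exact surface form
    does not occur anywhere in the other sequence, and the two
    directional fractions are averaged.
    """
    if not s1_tokens or not s2_tokens:
        raise ValueError("both sequences must contain at least one token")

    vocab1, vocab2 = set(s1_tokens), set(s2_tokens)

    # |S1 \ S2|: tokens of S1 that never appear in S2, and vice versa.
    only_in_s1 = sum(1 for tok in s1_tokens if tok not in vocab2)
    only_in_s2 = sum(1 for tok in s2_tokens if tok not in vocab1)

    return 0.5 * (only_in_s1 / len(s1_tokens) + only_in_s2 / len(s2_tokens))


if __name__ == "__main__":
    s1 = "a b c d".split()
    s2 = "a b x y".split()
    print(led(s1, s2))
```

Under these assumptions, identical sequences score 0 and sequences with no tokens in common score 1.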

We then compute the pairwise Positional Difference (PoD) in two steps. (1) We segment the sentence pair into the longest sequence of phrasal units consistent with the word alignments. Word alignments are obtained using the latest METEOR software, which supports stem, synonym and paraphrase matches in addition to exact matches. (2) We compute the maximum distortion within each segment. To do this, we first re-index the N aligned words and calculate each distortion as the difference in position (i.e., index2-index1 in the figure). We then keep a running total over the distortion array (d1, d2, ...) and close a segment p=(di, ..., dj)∈P whenever the running total returns to zero (i.e., Σ p=0). We can now define

$$\mathrm{PoD} = \frac{1}{N}\sum_{p \in P} \max(|p|)$$

In the extreme case where the first word in S1 is reordered to the last position in S2, the PoD score approaches 1. When words are aligned without any reordering, each alignment constitutes a segment and PoD equals 0.
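The segmentation procedure and the two boundary cases can be checked with a short Python sketch. It is not the repository's implementation: the function name `pod` and the input format (a one-to-one list of `(position_in_S1, position_in_S2)` pairs, as might be extracted from an aligner's output) are hypothetical, and the score is assumed to be the sum of each segment's largest absolute distortion divided by N.

```python
def pod(alignment):
    """Positional Difference from a one-to-one word alignment (sketch).

    alignment: list of (pos_in_s1, pos_in_s2) pairs; a hypothetical
    input format chosen for this example.
    """
    n = len(alignment)
    if n == 0:
        return 0.0  # assumption: no aligned words means no distortion

    # Re-index the N aligned words: index1 is the rank of the S1 position,
    # index2 is the rank of the S2 position among aligned S2 positions.
    pairs = sorted(alignment)
    rank_in_s2 = {j: r for r, j in enumerate(sorted(j for _, j in pairs))}
    distortions = [rank_in_s2[j] - index1
                   for index1, (_, j) in enumerate(pairs)]

    total = 0      # sum of per-segment maximum distortions
    running = 0    # running total of distortions in the open segment
    seg_max = 0    # largest |d| seen in the open segment
    for d in distortions:
        running += d
        seg_max = max(seg_max, abs(d))
        if running == 0:  # the segment closes when the total returns to zero
            total += seg_max
            seg_max = 0

    return total / n


if __name__ == "__main__":
    print(pod([(0, 0), (1, 1), (2, 2)]))          # monotone alignment
    print(pod([(0, 3), (1, 0), (2, 1), (3, 2)]))  # first word moved to the end
```

Because both re-indexed position lists are permutations of 0..N-1, the distortions always sum to zero, so the last segment is guaranteed to close. Moving the first of N words to the end gives a single segment with maximum distortion N-1, hence a score of (N-1)/N.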

Usage Instructions

If you have already set up the multitask-ft-fsmt project, you are all set. Otherwise, please use setup.sh to install the necessary software.

An example of using LePoD is given in example.sh.

Citation

Please cite the following paper if you use LePoD in your own work: