Philipp Koehn wrote:
> this is not correct - LM cost is in the future cost estimate.
> Obviously, this is a rather low probability, depending
> on if the language model was trained with open or
> closed vocabulary.
And also whether the word is unknown to the LM or not, yes? Typically
there are
Hi everyone, I would like to ask for some details on how Moses deals with the
handling of unknown words. As I read from the tutorial, unknown words are
copied verbatim to the output. However, it is not clear how we deal with the
distortion limit while copying unknown words to the output. The si
Hi,
this is not correct - LM cost is in the future cost estimate.
Obviously, this is a rather low probability, depending
on if the language model was trained with open or
closed vocabulary.
The reordering of unknown words often does cause some
strange reordering, due to the fact that an unknown w
It seems like even if this is correctly implemented, unknown words
would be delayed until the edge of the window they are in, due to
trying to avoid paying the high LM cost until the last minute. LM cost
is not in the future cost, so hypotheses paying this LM cost should
lose to hypotheses delaying
Make sure the score of the unknown word (-100) is included in your
calculation of future cost.
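
To make that concrete, here is a rough sketch of a future cost table in which
an uncovered unknown word still contributes its -100. This is not the Moses
code. The function name, the option_scores layout, and the assumption that
each option score already combines the translation and LM estimates are all
made up for illustration.

```python
# Hypothetical sketch, not taken from Moses.
UNKNOWN_WORD_SCORE = -100.0  # penalty quoted in this thread


def future_cost_table(source, option_scores, unk_score=UNKNOWN_WORD_SCORE):
    """Return table[(i, j)]: best estimated log-score for covering source[i:j].

    option_scores maps a span (i, j) to the best score of any translation
    option for that span (assumed to include the LM estimate already).
    A single word with no option is treated as unknown and gets unk_score.
    """
    n = len(source)
    table = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best = option_scores.get((i, j), float("-inf"))
            if length == 1 and best == float("-inf"):
                # Unknown word: it is copied to the output, and its penalty
                # has to be part of the estimate for this span.
                best = unk_score
            # Otherwise try every split into two smaller, already scored spans.
            for k in range(i + 1, j):
                best = max(best, table[(i, k)] + table[(k, j)])
            table[(i, j)] = best
    return table


source = ["das", "xyzzy", "haus"]  # "xyzzy" has no translation option
table = future_cost_table(source, {(0, 1): -1.0, (2, 3): -1.5})
print(table[(1, 2)], table[(0, 3)])
```

With the penalty in the table, a hypothesis that has not yet covered the
unknown word carries the same -100 in its estimate as one that has already
paid it, so postponing the word no longer makes a hypothesis look cheaper.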
On 09/08/2010 08:02, nghi...@comp.nus.edu.sg wrote:
> Hi everyone,
>
> I would like to ask for some details on how Moses deals with the handling
> of unknown words. As I read from the tutorial, unkno
> With the default setting of Moses (phrase-based, distance-based reordering
> ...), the handling of unknown words will be postponed (as we penalize them
> severely : -100) until the very end.
>
> Therefore, it is likely that some unknown words are dropped and won't
> appear in the output (due to
Hi everyone,
I would like to ask for some details on how Moses deals with the handling
of unknown words. As I read from the tutorial, unknown words are copied
verbatim to the output. However, it is not clear how we deal with the
distortion limit while copying unknown words to the output.
The s