Hi all,
After playing with factored models, I came to realize that train-model.perl
uses the target side of the monolingual data to estimate the conditional
probability distributions used at generation steps... Moreover, it seems
impossible to supply larger monolingual data (once all we need for gener
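To make the question concrete, here is a toy sketch of how such a generation table could be estimated by relative frequency from target-side data: for each target word, count how often each value of a second factor co-occurs with it and normalize. The data, factor names, and function names below are made up for illustration; this is not the train-model.perl code.

```python
from collections import Counter, defaultdict

# Toy estimation of P(factor2 | factor1) by relative frequency.
# Here factor1 is the surface form and factor2 a POS tag (illustrative).
pairs = [("run", "VB"), ("run", "NN"), ("run", "VB"), ("dog", "NN")]

counts = defaultdict(Counter)
for surface, pos in pairs:
    counts[surface][pos] += 1

def p(pos, surface):
    """Conditional relative frequency P(pos | surface)."""
    total = sum(counts[surface].values())
    return counts[surface][pos] / total
```

With the toy counts above, p("VB", "run") is 2/3 and p("NN", "dog") is 1.0, which is the kind of distribution a generation step consults.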
Hi (sorry, this thread is a little old; I wasn't paying attention to it at
the time).
Thanks for the links to the Chinese segmenters. However, for both of them (as
well as for some others that I found elsewhere), I didn't find the "reverse
desegmenter": the tool that will convert the output of moses b
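For what it's worth, a rough desegmenter can be approximated by deleting the spaces that segmentation inserted between CJK characters while leaving spaces around Latin words and numbers alone. The sketch below is my own illustration, not a tool shipped with any of the segmenters; the Unicode ranges used are a simplification.

```python
import re

# Spaces between two CJK characters are assumed to come from segmentation
# and are removed; spaces touching Latin text are kept.
CJK = r'[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]'

def desegment(line: str) -> str:
    prev = None
    # Repeat until fixpoint, since re.sub does not rescan inside a match.
    while prev != line:
        prev = line
        line = re.sub(rf'({CJK}) ({CJK})', r'\1\2', line)
    return line
```

For example, desegment("我 们 在 北京 用 Moses 训 练") rejoins the Chinese characters but keeps the space around "Moses".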
Hi all,
I found a possible bug in tokenizer.perl.
If I run:
echo "Don't put a space after the opening parenthesis" |
./tokenizer.perl -l en
The output is correct: Don 't put a space after the opening parenthesis
But if I run:
echo "Don't put a space after the opening parenthesis" |
./tokenizer.
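For readers unfamiliar with the rule being discussed: the English tokenizer splits a contraction like "Don't" into "Don 't", attaching the apostrophe to the following letters. A minimal Python sketch of that one rule (not the actual tokenizer.perl regex) looks like this:

```python
import re

def split_contractions(text: str) -> str:
    # Split letter-apostrophe-letter into "letter SPACE apostrophe+letter",
    # e.g. "Don't" -> "Don 't".
    return re.sub(r"([A-Za-z])'([A-Za-z])", r"\1 '\2", text)

print(split_contractions("Don't put a space after the opening parenthesis"))
# Don 't put a space after the opening parenthesis
```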
Hi all,
I'd like to ask why the Moses scripts tokenizer.perl and detokenizer.perl
are based on different approaches. While tokenizer.perl acquires
language-specific rules from a file stored in the
nonbreaking_prefixes directory, detokenizer.perl has these rules
hardcoded inside and just
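To illustrate what those files are for: a nonbreaking prefix stops the tokenizer from splitting off a sentence-final period after abbreviations like "Dr." or "etc.". The sketch below uses a hard-coded illustrative prefix set (the real tokenizer.perl loads it from the per-language file in nonbreaking_prefixes) and is a simplification of the actual logic.

```python
# Illustrative prefix set; tokenizer.perl reads these from a file.
NONBREAKING = {"Mr", "Mrs", "Dr", "etc"}

def split_periods(tokens):
    """Split a trailing period into its own token, unless the word
    before it is a known nonbreaking prefix."""
    out = []
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1 and tok[:-1] not in NONBREAKING:
            out.append(tok[:-1])
            out.append(".")
        else:
            out.append(tok)
    return out
```

So ["Dr.", "Smith", "arrived."] keeps "Dr." intact but splits "arrived." into "arrived" and ".".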