[Moses-support] Generation Step based on Large monolingual data

2011-03-07 Thread Wilker Aziz
Hi all, after playing with factored models I came to realize that train-model.perl uses the target side of the monolingual data to estimate the conditional probability distributions used at generation steps... Besides, it seems impossible to supply a larger monolingual data set (since all we need for gener

Re: [Moses-support] Train Moses Engine for EN to ZH_CN

2011-03-07 Thread Raphael Payen
Hi. (Sorry, this thread is a little old; I wasn't paying attention to this at the time.) Thanks for the links to Chinese segmenters. However, for both of them (as well as for some others that I found elsewhere), I didn't find the "reverse desegmenter": the tool that will convert the output of Moses b

[Moses-support] tokenizer.perl - fall-back to English version

2011-03-07 Thread Tomas Hudik
Hi all, I found a possible bug in tokenizer.perl. If I run: echo "Don't put a space after the opening parenthesis" | ./tokenizer.perl -l en The output is correct: Don 't put a space after the opening parenthesis But if I run: echo "Don't put a space after the opening parenthesis" | ./tokenizer.

[Moses-support] tokenizer.perl vs. detokenizer.perl

2011-03-07 Thread Tomas Hudik
Hi all, I'd like to ask why the Moses scripts tokenizer.perl and detokenizer.perl are based on different approaches. While tokenizer.perl acquires its rules for a specific language from a special file stored in the nonbreaking_prefixes directory, detokenizer.perl has these rules hardcoded inside and just