Re: [Moses-support] how to compile with nplm library

2014-12-29 Thread Nikolay Bogoychev
Hey, first you need to check out and compile this fork of nplm: https://github.com/rsennrich/nplm Then compile Moses with the nplm switch: ./bjam --with-nplm=path/to/nplm You can then see how to use it here: http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc31 On 30

[Moses-support] how to compile with nplm library

2014-12-29 Thread Xiaoqiang Feng
Hi, nplm is a toolkit for neural probabilistic language models. It can be used in Moses as a language model and as a bilingual LM (neural network joint model, ACL 2014). Both parts have been updated in the mosesdecoder repository on GitHub. If you want to use nplm in Moses, you have to compile Moses by lin

Re: [Moses-support] Moses tokenizer treats combining diaeresis inconsistently

2014-12-29 Thread Kenneth Heafield
So to summarize: The main issue is that the Moses tokenizer operates at the character rather than grapheme level on some versions of perl, treating combining characters (which are arguably parts of words in many cases) as non-alphanumeric and splitting them off. Older versions of perl appear to b
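The character-versus-grapheme distinction in this summary can be illustrated with a minimal Python sketch. It is not the logic of tokenizer.perl; `split_codepoints` and `split_keep_marks` are hypothetical helpers that only show why a combining diaeresis (U+0308) gets split off when each code point is tested for being alphanumeric on its own.

```python
import unicodedata


def split_codepoints(text):
    """Split off every non-alphanumeric code point (character-level view)."""
    tokens, current = [], ""
    for ch in text:
        if ch.isalnum():
            current += ch
        else:
            if current:
                tokens.append(current)
            if not ch.isspace():
                tokens.append(ch)
            current = ""
    if current:
        tokens.append(current)
    return tokens


def split_keep_marks(text):
    """Same splitter, but combining marks (category M*) stay inside the word."""
    tokens, current = [], ""
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if ch.isalnum() or is_mark:
            current += ch
        else:
            if current:
                tokens.append(current)
            if not ch.isspace():
                tokens.append(ch)
            current = ""
    if current:
        tokens.append(current)
    return tokens


word = "nai\u0308ve"  # 'i' followed by U+0308 COMBINING DIAERESIS
print(unicodedata.category("\u0308"), "\u0308".isalnum())
print(split_codepoints(word))
print(split_keep_marks(word))
```

The first splitter breaks the word at the combining mark, while the second keeps it in one piece because it treats marks as part of the preceding letter.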

Re: [Moses-support] Moses tokenizer treats combining diaeresis inconsistently

2014-12-29 Thread Tom Hoar
Japanese is another language that suffers from standard Unicode NFKC because the normalization applies changes that cannot be reversed. On 12/30/2014 04:40 AM, John D Burger wrote: >> This is also a reason to turn Unicode normalization on. If the >> tokenizer did NFKC at the beginning, then t
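A small Python sketch of the irreversibility point, using the standard `unicodedata` module. The sample characters are my own choice for illustration, not taken from the thread: half-width and full-width katakana collapse to the same code point under NFKC, so the original form cannot be recovered afterwards.

```python
import unicodedata

halfwidth_ka = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
fullwidth_ka = "\u30ab"  # KATAKANA LETTER KA
fullwidth_a = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A

# Two distinct inputs normalize to the same output.
norm_half = unicodedata.normalize("NFKC", halfwidth_ka)
norm_full = unicodedata.normalize("NFKC", fullwidth_ka)
print(norm_half == norm_full)

# Full-width Latin letters are folded to ASCII as well.
print(unicodedata.normalize("NFKC", fullwidth_a))
```

Because both katakana forms end up identical, no later step can tell which one the source text contained.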

Re: [Moses-support] "&apos;" in tokenization

2014-12-29 Thread Tom Hoar
The escaping is necessary because Moses reserves these characters for other uses. When corpora are consistently prepared, the escaping has no effect on translation results. It looks like you have not prepared your corpora consistently. Note my results (&apos;s) are different from yours (&apos; s): us
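To make the escaping concrete, here is a rough Python sketch of an escape/unescape round trip. The replacement table is an assumption based on my reading of the Moses tokenizer's escaping step and may not match every version; `escape` and `unescape` are hypothetical helper names, not Moses functions.

```python
# Assumed replacement table; "&" must be escaped first so that the
# entities produced by later replacements are not escaped again.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),  # "|" separates factors in Moses input
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape(text):
    """Replace reserved characters with their entity forms."""
    for raw, entity in ESCAPES:
        text = text.replace(raw, entity)
    return text


def unescape(text):
    """Undo escape(); reversed order so that "&amp;" is restored last."""
    for raw, entity in reversed(ESCAPES):
        text = text.replace(entity, raw)
    return text


sentence = "keep your notification's payload under 5 kb."
print(escape(sentence))
print(unescape(escape(sentence)) == sentence)
```

As long as the same escaping is applied to training, tuning, and test data, the entity is just another token to the decoder, which is why consistent preparation matters more than the escaping itself.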

Re: [Moses-support] Moses tokenizer treats combining diaeresis inconsistently

2014-12-29 Thread John D Burger
> This is also a reason to turn Unicode normalization on. If the > tokenizer did NFKC at the beginning, then the problem would go away. If I understand the situation correctly, this would only fix this particular example and a few others like it. There are many base+combining grapheme clusters
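The limitation described here can be checked with Python's standard `unicodedata` module. This is only an illustration with sample strings of my own choosing: normalization composes a base letter and a combining diaeresis when a precomposed code point exists, and leaves the pair as two code points when it does not.

```python
import unicodedata

u_diaeresis = "u\u0308"  # a precomposed form exists (U+00FC)
n_diaeresis = "n\u0308"  # no precomposed form in Unicode

composed_u = unicodedata.normalize("NFKC", u_diaeresis)
composed_n = unicodedata.normalize("NFKC", n_diaeresis)

# One code point after normalization versus two.
print(len(composed_u), len(composed_n))
```

A tokenizer that splits on the combining mark would therefore still split the second string after normalization.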

[Moses-support] Moses tokenizer treats combining diaeresis inconsistently

2014-12-29 Thread Kenneth Heafield
Dear Moses, The attached file, taken from line 2345157 of http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.en.shuffled.gz , tokenizes differently on different machines. I'm running tokenizer.perl from head (481a07dc) with this perl: This is perl 5, version 18

[Moses-support] "&apos;" in tokenization

2014-12-29 Thread Ihab Ramadan
Dear all, when I tokenize files, the apostrophes are replaced with “&apos;”, which makes sense, but on the other hand it completely breaks the meaning and the order of the words, for example: Sentence before tokenization: Src: keep your notification's payload under 5 kb. Trg: اجعل حمولة الإعل

[Moses-support] training and tuning for POS or CCG

2014-12-29 Thread Eng HAR
I have an Arabic-to-English translation ... factored model. My question is: do I have to add POS tags for both the source and the target, or just for the target language that I want to translate into (during training and tuning)? If I have to add them for both, how can I add CCG supertags to the Arabic side, because there is