Re: [Moses-support] Factored model configuration using stems and POS

2016-08-02 Thread Gmehlin Floran
os" Do you first need to translate the left-hand side factors? E.g. "word -> word,stem -> stem" or "word+stem -> word+stem". Thank you for your help! From: Hieu Hoang [hieuho...@gmail.com] Sent: 01 August 2016 20:50 To: Gmehlin F

[Moses-support] Factored model configuration using stems and POS

2016-07-27 Thread Gmehlin Floran
Hi, I have been trying to build a factored translation model using stems and part-of-speech tags for a week now and I cannot get satisfying results. This probably comes from my factor configuration, as I do not fully understand how it works (I am following the paper Factored Translation Model from

[Moses-support] Decoder Died during TUNING:Tune phase (Factored EMS)

2016-07-22 Thread Gmehlin Floran
Hi, The decoder dies when reaching the TUNING:tune phase of the EMS and I have no idea why. I'm running a factored model with 2 factors as input and 2 factors as output. The following is written in the file TUNING_tune.8.STDERR: Using SCRIPTS_ROOTDIR: /local/moses/mosesdecoder/scr

[Moses-support] Moses EMS Config file for factored training (post+stem) using TreeTagger

2016-07-19 Thread Gmehlin Floran
Hi, I am not sure whether I have to provide the files (DE & EN) with the factors (e.g. word0|stem0|pos0 ...) to the EMS or whether it builds them itself from the original files and the tagging tool. I am using TreeTagger to tag POS and stems in my original corpora. However, I am not really s

[Moses-support] TreeTagger and format with pipes for Factored Model in moses

2016-07-18 Thread Gmehlin Floran
Hi, I would like to try factored training on my corpus. I see that with TreeTagger (from uni-muenchen.de) we can parse a text file so that it outputs the POS. However, I haven't been able to produce the desired format for Moses (with POS and lemmas). There are a bunch of scripts in the scrip
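The format asked about here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not one of the Moses scripts: it assumes the tagger output has one token per line as `word<TAB>pos<TAB>lemma`, that a hypothetical marker line `</s>` separates sentences, and that unknown lemmas appear as `<unknown>`. The function names are made up for this example.

```python
def escape_factor(value):
    """Replace characters that would break the word|factor|factor layout."""
    # "|" is the factor separator and spaces separate tokens, so neither
    # may appear inside a factor.
    return value.replace("|", "&#124;").replace(" ", "_")


def treetagger_to_factored(lines, sentence_end="</s>"):
    """Turn 'word<TAB>pos<TAB>lemma' lines into one 'word|lemma|pos' line per sentence.

    `sentence_end` is an assumed marker line; adapt it to however sentence
    boundaries are kept in your own tagger output.
    """
    sentences = []
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.strip() == sentence_end:
            if current:
                sentences.append(" ".join(current))
                current = []
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three columns
        word, pos, lemma = parts
        if lemma == "<unknown>":
            lemma = word  # fall back to the surface form
        current.append("|".join(escape_factor(p) for p in (word, lemma, pos)))
    if current:
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample = ["Das\tART\tdie\n", "Haus\tNN\tHaus\n", "</s>\n"]
    for sentence in treetagger_to_factored(sample):
        print(sentence)
```

Each returned string is one sentence with the factors in the order word|lemma|pos, matching the word0|stem0|pos0 layout mentioned elsewhere in this listing; both sides of the parallel corpus would need the same treatment so that line counts stay aligned.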

[Moses-support] German compound splitter stuck

2016-07-07 Thread Gmehlin Floran
Hi, I'm using the script for compound splitting (/mosesdecoder/scripts/generic/compound-splitter.perl) on the German side of my parallel corpus. The corpus contains around 4M sentences and may contain a few English sentences (as I just noticed). The script has actually been running for 14h

[Moses-support] snt2cooc option fails the training

2016-07-05 Thread Gmehlin Floran
Hi, It seems that whenever I use the option "-snt2cooc snt2cooc.pl" the training fails (see below for error log). When I try on the same corpora (rather small because of memory limitation) it works. Does anyone have a clue about this? Reading more sentence pairs into memory ... [sent:340]

[Moses-support] Truecaser vs. Lowercase

2016-07-04 Thread Gmehlin Floran
Hi, I see from this page (http://www.statmt.org/moses/?n=Moses.Baseline) that we should train a truecaser before training the translation model. However, on the page "Preparing training data" (http://www.statmt.org/moses/?n=FactoredTraining.PrepareTraining), it is said to lowercase the data an