Hi all,

I am trying to train a factored model, but Moses hangs while performing the
step "(1.0.5) reducing factors to produce
/home/ksaso/Obeliks/test/prepared/tm_tagged/model/aligned.0,1.sl".

I have prepared the English corpus with the mxpost tool and manually
replaced the underscores with the pipe sign. It looks something like this:

streaming|VBG music|NN at|IN the|DT touch|NN of|IN a|DT button|NN
introducing|VBG SoundTouch|NNP ™|.
how|WRB it|PRP works|VBZ
SoundTouch|NNP ™|NNP Wi-Fi|NNP ®|NNP music|NN systems|NNS are|VBP much|RB
more|JJR than|IN just|RB speakers|NNS because|IN they|PRP connect|VBP
directly|RB to|TO the|DT Internet|NN over|IN your|PRP$ Wi-Fi|JJ network|NN
.|.
it|PRP makes|VBZ listening|VBG to|TO your|PRP$ favorite|JJ music|NN
easier|JJR .|.
all|DT around|IN your|PRP$ home|NN .|.
wirelessly|RB .|.
SoundTouch|NNP ™|NNP Wi-Fi|NNP ®|NNP music|NN systems|NNS are|VBP much|RB
more|JJR than|IN just|RB speakers|NNS because|IN they|PRP connect|VBP
directly|RB to|TO the|DT Internet|NN over|IN your|PRP$ Wi-Fi|JJ network|NN
so|IN you|PRP can|MD stream|NN Internet|NNP radio|NN and|CC your|PRP$
music|NN library|NN without|IN having|VBG go|NN to|TO your|PRP$ computer|NN
or|CC open|VB an|DT app|NN .|.
it|PRP makes|VBZ listening|VBG to|TO your|PRP$ favorite|JJ music|NN
quicker|NN and|CC easier|JJR .|.
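
The underscore-to-pipe replacement could also be scripted rather than done by
hand. Below is a minimal sketch in Python, not what I actually ran. It assumes
mxpost writes one sentence per line as space-separated word_TAG tokens; the
function name mxpost_to_factored is made up for this example.

```python
def mxpost_to_factored(line: str) -> str:
    """Turn 'word_TAG word_TAG ...' into 'word|TAG word|TAG ...'."""
    out = []
    for token in line.split():
        # Split on the LAST underscore only, so that words which
        # themselves contain underscores keep them.
        word, sep, tag = token.rpartition("_")
        if not sep:
            # No underscore at all: leave the token untouched.
            out.append(token)
            continue
        out.append(f"{word}|{tag}")
    return " ".join(out)


print(mxpost_to_factored("how_WRB it_PRP works_VBZ"))
```

A word that itself contains a pipe character would still clash with the factor
separator; this sketch does not handle that case.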

The Slovenian side was prepared with a specialized tool and converted
from XML format. It looks like this:

pretakanje|pretakanje|S|Soset glasbe|glasba|S|Sozer z|z|D|Do
dotikom|dotik|S|Someo gumba|gumb|S|Somer
predstavljamo|predstavljati|G|Ggnspm vam|ti|Z|Zod-md
SoundTouch|Soundtouch|S|Slmei
delovanje|delovanje|S|Soset
glasbeni|glasben|P|Ppnmmi sistemi|sistem|S|Sommi
SoundTouch|Soundtouch|S|Slmei Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei
so|biti|G|Gp-stm-n veliko|veliko|R|Rsn več|več|R|Rsr kot|kot|V|Vd
samo|samo|L|L zvočniki|zvočnik|S|Sommi  ,|,|,  saj|saj|V|Vp
se|se|Z|Zp------k povežejo|povezati|G|Ggdstm neposredno|neposredno|R|Rsn
z|z|D|Do internetom|internet|S|Someo prek|prek|D|Dr omrežja|omrežje|S|Soser
Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei  .|.|.
poslušanje|poslušanje|S|Sosei priljubljene|priljubljen|P|Ppnzer
glasbe|glasba|S|Sozer je|biti|G|Gp-ste-n tako|tako|R|Rsn
enostavnejše|enostaven|P|Pppsei  .|.|.
povsod|povsod|R|Rsn v|v|D|Dm vašem|vaš|Z|Zsdmemm domu|dom|S|Somem  .|.|.
brezžično|brezžičen|P|Ppnsei  .|.|.
SoundTouch|Soundtouch|S|Slmei Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei
so|biti|G|Gp-stm-n veliko|veliko|R|Rsn več|več|R|Rsr kot|kot|V|Vd
samo|samo|L|L zvočniki|zvočnik|S|Sommi  ,|,|,  saj|saj|V|Vp
se|se|Z|Zp------k povežejo|povezati|G|Ggdstm neposredno|neposredno|R|Rsn
z|z|D|Do internetom|internet|S|Someo prek|prek|D|Dr omrežja|omrežje|S|Soser
Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei  ,|,|,  tako|tako|V|Vp da|da|V|Vd
lahko|lahko|R|Rsn internetni|interneten|P|Ppnmeid radio|radio|S|Sometn
in|in|V|Vp glasbeno|glasben|P|Ppnzet knjižnico|knjižnica|S|Sozet
pretakate|pretakati|G|Ggnsdm  ,|,|,  ne|ne|L|L da|da|V|Vd bi|biti|G|Gp-g
pristopili|pristopiti|G|Ggdd-mm k|k|D|Dd računalniku|računalnik|S|Somed
ali|ali|V|Vp odprli|odpreti|G|Ggdd-mm aplikacijo|aplikacija|S|Sozet  .|.|.
poslušanje|poslušanje|S|Sosei priljubljene|priljubljen|P|Ppnzer
glasbe|glasba|S|Sozer je|biti|G|Gp-ste-n hitrejše|hitro|R|Rsr in|in|V|Vp
enostavneje|enostavno|R|Rsr  .|.|.
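
Since each token is meant to carry the same number of factors, a quick
consistency check on the tagged file may be worth running. Below is a minimal
sketch in Python, assuming whitespace-separated tokens with the pipe character
between factors; factor_counts is a name made up for this example, and the two
sample lines are copied from the excerpt above.

```python
from collections import Counter


def factor_counts(lines):
    """Count how many tokens have 1, 2, 3, ... pipe-separated fields."""
    counts = Counter()
    for line in lines:
        # str.split() without arguments also copes with double spaces.
        for token in line.split():
            counts[len(token.split("|"))] += 1
    return counts


sample = [
    "samo|samo|L|L zvočniki|zvočnik|S|Sommi  ,|,|,  saj|saj|V|Vp",
    "Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei  .|.|.",
]

for n_fields, n_tokens in sorted(factor_counts(sample).items()):
    print(n_fields, "fields:", n_tokens, "tokens")
```

On these two lines the punctuation and hyphen tokens come out with three
fields and the words with four. Whether that is related to the hang I cannot
say.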

I ran the following command:

~/mosesdecoder/scripts/training/train-model.perl --root-dir tm_tagged
--corpus ~/Obeliks/test/prepared/bose_tagged --f en --e sl --lm
4:3:/home/ksaso/Obeliks/test/prepared/lm/bose_tagged.blm.sl:0
--translation-factors 0-0,1 --external-bin-dir ~/mosesdecoder/tools --cores
16 &>training.out &

Both corpora have the same number of lines and the Slovenian language model
was created successfully. It looks something like this:

-3.480905    pretakanje|pretakanje|S|Soset    -0.16498125
-2.820011    glasbe|glasba|S|Sozer    -0.25699353
-2.1714172    z|z|D|Do    -0.24650323
-3.9205537    dotikom|dotik|S|Someo    -0.09213096
-3.63479    gumba|gumb|S|Somer    -0.11902898
-3.9205537    predstavljamo|predstavljati|G|Ggnspm    -0.09213096
-3.3675106    vam|ti|Z|Zod-md    -0.12703034

I am attaching the whole training.out file in case it helps. As I said, there
is no error message; Moses just hangs. I am assuming --lm 4:3:filename is OK,
since I have four factors? Are these parameters described in more detail
somewhere? I get the same result with different parameter values. Does anyone
have an idea what I am doing wrong?

Thank you in advance and best regards,

Saso

Attachment: training.out

