moses-parallel.pl and mert-moses.pl were changed.
Now they work well with lattice inputs, too.
Notice that you do NOT need to specify
-decoder-flags -inputtype 2
since the parameter
--inputtype 2
of mert-moses.pl is passed to the decoder automatically.
best,
Nicola
Hello,
I think I know the mistake:
when scanning the config file, the mert script mert-moses.pl searches for
the phrase-table file,
but if the phrase table is in binary format, the file names carry the suffixes
.binphr.idx, etc., so mert reports that the phrase table does not exist.
I just modified that part in
-- apologies if this is a duplicate, I'm having trouble posting to the
list --
I just built (as an exercise) a fr-en model based on the first 500K
sentences of Europarl.
It appears the generated model is just way too big to load on my Windows
machine: while trying to load, I see memory/swap
Hi all.
My mind suddenly stumbled across a wild doubt...
Does -dl 0 mean monotone, or does it rather mean distortion-limit =
inf (as it did in Pharaoh)?
In the second case, how would you specify a monotonicity constraint? -dl 1?
Best regards,
Germán Sanchis
Dear Christian,
As the binary phrase table (PT) is generated from the textual one,
we assumed that the latter exists,
so the check was done only on the textual PT.
If I needed to save space, I deleted the textual PT (not the binaries)
and recreated an almost empty PT with the same name.
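A minimal sketch of that trick (the file names here are assumptions; a real setup keeps its own phrase-table.binphr.* files next to the stub):

```shell
# Sketch: after deleting the real textual phrase table, create an
# almost-empty gzipped stub with the same name, so that scripts which
# only check for the textual PT still find a file. The binary PT
# (phrase-table.binphr.*) is what the decoder actually uses.
printf 'x ||| x ||| 1\n' | gzip > phrase-table.gz
```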
I've seen mention of filtering the model before translation. Many of the phrases
in your TM may not even exist in the document you wish to translate. Why bother
loading them into memory?
Quoting Hubert Crépy [EMAIL PROTECTED]:
Thanks for the hint, it did point me in the right direction.
I found the moses/scripts/training/filter-model-given-input.pl tool.
Given an a priori known input, it does reduce the model to a manageable
size.
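For reference, the usual invocation looks roughly like this (a sketch only; the script path, output directory, and input file names below are examples, so check your own Moses checkout):

```
# Sketch: write a filtered copy of the model into filtered-dir/,
# keeping only phrase pairs whose source side occurs in input.fr.
scripts/training/filter-model-given-input.pl \
    filtered-dir moses.ini input.fr
```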
Hello all,
I'm currently trying to train Moses on aligned subtitles obtained from
the OPUS corpus website. The files have been cleaned and formatted in a
similar way to the standard Europarl files.
There is a series of NaN errors after GIZA++ begins the HMM stage of
training. The corpus
Hello Germán,
dl = 0 means monotone, and a negative dl means no restriction on re-ordering.
You can see the code for it in
Manager.cpp, line 143.
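For concreteness, monotone decoding can be requested on the command line (-dl 0) or in the configuration file; a moses.ini fragment would look like this (a sketch, not taken from a shipped config):

```
# 0 = monotone; a negative value = unlimited re-ordering
[distortion-limit]
0
```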
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Germán Sanchis Trilles
Sent: 28 February 2008 15:14
To:
Hi
When I found GIZA++ giving me NaN errors, it was due to a mismatch in the C++
standard libraries. I had compiled GIZA++ on Red Hat FC5 but was running it on
FC6. Once I matched the compile and run platforms, the problem went away.
regards
Barry
On Thursday 28 February 2008 17:34:23 Wilson,
Hi, Wilson,
As I mentioned, GIZA++ may have a bug in the HMM training stage: it adds
a random number to the count table, and maybe that is the reason. You may
check the archive of the mailing list for a description of the bug;
alternatively, you can simply comment out the lines marked with //***//
I haven't looked into what's causing the particular problem on this
corpus, but another known problem with the GIZA HMM model is that it
doesn't do a fairly standard kind of normalization in the
forward-backward training, which causes underflow errors in some
sentences (especially quite long
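The missing normalization mentioned here is usually fixed by rescaling the forward (and backward) variables at every position. A minimal sketch of the standard scaled forward pass (plain Python, not GIZA++ code; all names here are hypothetical):

```python
import math

def scaled_forward(pi, A, B, obs):
    """Log-likelihood of an observation sequence under an HMM,
    with per-position rescaling to avoid underflow.
    pi: initial probs [N]; A: transition [N][N];
    B: emission [N][M]; obs: list of observation indices."""
    N = len(pi)
    log_likelihood = 0.0
    # initial step
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    log_likelihood += math.log(scale)
    # recursion: rescale alpha to sum to 1 at every position,
    # accumulating the log of the scale factors instead
    for t in range(1, len(obs)):
        alpha = [B[j][obs[t]] * sum(alpha[i] * A[i][j] for i in range(N))
                 for j in range(N)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_likelihood += math.log(scale)
    return log_likelihood
```

Without the rescaling, the raw forward probabilities on a long sentence shrink geometrically toward zero and underflow; with it, even a 1000-step sequence yields a finite log-likelihood.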
Sorry, I am not sure the bug I reported is directly related to this issue,
because the bug I mentioned is somewhat random (a read violation at some
random address) and can hardly be reproduced on different machines. What
we can do is fix it and try again. Also, I will look into the problem
you