Hi,
> First, I wonder how I can use the much bigger monolingual corpus for
> training the generation step. Where in the config or meta files can I
> specify data to be used?
Currently, EMS is not set up to allow this option.
I suggest running one experiment all the way through with only the paral
Hi Sonja,
> I will also repeat my other two questions in case someone could answer them:
>
>>> Second, since my data is already tokenised, parsed, factorised and
>>> lowercased, how can I tell EMS to skip those steps and, if possible,
>>> evaluate the result without truecasing, detokenising and wr
2010/9/7 Hieu Hoang:
> you can create an extra language model from the monolingual corpus.
yes, i combine the target language side of the parallel corpus and the
monolingual corpus for creating language models. what i wonder is how
i can do the same when creating generation models. on the page
http://
hi sonja
you can create an extra language model from the monolingual corpus.
The factored model is really word-level so it can't use parse structures
from a parser. You can try a technique such as
http://www.mt-archive.info/WMT-2010-Bisazza.pdf
if you want to use constituency structures.
If the
Hi!
I have a 2.5 million word parallel corpus and a 50 million word
monolingual target language corpus, both deeply parsed using a
Constraint Grammar parser. I am using the EMS to try different
factored models.
First, I wonder how I can use the much bigger monolingual corpus for
training the gene