Re: [Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Qin Gao
Yes, MGIZA isn't really doing "incremental training"; it only initializes the model parameters with those trained previously, since it does not store the sufficient statistics of the previous training. It will give bad performance if 1. You train only model 1 or 2. The incremental data or sub set is really
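
To illustrate the distinction drawn above, here is a toy sketch (not MGIZA's code, and not how MGIZA stores its models). Warm-starting means the previously trained translation table is used only as the starting point for EM on the new corpus; the expected counts from the earlier run are not carried over. The function name train_model1 and the init_t parameter are hypothetical, and the model is a bare IBM Model 1 with no NULL word.

```python
from collections import defaultdict


def train_model1(bitext, iterations=5, init_t=None):
    """Toy IBM Model 1 EM over a list of (f_tokens, e_tokens) pairs.

    Returns a dict {(f, e): t(f|e)}. init_t is a hypothetical warm-start
    table from an earlier run; pairs it does not cover start uniform.
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(f_vocab)

    # Initialisation: reuse earlier parameters where available.
    # Nothing else from the earlier run (e.g. its counts) is kept.
    t = {}
    for fs, es in bitext:
        for f in fs:
            for e in es:
                if init_t is not None and (f, e) in init_t:
                    t[(f, e)] = init_t[(f, e)]
                else:
                    t[(f, e)] = uniform

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: collect expected counts from the current corpus only.
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise so that sum over f of t(f|e) is 1.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t_subset = train_model1(bitext[:2])                 # train on a subset
t_warm = train_model1(bitext, init_t=t_subset)      # warm-start on the full set
t_scratch = train_model1(bitext)                    # train on the full set directly
```

Because only the starting point differs, t_warm and t_scratch are not guaranteed to agree after a fixed number of iterations.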

Re: [Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Miles Osborne
it is this: > Abby Levenberg, Chris Callison-Burch and Miles Osborne. Stream-based Translation Models for Statistical Machine Translation. NAACL, Los Angeles, USA, 2010. http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf Miles On 15 J

Re: [Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Miles Osborne
That isn't the expected answer here. I think the OP wants some kind of incremental (re)training. Firstly: it is not really possible to guarantee that performance is not degraded when running from subsets up to the full set (compared with just running it on the full set). Secondly, you may wish

Re: [Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Kenneth Heafield
Try using MGIZA: http://geek.kyloo.net/software/doku.php/mgiza:overview

[Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Prasanth K
Hello All, I am conducting a series of experiments to build translation systems using Moses in which the corpus of the current experiment is a subset of the corpora used in the previous experiment. I have started with the Europarl corpora and am likely to repeat this process about 20 times. Unless