Hi,
you could just run word alignment on the 50,000 lines, but you will get
better performance if you somehow leverage the baseline parallel corpus
for word alignment.
One way is incremental GIZA++; the other is to re-run everything.
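
To make the re-run-everything route concrete, here is a rough sketch, assuming one sentence pair per line and that the new lines are simply appended after the baseline corpus, so their alignments can be picked out by position afterwards. The function names are made up for illustration and are not part of Moses, GIZA++ or mgiza; the aligner itself is not shown.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def concat_for_alignment(
    baseline: List[SentencePair], new: List[SentencePair]
) -> Tuple[List[SentencePair], slice]:
    """Append the new pairs after the baseline and remember where they sit."""
    combined = baseline + new
    new_slice = slice(len(baseline), len(combined))
    return combined, new_slice


def alignments_for_new(alignments: List[str], new_slice: slice) -> List[str]:
    """Pick out the alignment lines that belong to the new sentence pairs.

    Assumes the aligner wrote one alignment line per input sentence pair,
    in the same order as the combined corpus.
    """
    return alignments[new_slice]
```

With the real data, the slice would cover the last ~50,000 lines of the combined corpus.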
You could also try some middle ground of including some of the
Can I use incremental GIZA++ for the new lines, even though I didn't use it
for the baseline? (does mgiza give me everything inc-giza needs?)
If not, I like the idea of just running word alignment on the new lines.
Would I need to update any files besides *.A3.final.gz for steps 3+ to run
Hi,
you do not need incremental GIZA++ for the baseline run, but you need
to run it with the HMM alignment models as the final step and store the
intermediate files (which you likely have not done).
Here is some information:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc33
-phi
On Sat, Jul
Hello,
I have a large phrase-based translation system. Alignment was done with
mgiza, and took a few weeks. I now have a small amount of extremely
relevant new bitext (~50,000 lines) that I would like to use to augment the
model, without having to retrain everything. The new data contains many