Re: [Moses-support] The BLEU score from MultEval is much lower than that generated by the Moses mert-moses.pl script

2013-01-24 Thread Tan, Jun
[quoted reply] Hi Jun, mert-moses.pl is not an evaluation script; it's for tuning the MT en…

Re: [Moses-support] The BLEU score from MultEval is much lower than that generated by the Moses mert-moses.pl script

2013-01-24 Thread Jonathan Clark
Hi Jun, all: I just released a new version of MultEval (v0.5.1) that does not give the strange NaNs. Instead, it prints a warning telling you that you're using a single optimizer run, so no variance can be calculated and any conclusions you draw from these numbers may be unreliable…
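The NaN comes from the variance estimate: the sample standard deviation over optimizer runs divides by n - 1, which is undefined when there is only one run. A minimal sketch of that statistic (the function name is illustrative, not MultEval's actual API):

```python
import math

def optimizer_run_stddev(scores):
    """Sample standard deviation (n - 1 denominator) of a metric score
    across independent optimizer runs, in the spirit of MultEval's s_opt.
    With a single run the statistic is undefined, hence NaN."""
    n = len(scores)
    if n < 2:
        return float("nan")  # one run: no between-run variance exists
    mean = sum(scores) / n
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
```

For example, `optimizer_run_stddev([30.1])` returns NaN, while three MERT runs scoring 30.0, 30.8, and 31.0 BLEU would give a small positive spread, which is what MultEval's significance testing needs.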

Re: [Moses-support] The BLEU score from MultEval is much lower than that generated by the Moses mert-moses.pl script

2013-01-24 Thread Rico Sennrich
Barry Haddow writes: > The NaNs in the MultEval output are a bit strange. I'm not familiar > with this tool, but Moses contains multi-bleu.pl (in scripts/generic) > which you can also use to calculate BLEU. > cheers - Barry. s_opt is the variance across different optimizer runs. MultEval is int…
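multi-bleu.pl scores a tokenized hypothesis file against references using clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. A minimal single-reference sketch of that computation (a simplification for illustration, not the actual Perl script; it assumes pre-tokenized, whitespace-separated input and handles only one reference per sentence):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in the style of multi-bleu.pl, scaled to 0-100:
    clipped n-gram matches pooled over the corpus, geometric mean of the
    precisions, and a brevity penalty for short output."""
    match = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A perfect match scores 100, e.g. `corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])` returns 100.0. Note this sketch does not explain the gap Jun observed: tuning scores from mert-moses.pl are on the development set, while a separate evaluation scores the test set.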

Re: [Moses-support] The BLEU score from MultEval is much lower than that generated by the Moses mert-moses.pl script

2013-01-24 Thread Barry Haddow
Hi Jun, mert-moses.pl is not an evaluation script; it's for tuning the MT engine. It will report BLEU scores obtained during tuning, but these are on the development set. The scores you're showing from MultEval are (I hope!) on the test set, which would make them different. It's quite a big d…

[Moses-support] The BLEU score from MultEval is much lower than that generated by the Moses mert-moses.pl script

2013-01-23 Thread Tan, Jun
Hello all, I have created an English-Chinese MT engine via Moses. I'm doing a translation quality evaluation regarding this engine. I have an evaluation report created by the MultEval tool covering about 1000 sentences. I found the BLEU score is much lower than the score generated by the mert-moses.pl scri…