Re: [Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Marcin Junczys-Dowmunt
Thanks, that's a very useful answer. I figured something similar, but I was curious why such huge differences between the tasks are never reported anywhere. Even in your paper they are only a few percent. Also, could it be that the default METEOR setting is slightly overfitting to the …

Re: [Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Michael Denkowski
Hi Marcin, Meteor scores can vary widely across tasks because each task uses different training data and a different objective. The default ranking task tries to replicate WMT rankings, so the absolute scores are not as important as the relative scores between systems. The adequacy task tries to fit Meteor scores to numeric adequacy …
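
[Editorial note: as an illustration of comparing tasks directly, here is a minimal sketch assuming Meteor 1.5's command-line interface, where -t selects the parameter task; the jar path and input file names are hypothetical placeholders.]

    import re
    import subprocess

    METEOR_JAR = "meteor-1.5.jar"  # hypothetical path to the Meteor jar

    def meteor_score(hyp_file, ref_file, task, lang="en"):
        """Run Meteor with the given parameter task and return the corpus score."""
        out = subprocess.run(
            ["java", "-Xmx2G", "-jar", METEOR_JAR, hyp_file, ref_file,
             "-l", lang, "-norm", "-t", task],
            capture_output=True, text=True, check=True,
        ).stdout
        # Meteor prints a "Final score:" line at the end of its output
        match = re.search(r"Final score:\s+([0-9.]+)", out)
        return float(match.group(1))

    # Compare the same hypothesis/reference pair across parameter tasks
    # (file names are placeholders for your own test set)
    for task in ("rank", "adq", "hter"):
        print(task, meteor_score("system.hyp", "newstest.ref", task))

As the discussion above suggests, the absolute numbers printed by different tasks are not comparable to each other; only scores from the same task, ranking systems relative to one another, are meaningful.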

[Moses-support] METEOR: difference between ranking task and other tasks

2014-11-26 Thread Marcin Junczys-Dowmunt
Hi, a question concerning METEOR, maybe someone has some experience. I am seeing huge differences between English scores with the default task "ranking" and any of the other tasks (e.g. "adq"), up to 30-40 points. Is this normal? In the literature I only ever see marginal differences of ma…