Re: [Moses-support] statistical significance tests

2013-01-24 Thread Rico Sennrich
saeed smith writes: > > Thank you all (especially for the paper Chris mentioned). I agree with you Barry. But as Germán said, when optimizer is not involved in experiments (e.g. evaluating decoder modifications), the tool can be very useful. Am I missing something? I guess the point is that even

Re: [Moses-support] statistical significance tests

2013-01-24 Thread saeed smith
Thank you all (especially for the paper Chris mentioned). I agree with you, Barry. But as Germán said, when the optimizer is not involved in the experiments (e.g. when evaluating decoder modifications), the tool can be very useful. Am I missing something? Cheers, SD -- NRC Center for Language

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Tom Hoar
The question is, "significance" to what? Physics and other hard sciences aren't the same as a social science with applied technology. I think until someone can define a better significance test for human authorship of both original content and translation, I agree with Barry. It's better to ke

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Barry Haddow
Hi Saeed, in my experience, significance tests are often badly applied or interpreted, so I don't get good feelings when I read an MT paper *with* significance tests. I think having such a tool in Moses would make things worse. I don't want to have to read/review papers which claim that "Moses

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Germán Sanchis Trilles
Indeed, I fully agree with the point about understanding the limits. In fact, in some multi-reference corpora I have observed variations of more than 10 BLEU points when computing inter-reference BLEU scores (i.e., one reference against the other references). However, this issue is much broader
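The inter-reference check Germán describes (score each reference set against the remaining references) can be sketched as below. This is a minimal, unsmoothed corpus BLEU written for illustration, assuming pre-tokenized input; the function names are hypothetical, and real scorers such as Moses' multi-bleu.perl differ in tokenization and edge-case details, so absolute numbers will not match exactly.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs_per_sent, max_n=4):
    """Unsmoothed corpus BLEU (0-100) with one or more references per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hyps, refs_per_sent):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to
        # the hypothesis (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # per-n-gram max over references
            hyp_counts = ngram_counts(hyp, n)
            # Clip each hypothesis n-gram count by its maximum reference count.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def inter_reference_bleu(references):
    """references[k][i] is the k-th reference for sentence i (tokenized).
    Returns one score per reference set, each scored against all the others."""
    scores = []
    for k, candidate in enumerate(references):
        others = [
            [references[j][i] for j in range(len(references)) if j != k]
            for i in range(len(candidate))
        ]
        scores.append(corpus_bleu(candidate, others))
    return scores
```

A large spread between the returned scores is the kind of reference-to-reference variation Germán reports; it bounds how much meaning a small BLEU difference can carry on that test set.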

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Chris Dyer
If you're interested in statistical significance testing, you really ought to read the Clark et al. (2011) paper (http://www.cs.cmu.edu/~jhclark/pubs/significance.pdf). We showed that the Koehn technique and related methods can indicate significance for reasons that have little to do with the experi
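The Clark et al. (2011) paper is concerned with optimizer instability: re-running tuning (e.g. MERT) with a different random seed can move test scores on its own. One practice the paper recommends is to replicate tuning several times and report the spread, not a single run. A minimal sketch of that summary follows; the function name and score values are hypothetical.

```python
import statistics


def summarize_optimizer_runs(scores_by_system):
    """scores_by_system maps a system name to its test-set BLEU scores, one
    per independent tuning run. Returns {name: (mean, sample std dev)}."""
    summary = {}
    for name, scores in scores_by_system.items():
        if len(scores) < 2:
            raise ValueError(f"{name}: need at least two runs to estimate spread")
        summary[name] = (statistics.mean(scores), statistics.stdev(scores))
    return summary
```

For example, `summarize_optimizer_runs({"baseline": [20.0, 21.0, 22.0]})` gives a mean of 21.0 and a standard deviation of 1.0; a system difference that is small relative to that spread says little about the experimental change itself.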

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Lane Schwartz
That would be great! On Thursday, January 24, 2013, Germán Sanchis Trilles wrote: > Hi all, > > personally I have an implementation of Koehn's 2004 ACL paper about > statistical significance tests for MT evaluation. It implements both > "stand-alone confidence intervals" (sec.5, bootstrap resamp

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Germán Sanchis Trilles
Hi all, personally, I have an implementation of the method from Koehn's 2004 ACL paper on statistical significance tests for MT evaluation. It implements both "stand-alone confidence intervals" (sec. 5, bootstrap resampling) and paired bootstrap resampling, if a baseline is given. Right now, it computes co
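The two procedures Germán mentions can be sketched as follows; this is not his implementation. It assumes a hypothetical per-sentence tuple of BLEU sufficient statistics (lengths plus clipped n-gram matches and totals), so that the BLEU of any resampled test set is computed from summed statistics. For the paired test, Koehn (2004) concludes that system A is better with 95% significance if it wins on at least 95% of the resampled test sets.

```python
import math
import random

# Assumed (hypothetical) per-sentence layout:
# (hyp_len, ref_len, m1, t1, m2, t2, m3, t3, m4, t4), where m_n is the number
# of clipped n-gram matches and t_n the number of hypothesis n-grams.


def bleu_from_stats(stats):
    """Unsmoothed corpus BLEU (0-100) from a non-empty list of sentence stats."""
    totals = [sum(col) for col in zip(*stats)]
    hyp_len, ref_len = totals[0], totals[1]
    matches, counts = totals[2::2], totals[3::2]
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / c) for m, c in zip(matches, counts)) / len(matches)
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def resample(n, rng):
    """Draw n sentence indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def bootstrap_interval(stats, samples=1000, alpha=0.05, seed=0):
    """Score `samples` resampled test sets and drop the alpha/2 tail on each side."""
    rng = random.Random(seed)
    scores = sorted(
        bleu_from_stats([stats[i] for i in resample(len(stats), rng)])
        for _ in range(samples)
    )
    k = round(samples * alpha / 2)  # e.g. 25 of 1000 for a 95% interval
    return scores[k], scores[samples - 1 - k]


def paired_bootstrap(stats_a, stats_b, samples=1000, seed=0):
    """Fraction of resampled test sets on which system A beats system B.
    Both systems are scored on the same sampled sentence indices."""
    if len(stats_a) != len(stats_b):
        raise ValueError("systems must be scored on the same test set")
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        idx = resample(len(stats_a), rng)
        score_a = bleu_from_stats([stats_a[i] for i in idx])
        score_b = bleu_from_stats([stats_b[i] for i in idx])
        if score_a > score_b:
            wins += 1
    return wins / samples
```

Pairing matters: drawing the same indices for both systems means each comparison is made on an identical test set, so sentence difficulty cancels out of the difference.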

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Kenneth Heafield
Hi, Amusingly enough, the parallel thread regarding MultEval answers your question: https://github.com/jhclark/multeval . Kenneth On 01/24/13 11:15, Patrik Lambert wrote: > Hi Saeed, > > I fully agree with you. I don't think that in Physics, for example, a > paper without a reliable est
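According to its documentation, MultEval reports p-values via approximate randomization. A generic paired version of that test is sketched below; it is not MultEval's code. The corpus metric is passed in as a parameter (corpus BLEU over summed sufficient statistics would be the usual choice), and this sketch compares a single pair of system outputs without modelling optimizer variance across runs.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def approximate_randomization(
    sys_a: Sequence[T],
    sys_b: Sequence[T],
    metric: Callable[[List[T]], float],
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Paired approximate randomization test.

    sys_a[i] and sys_b[i] hold per-sentence data for the same test sentence;
    metric maps a list of such items to a corpus score. Returns an estimated
    p-value for the null hypothesis that the two systems are interchangeable."""
    if len(sys_a) != len(sys_b):
        raise ValueError("systems must be scored on the same test set")
    rng = random.Random(seed)
    observed = abs(metric(list(sys_a)) - metric(list(sys_b)))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuf_a, shuf_b = [], []
        for a, b in zip(sys_a, sys_b):
            # Under the null hypothesis the labels are exchangeable, so swap
            # the two systems' outputs for this sentence with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(metric(shuf_a) - metric(shuf_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```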

Re: [Moses-support] statistical significance tests

2013-01-24 Thread Patrik Lambert
Hi Saeed, I fully agree with you. I don't think that in Physics, for example, a paper without a reliable estimate of the measurement error would be publishable, nor would you see results reported with more digits than are significant. Having easy-to-use statistical significant
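Patrik's point about significant digits can be made concrete: given a score and an error estimate (for example, half the width of a bootstrap interval, or a standard deviation over optimizer runs), print only the digits the error supports. The helper below is hypothetical, and the one-significant-digit convention it uses is one common choice, not something the thread prescribes.

```python
import math


def format_with_uncertainty(value, error):
    """Format value ± error, keeping one significant digit of the error and
    rounding the value to the same decimal place. Assumes 0 < error < 10."""
    if not 0 < error < 10:
        raise ValueError("this sketch only handles 0 < error < 10")
    # Decimal position of the error's leading digit: -1 for 0.43, 0 for 1.3.
    exponent = math.floor(math.log10(error))
    decimals = -exponent
    return f"{value:.{decimals}f} ± {error:.{decimals}f}"
```

For example, a BLEU score of 23.4567 with an error of 0.4321 is reported as `23.5 ± 0.4`, and with an error of 1.3 as `23 ± 1`.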