saeed smith writes:
>
> Thank you all (especially for the paper Chris mentioned). I agree with you
> Barry. But as Germán said, when the optimizer is not involved in experiments
> (e.g. evaluating decoder modifications), the tool can be very useful. Am I
> missing something?
I guess the point is that even
Thank you all (especially for the paper Chris mentioned).
I agree with you Barry. But as Germán said, when the optimizer is not involved
in experiments (e.g. evaluating decoder modifications), the tool can be
very useful. Am I missing something?
Cheers,
SD
--
*NRC Center for Language*
On Thu, Jan
The question is, "significance" to what? Physics and other hard
sciences aren't the same as a social science with applied technology.
I think until someone can define a better significance test for human
authorship of both original content and translation, I agree with Barry.
It's better to ke
Hi Saeed
In my experience, significance tests are often badly applied or
interpreted, so I don't get good feelings when I read an MT paper *with*
significance tests.
I think having such a tool in Moses would make things worse. I don't
want to have to read/review papers which claim that "Moses
Indeed, I fully agree with the point about understanding the limits. In
fact, in some multi-reference corpora I have observed variations of more
than 10 BLEU points when computing inter-reference BLEU scores (i.e., one
reference against the other references). However, this issue is much
broader
If you're interested in statistical significance testing, you really
ought to read the Clark et al. (2011) paper
(http://www.cs.cmu.edu/~jhclark/pubs/significance.pdf). We showed that
the Koehn technique and related methods can indicate significance for
reasons that have little to do with the experi
That would be great!
On Thursday, January 24, 2013, Germán Sanchis Trilles wrote:
> Hi all,
>
> personally I have an implementation of Koehn's 2004 ACL paper about
> statistical significance tests for MT evaluation. It implements both
> "stand-alone confidence intervals" (sec.5, bootstrap resamp
Hi all,
personally I have an implementation of Koehn's 2004 ACL paper about
statistical significance tests for MT evaluation. It implements both
"stand-alone confidence intervals" (sec.5, bootstrap resampling) and
paired bootstrap resampling, if a baseline is given. Right now, it
computes co
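
For readers unfamiliar with the two procedures named in the message above, here is a minimal sketch of the general idea. It is not Germán's implementation. To stay short it assumes each system is represented by a list of per-sentence scores and that the corpus-level metric is their mean; a real BLEU implementation would instead resample per-sentence n-gram statistics and recompute corpus BLEU on each sample. The function names `paired_bootstrap` and `bootstrap_interval` are hypothetical.

```python
import random


def paired_bootstrap(baseline_scores, system_scores, n_samples=1000, seed=0):
    """Return the fraction of resampled test sets on which the system
    scores strictly higher than the baseline.

    Assumption: the corpus-level metric is the mean of per-sentence scores.
    """
    if len(baseline_scores) != len(system_scores):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(baseline_scores)
    wins = 0
    for _ in range(n_samples):
        # Draw sentence indices with replacement. The same indices are used
        # for both systems, which is what makes the comparison paired.
        idx = [rng.randrange(n) for _ in range(n)]
        base = sum(baseline_scores[i] for i in idx) / n
        syst = sum(system_scores[i] for i in idx) / n
        if syst > base:
            wins += 1
    return wins / n_samples


def bootstrap_interval(scores, n_samples=1000, level=0.95, seed=0):
    """Return a (low, high) stand-alone confidence interval for one system
    by resampling its sentences and trimming both tails of the sorted
    resampled scores.
    """
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(scores[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_samples)
    )
    # Number of resampled scores to discard at each end of the sorted list.
    drop = int(n_samples * (1 - level) / 2)
    return means[drop], means[n_samples - 1 - drop]
```

A value of `paired_bootstrap(...)` at or above 0.95 is what is usually reported as "better at the 95% level". As the other messages in this thread point out, that figure only reflects variation in the test set, not variation between optimizer runs.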
Hi,
Amusingly enough, the parallel thread regarding MultEval answers your
question: https://github.com/jhclark/multeval .
Kenneth
On 01/24/13 11:15, Patrik Lambert wrote:
> Hi Saeed,
>
> I fully agree with you. I don't think that in Physics, for example, a
> paper without a reliable est
Hi Saeed,
I fully agree with you. I don't think that in Physics, for example, a
paper without a reliable estimation of the error on the measurements
would be publishable, nor would you see in a paper results with more
digits than the significant ones.
Having easy-to-use statistical significant