An improvement of 37 BLEU points over the default behaviour was not enough to
show that there are problems with the default?
James
From: Raphael Payen raphael.pa...@gmail.com
Sent: Sunday, June 21, 2015 5:29 PM
To: Read, James C
Cc: moses-support@mit.edu
I think it is you who seems to have missed the point.
If the default behaviour is giving BLEU scores considerably lower than the BLEU
score obtained from merely selecting the most likely translation of each phrase,
then there is evidently something very wrong with the default behaviour.
If you
Hi James,
Irrespective of the fact that you need to tune the weights of the
log-linear model:
Let me provide more references in order to shed light on how well
established simple pruning techniques are in our field as well as in
related fields (namely, automatic speech recognition).
This list
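The log-linear model mentioned above combines several feature scores under tuned weights, and an untuned (default) weight vector can rank hypotheses very differently from a tuned one. A minimal sketch of the idea, with invented feature values and weights purely for illustration:

```python
import math

# Sketch of log-linear scoring: each hypothesis carries log-domain
# feature values h (e.g. a TM probability, an LM probability, a word
# penalty), and the model score is the weighted sum of lambda_i * h_i.
# All numbers below are made up for illustration.

def model_score(features, weights):
    return sum(w * h for w, h in zip(weights, features))

# Two hypotheses as (log TM score, log LM score, word penalty):
hyp_a = (math.log(0.8), math.log(0.01), -3.0)
hyp_b = (math.log(0.3), math.log(0.20), -3.0)

untuned = (1.0, 1.0, 1.0)  # e.g. default, equal weights
tuned = (2.0, 0.2, 0.1)    # e.g. weights found by MERT-style tuning

# The ranking of the two hypotheses flips with the weights, which is
# why comparing an untuned system against a baseline is uninformative.
print(model_score(hyp_a, untuned) > model_score(hyp_b, untuned))  # False
print(model_score(hyp_a, tuned) > model_score(hyp_b, tuned))      # True
```

This is the crux of the replies in the thread: before the weights are tuned, the model score is not expected to correlate with translation quality.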
what *i* would do is tune my systems.
~amittai
On 6/24/15 09:15, Read, James C wrote:
Thank you for such an invitation. Let's see. Given the choice of
a) reading through thousands of lines of code trying to figure out why the
default behaviour performs considerably worse than merely
As the title of this thread makes clear the purpose of reporting the bug was
not to invite a discussion about conclusions made in my draft paper. Clearly a
community that builds its career around research in SMT is unlikely to agree
with those kinds of conclusions. The purpose was to report the
So you still think it's fine that the default would perform at 37 BLEU points
less than just selecting the most likely translation of each phrase?
You know I think I would have to try really hard to design a system that
performed so poorly.
James
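The baseline being compared against throughout the thread is simply "pick the single most probable translation for each phrase, ignoring all other features". A sketch of that selection, assuming a hypothetical `source ||| target ||| p(target|source)` phrase-table line format for illustration:

```python
# Sketch of the "most likely translation per phrase" baseline: keep
# only the highest-probability target for each source phrase, ignoring
# the language model and every other feature. The phrase-table line
# format assumed here (source ||| target ||| prob) is illustrative.

def best_translation_per_phrase(phrase_table_lines):
    """Map each source phrase to its highest-probability translation."""
    best = {}  # source phrase -> (probability, target phrase)
    for line in phrase_table_lines:
        source, target, prob = (f.strip() for f in line.split("|||"))
        prob = float(prob)
        if source not in best or prob > best[source][0]:
            best[source] = (prob, target)
    return {src: tgt for src, (_, tgt) in best.items()}

table = [
    "das Haus ||| the house ||| 0.8",
    "das Haus ||| the building ||| 0.2",
    "ist klein ||| is small ||| 0.9",
]
print(best_translation_per_phrase(table))
# {'das Haus': 'the house', 'ist klein': 'is small'}
```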
Thank you for such an invitation. Let's see. Given the choice of
a) reading through thousands of lines of code trying to figure out why the
default behaviour performs considerably worse than merely selecting the most
likely translation of each phrase or
b) spending much less time implementing
On Wed, Jun 24, 2015 at 8:11 AM, Read, James C jcr...@essex.ac.uk wrote:
Other than that it seems painfully clear that the point I meant to make
has not been understood entirely. If the default behaviour produces BLEU
scores considerably lower than merely selecting the most likely translation
It would be really wonderful if Moses had an out-of-the-box example that ran
without further tuning. Would you be willing to create that for us? We would
greatly appreciate it.
The open source community exists on a somewhat different model than the
commercial software community. In the
On Jun 24, 2015, at 10:47 , Read, James C jcr...@essex.ac.uk wrote:
So you still think it's fine that the default would perform at 37 BLEU points
less than just selecting the most likely translation of each phrase?
Yes, I'm pretty sure we all think that's fine, because one of the steps of
Thank you for reading very carefully the draft paper I provided a link to and
noticing that the Johnson paper is duly cited there. Given that you had already
noticed this I shall not proceed to explain the blindingly obvious differences
between my very simple filter and their filter based on
On Wed, Jun 24, 2015 at 9:05 AM, Read, James C jcr...@essex.ac.uk wrote:
As the title of this thread makes clear the purpose of reporting the bug
was not to invite a discussion about conclusions made in my draft paper.
Clearly a community that builds its career around research in SMT is
Please allow me to give a synthesis of my understanding of your response:
a) we understand that out of the box Moses performs notably less well than
merely selecting the most likely translation for each phrase
b) we don't see this as a problem because for years we've been applying a
different
James,
(1) Did you ever look at the model scores? The decoder's job is to find the
hypothesis with the highest model score, and if your baseline system finds
translations with higher model scores than your filtered system, then there is
no bug in Moses.
(2) You should stop talking about BLEU
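The check proposed in point (1) can be made concrete: re-score both systems' outputs under the same model and compare totals. A sketch, with stand-in per-sentence model scores (real Moses runs would report these in the n-best list output):

```python
# Sketch of the diagnostic in point (1): score each system's output
# under the *same* log-linear model and compare. If the unfiltered
# (baseline) decoder attains model scores at least as high as the
# filtered system, the decoder is doing its job and any BLEU gap is a
# property of the model (or its weights), not a search bug.
# The per-sentence scores below are hypothetical stand-ins.

def compare_model_scores(baseline_scores, filtered_scores):
    if sum(baseline_scores) >= sum(filtered_scores):
        return "no search bug: baseline reaches equal or higher model scores"
    return "possible search problem: filtering found higher-scoring hypotheses"

baseline = [-10.2, -7.5, -12.1]  # hypothetical sentence-level model scores
filtered = [-11.0, -8.3, -12.4]
print(compare_model_scores(baseline, filtered))
```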
John,
to my knowledge, you still have not reported BLEU scores for the following
experiment:
The moses.ini in your unfiltered translation experiment should assign
weights of 0 0 0 1 to the TM features.
(requested by Matt on June 17).
Would you please run this experiment and report the results?
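For reference, the requested setup would go in the [weight] section of moses.ini, along these lines. The feature-function names and the ordering of the four TM scores are assumptions here; they depend on how the particular model was built:

```ini
[weight]
# Requested experiment: weights 0 0 0 1 on the four phrase-table
# features, 0 on everything else (names below are illustrative).
TranslationModel0= 0 0 0 1
LM0= 0
Distortion0= 0
WordPenalty0= 0
```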
Hi,
it is beneficial if the tuning set
- is representative of what you want to translate
- is a relatively literal translation, so the MT system has a chance
to match the reference
-phi
On Wed, Jun 24, 2015 at 12:52 PM, Dingyuan Wang abcdoyle...@gmail.com wrote:
Dear all,
I have collected a