Hi,
I am using train-model.perl with
--extract-options=--IncludeSentenceId
and it seems that the sentence id is somehow getting into the phrase
table as a count and is later used for phrase translation weight
calculation, for instance in the extract (last column is the Id):
#c the compound or
I was planning to use it for a custom feature function later.
On 23.07.2014 13:11, Hieu Hoang wrote:
i can change it so that the sentence id is put into a key-value field
in the last column.
what is the sentence id used for? is it just for debugging purposes?
On 23 July 2014 11:36,
Key-value format would actually be fine.
On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
I was planning to use it for a custom feature function later.
On 23.07.2014 13:11, Hieu Hoang wrote:
i can change it so that the sentence id is put into a key-value field
in the last column.
Hi,
the sentence ID is being used for the domain indicator features.
If you run phrase-extract's score and specify a domain file,
it then uses the sentence IDs to find out which domain the
phrase pair was found in.
This is a standard feature in Edinburgh's phrase-based system
for the
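A rough sketch of the lookup described above, not the actual code in score. It assumes a hypothetical domain file in which each line gives the last sentence ID of a domain followed by the domain name, in ascending order; the real file format may differ. The names `load_domains` and `domain_of` are made up for illustration.

```python
import bisect

def load_domains(lines):
    """Parse hypothetical 'last_sentence_id domain_name' lines (assumed ascending)."""
    bounds, names = [], []
    for line in lines:
        last_id, name = line.split()
        bounds.append(int(last_id))
        names.append(name)
    return bounds, names

def domain_of(sentence_id, bounds, names):
    """Return the domain whose sentence-ID range contains sentence_id."""
    # First boundary that is >= sentence_id marks the enclosing domain.
    i = bisect.bisect_left(bounds, sentence_id)
    if i == len(bounds):
        raise ValueError(f"sentence id {sentence_id} is beyond the last domain")
    return names[i]

# Assumed example data: sentences 1-1000 are 'europarl', 1001-1500 are 'news'.
bounds, names = load_domains(["1000 europarl", "1500 news"])
print(domain_of(1000, bounds, names))
print(domain_of(1001, bounds, names))
```

With a lookup like this, each extracted phrase pair can be tagged with the domain of the sentence it came from, which is why the sentence ID has to be present in the extract file.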
ah ok.
I thought it was just for debugging. I'm not gonna change it since it's
gonna involve months of debugging.
Ideally, the extract format should be fixed like the phrase-table, with
the last column being key-value pairs. Also, the way the key-value pairs
are processed should be automatic
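To illustrate the idea, here is a minimal sketch of reading such a line. The `key=value` syntax and the `SentenceId` key are assumptions chosen for the example only; they are not the format Moses uses, and `parse_extract_line` is a hypothetical helper.

```python
def parse_extract_line(line):
    """Split a '|||'-delimited line and read the last column as key=value pairs.

    The key=value syntax is hypothetical, used only to illustrate the proposal.
    """
    columns = [c.strip() for c in line.split("|||")]
    *fields, last = columns
    # Each whitespace-separated token in the last column is one key=value pair.
    properties = dict(pair.split("=", 1) for pair in last.split())
    return fields, properties

fields, props = parse_extract_line(
    "das haus ||| the house ||| 0-0 1-1 ||| SentenceId=42"
)
print(fields)
print(props)
```

A scorer that only looks up the keys it knows about would then never mistake a sentence ID for a count.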
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm not gonna change it since
it's gonna involve months of debugging.
Ideally, the extract format should be fixed like the phrase-table,
with the
Because calculating translation probabilities from sentence ids is
unexpectedly beneficial?
On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm
it's likely we're using fractional counts so there's an extra column
On 23 July 2014 16:34, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm
In a corpus with sentences sorted by release date this could
actually make sense :)
On 23.07.2014 17:40, Barry Haddow wrote:
Because calculating translation probabilities from sentence ids is
unexpectedly beneficial?
On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
So, how come
Hi,
this is how extract is called:
extract corpus.en corpus.fr align extract 5 --IncludeSentenceId
this is how score is called:
score extract lex.f2e phrase-table.half --GoodTuring --DomainIndicator domains.5
phrase table looks fine to me
-phi
On Wed, Jul 23, 2014 at 11:42 AM, Marcin
So, adding --IgnoreSentenceId to score might fix that without
messing up your stuff? I guess I can do that if you can't be bothered,
Hieu.
On 23.07.2014 17:53, Philipp Koehn wrote:
Hi,
this is how extract is called:
extract corpus.en corpus.fr align extract 5
Hi Marcin
It appears that there is an --IgnoreSentenceId argument already, added
by Maria during last year's MTM
[gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore
bff12363 (maria nadejde 2013-09-13 12:45:46 +0200 42) if (args[i] == "--IgnoreSentenceId") {
cheers - Barry
On 23/07/14
i was doing it, but mine was a more holistic approach and it would have
broken compatibility.
so i can't be bothered
On 23 July 2014 16:56, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
So, adding --IgnoreSentenceId to score might fix that without
messing up your stuff? I guess I can
Oh. Good! I guess there is a lesson to be learned somewhere.
Thanks.
On 23.07.2014 18:06, Barry Haddow wrote:
Hi Marcin
It appears that there is an --IgnoreSentenceId argument already, added
by Maria during last year's MTM
[gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore
Dear Moses Support list members:
the European Association for Machine Translation (EAMT) has published
the call for candidacies to the 2014 EAMT Best Thesis Award. For
details, please visit the following URL:
http://www.eamt.org/news/news_best_thesis2014.php
Best regards,
Mikel L. Forcada
Dear list members:
the European Association for Machine Translation (EAMT) has published
the 2014 call for proposals and the 2015 call for student internships.
For details, please visit the following URLs:
http://www.eamt.org/news/news_call_for_proposals2014.php
Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
(SSST-8)
EMNLP 2014 / SIGMT / SIGLEX Workshop
Oct 2014, Doha, Qatar
http://www.cse.ust.hk/~dekai/ssst/
*** New submission deadline for papers and abstracts: August 1st, 2014 ***
*** Special theme: Compositional
I'm not an expert on giza++ but a problem is that it creates similar files
that only differ in the case of the file name, eg
file.a3
file.A3
on operating systems that have case insensitive filesystems
(Windows/cygwin, Mac OSX) they cause problems as the files are overwritten.
I personally
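One way to spot the problem before moving output to a case-insensitive filesystem is to check for names that differ only in case. This is a small sketch, not part of giza++ or Moses; `case_collisions` is a hypothetical helper and the file names are the ones from the example above plus one made-up extra.

```python
from collections import defaultdict

def case_collisions(filenames):
    """Group names that become identical once letter case is ignored."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    # Only groups with more than one member would overwrite each other.
    return [sorted(group) for group in groups.values() if len(group) > 1]

print(case_collisions(["file.a3", "file.A3", "file.n3"]))
```

To check a real directory, the list returned by `os.listdir(path)` could be passed in instead of the hard-coded names.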