Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Oh. Good! I guess there is a lesson to be learned somewhere. Thanks. On 23.07.2014 18:06, Barry Haddow wrote: > Hi Marcin > > It appears that there is an --IgnoreSentenceId argument already, added > by Maria during last year's MTM > >> [gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore >>

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
I was doing it, but mine was a more holistic approach and it would have broken compatibility, so I can't be bothered. On 23 July 2014 16:56, Marcin Junczys-Dowmunt wrote: > So, adding "--IgnoreSentenceId" to "score" might fix that without > messing up your stuff? I guess I can do that if you

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Barry Haddow
Hi Marcin It appears that there is an --IgnoreSentenceId argument already, added by Maria during last year's MTM > [gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore > bff12363 (maria nadejde 2013-09-13 12:45:46 +0200 42) if (args[i] == > "--IgnoreSentenceId") { cheers - Barry On 23/07/1
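Why a switch on the score side can help: below is a minimal Python sketch, assuming an extract line of the form "source ||| target ||| alignment ||| N" and assuming the scorer reads a trailing N as a fractional count unless told the field is a sentence ID. The names parse_extract_line and ignore_sentence_id are hypothetical illustrations, not the phrase-extract code; the real option handling is the check in ScoreFeature.cpp quoted above.

```python
def parse_extract_line(line, ignore_sentence_id=False):
    """Split one extract line into (source, target, alignment, count).

    Hypothetical model of the behaviour discussed in this thread: a
    fourth '|||' field is read as a fractional count (weight) unless the
    caller says it is a sentence ID that should be ignored.
    """
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source, target, alignment = fields[0], fields[1], fields[2]
    count = 1.0  # default: every extracted occurrence counts once
    if len(fields) > 3 and not ignore_sentence_id:
        # This is the misreading: an appended sentence ID becomes a weight.
        count = float(fields[3])
    return source, target, alignment, count


line = "das Haus ||| the house ||| 0-0 1-1 ||| 57"
print(parse_extract_line(line))
print(parse_extract_line(line, ignore_sentence_id=True))
```

With the flag off, the occurrence from sentence 57 is counted 57 times; with it on, it counts once, which is what the phrase-table counts should reflect.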

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
So, adding "--IgnoreSentenceId" to "score" might fix that without messing up your stuff? I guess I can do that if you can't be bothered, Hieu. On 23.07.2014 17:53, Philipp Koehn wrote: Hi, this is how extract is called: extract corpus.en corpus.fr align extract 5 --In

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Philipp Koehn
Hi, this is how extract is called: extract corpus.en corpus.fr align extract 5 --IncludeSentenceId this is how score is called: score extract lex.f2e phrase-table.half --GoodTuring --DomainIndicator domains.5 phrase table looks fine to me -phi On Wed, Jul 23, 2014 at 11:42 AM, Marcin Junczys

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
In a corpus with sentences sorted by release date, this could actually make sense :) On 23.07.2014 17:40, Barry Haddow wrote: > Because calculating translation probabilities from sentence ids is > unexpectedly beneficial? > > On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote: >> >> So, h

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
It's likely we're using fractional counts, so there's an extra column. On 23 July 2014 16:34, Marcin Junczys-Dowmunt wrote: > > So, how come this is not damaging the Edinburgh system? > > On 23.07.2014 17:32, Hieu Hoang wrote: > > ah ok. > > I thought it was just for debugging. I'm not gonna ch

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Barry Haddow
Because calculating translation probabilities from sentence ids is unexpectedly beneficial? On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote: > > So, how come this is not damaging the Edinburgh system? > > On 23.07.2014 17:32, Hieu Hoang wrote: >> ah ok. >> >> I thought it was just for debuggi

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
So, how come this is not damaging the Edinburgh system? On 23.07.2014 17:32, Hieu Hoang wrote: ah ok. I thought it was just for debugging. I'm not gonna change it since it's gonna involve months of debugging. Ideally, the extract format should be fixed like the phrase-table, with the l

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
Ah, OK. I thought it was just for debugging. I'm not gonna change it since it's gonna involve months of debugging. Ideally, the extract format should be fixed like the phrase-table, with the last column being key-value pairs. Also, the way the key-value pairs are processed should be automatic lik

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Philipp Koehn
Hi, the sentence ID is being used for the domain indicator features. If you run phrase-extract's score with a domain file specified, it uses the sentence IDs to find out which domain the phrase pair was found in. This is a standard feature in Edinburgh's phrase-based system for the las
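The lookup Philipp describes can be sketched in Python as below. This assumes a domain file whose lines hold the number of the last sentence of a domain followed by the domain name (for example "1000 europarl"), sorted in ascending order, and 1-based sentence IDs; both the file layout and the function names load_domains, domain_of and domain_indicators are assumptions for illustration, not the phrase-extract implementation.

```python
import bisect


def load_domains(lines):
    """Parse assumed '<last_sentence> <name>' lines into parallel lists."""
    ends, names = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        end, name = line.split(maxsplit=1)
        ends.append(int(end))
        names.append(name.strip())
    return ends, names


def domain_of(sentence_id, ends, names):
    """Return the domain whose sentence range contains sentence_id."""
    # First domain whose last sentence is >= sentence_id.
    i = bisect.bisect_left(ends, sentence_id)
    if i == len(ends):
        raise ValueError(f"sentence {sentence_id} is past the last domain")
    return names[i]


def domain_indicators(sentence_ids, ends, names):
    """Binary indicator per domain: 1 if the phrase pair occurred there."""
    seen = {domain_of(s, ends, names) for s in sentence_ids}
    return {name: int(name in seen) for name in names}


ends, names = load_domains(["1000 europarl", "2500 news"])
print(domain_indicators([3, 1200], ends, names))
```

A binary search keeps each lookup logarithmic in the number of domains, which matters little here but avoids a linear scan for every extracted phrase pair.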

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Key-value format would actually be fine. On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote: I was planning to use it for a custom feature function later. On 23.07.2014 13:11, Hieu Hoang wrote: i can change it so that the sentence id is put into a key-value field in the last column. w

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
I was planning to use it for a custom feature function later. On 23.07.2014 13:11, Hieu Hoang wrote: i can change it so that the sentence id is put into a key-value field in the last column. what is the sentence id used for? is it just for debugging purposes? On 23 July 2014 11:36, Marci

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
I can change it so that the sentence ID is put into a key-value field in the last column. What is the sentence ID used for? Is it just for debugging purposes? On 23 July 2014 11:36, Marcin Junczys-Dowmunt wrote: > Hi, > I am using train-model.perl with > > --extract-options="--IncludeSentenc
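Hieu's proposal removes the ambiguity because each trailing value carries its own name. A minimal Python sketch, assuming a hypothetical last column of space-separated key=value tokens such as "id=57" or "weight=0.5 id=3"; this syntax and the function name are illustrations of the idea, not an existing Moses extract format.

```python
def parse_extract_line_kv(line):
    """Parse 'src ||| tgt ||| align ||| k=v k=v' (hypothetical syntax).

    Returns (source, target, alignment, count, sentence_id); count
    defaults to 1.0 and sentence_id to None when the keys are absent.
    """
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source, target, alignment = fields[:3]
    props = {}
    if len(fields) > 3:
        for token in fields[3].split():
            key, sep, value = token.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {token!r}")
            props[key] = value
    # A sentence ID can no longer be mistaken for a weight: each has a name.
    count = float(props.get("weight", "1"))
    sentence_id = int(props["id"]) if "id" in props else None
    return source, target, alignment, count, sentence_id


print(parse_extract_line_kv("das Haus ||| the house ||| 0-0 1-1 ||| id=57"))
```

Unknown keys are kept in props and simply ignored here, so a custom feature function could read its own key without the scorer needing to know about it.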

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Barry Haddow
Hi Marcin There's a facility to include a weight in the extract file, which is then used in phrase scoring. Somehow this appears to have got mixed up with the sentence id. That's the problem with not having metadata. cheers - Barry On 23/07/14 11:36, Marcin Junczys-Dowmunt wrote: > Hi, > I am using tr

[Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Hi, I am using train-model.perl with --extract-options="--IncludeSentenceId" and it seems that the sentence id is somehow getting into the phrase table as a count and is later used for the phrase translation weight calculation, for instance the extract (last column is the Id): #c the compound or pr
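To see how much damage this does, recall that phrase translation probabilities are relative frequencies, p(t | s) = c(s, t) / sum over t' of c(s, t'). The Python sketch below uses made-up phrase pairs and sentence IDs (not the data from this report) and compares counting each occurrence once against summing the sentence IDs as if they were counts; phrase_probabilities is a hypothetical helper, not a Moses function.

```python
from collections import Counter


def phrase_probabilities(entries):
    """Relative-frequency estimate p(target | source) from (src, tgt, count)."""
    pair_counts = Counter()
    source_counts = Counter()
    for source, target, count in entries:
        pair_counts[(source, target)] += count
        source_counts[source] += count
    return {
        (source, target): c / source_counts[source]
        for (source, target), c in pair_counts.items()
    }


# Made-up occurrences: (source, target, sentence ID it was extracted from).
occurrences = [("Haus", "house", 2), ("Haus", "home", 900), ("Haus", "house", 15)]

correct = phrase_probabilities((s, t, 1) for s, t, _ in occurrences)
buggy = phrase_probabilities(occurrences)  # sentence IDs misread as counts
print(correct[("Haus", "house")], buggy[("Haus", "house")])
```

Counted correctly, "house" gets about 0.67; with IDs as counts it drops to about 0.02, because the single occurrence from sentence 900 outweighs everything else. The estimate ends up depending on where in the corpus a phrase pair happened to occur.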