Re: [Moses-support] -drop-unknown does not work

2013-07-25 Thread Hieu Hoang
Ah, this would be a problem for you. I don't know Latin Mongolian so I don't know how to solve it. If you have any suggestions or code, please let me know. If you can share the data, that would be great. This would let other people find out about this language pair. On 25 July 2013 01:40, Xiang

Re: [Moses-support] -drop-unknown does not work

2013-07-25 Thread Kenneth Heafield
I think he's reasonably asking that -drop-unknown should drop unknown words even if they contain digits. Maybe this means another command-line option. Also, anybody else notice that this code has no effect? if (isDigit == 1) isDigit = 1; else isDigit == 0; On 07/25/13 08:52, Hieu Hoang
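
A note on why the quoted snippet has no effect: the first branch assigns 1 to a variable that is already 1, and the else branch is a comparison (`==`) whose result is discarded, so `isDigit` keeps whatever `find_first_of` returned. `std::string::find_first_of` returns the index of the first matching character, or `std::string::npos` when there is no match. A minimal sketch of a digit test that does what the name suggests follows. The helper names `ContainsDigit` and `ShouldDropUnknown` are hypothetical and are not taken from the Moses source; how Moses uses `isDigit` afterwards is not shown in this thread.

```cpp
#include <string>

// Hypothetical helper, not part of Moses: true if `s` contains any ASCII digit.
// find_first_of yields an index on a match and std::string::npos otherwise,
// so the result must be compared against npos, not against 1.
bool ContainsDigit(const std::string& s) {
  return s.find_first_of("0123456789") != std::string::npos;
}

// Hypothetical policy: decide whether an unknown word should be dropped.
// With keepWordsWithDigits == true, unknown words containing a digit are kept,
// which is the exception described in this thread; with false, every unknown
// word is dropped.
bool ShouldDropUnknown(const std::string& word, bool keepWordsWithDigits) {
  return !(keepWordsWithDigits && ContainsDigit(word));
}
```

Passing `false` for `keepWordsWithDigits` corresponds to the behaviour requested here, where unknown words are dropped even if they contain digits.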

Re: [Moses-support] -drop-unknown does not work

2013-07-25 Thread Barry Haddow
Hi, I think what the OP wants is to be able to redefine the exceptions to the 'drop unknown' strategy. At the moment they are hardcoded to be 0123456789. This seems quite reasonable, but what would be even better is a way to plug in your own OOV handler, in case you want to add in some custom
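
One way the pluggable handler idea could look is sketched below. This is an illustration only: `OovHandler` and `MakeDropUnlessContains` are hypothetical names, not Moses API, and the sketch assumes the decoder would call the handler once per unknown source word. The exception characters are passed in rather than hardcoded.

```cpp
#include <functional>
#include <string>

// Hypothetical interface, not part of Moses. The handler receives an unknown
// source word. It returns true and fills `out` when the word should be emitted,
// or returns false when the word should be dropped.
using OovHandler = std::function<bool(const std::string& word, std::string& out)>;

// Builds a handler that keeps an unknown word only if it contains at least one
// character from `exceptions`. With an empty exception set, find_first_of never
// matches, so every unknown word is dropped.
OovHandler MakeDropUnlessContains(const std::string& exceptions) {
  return [exceptions](const std::string& word, std::string& out) {
    if (word.find_first_of(exceptions) != std::string::npos) {
      out = word;  // pass the word through untranslated
      return true;
    }
    return false;  // drop the word
  };
}
```

Under these assumptions, `MakeDropUnlessContains("0123456789")` would reproduce the hardcoded exception described above, and `MakeDropUnlessContains("")` would drop every unknown word, including those containing digits.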

Re: [Moses-support] -drop-unknown does not work

2013-07-24 Thread Hieu Hoang
I think you asked this question before. I checked and was pretty sure it works. How exactly are you running Moses? Can you send me your config files and any other info that you think might be useful to debug this issue? On 23 July 2013 07:46, Li Xiang lixiang@gmail.com wrote: At MERT stage,

Re: [Moses-support] -drop-unknown does not work

2013-07-24 Thread Xiang Li
I found the following code in moses/TranslationOptionCollection.cpp: isDigit = s.find_first_of("0123456789"); if (isDigit == 1) isDigit = 1; else isDigit == 0; But nearly the same code segment appears in moses/ChartParser.cpp: isDigit = s.find_first_of("0123456789"); if (isDigit ==

[Moses-support] -drop-unknown does not work

2013-07-23 Thread Li Xiang
At the MERT stage, I enable the -drop-unknown switch for the decoder moses_chart, but some OOV words still appear in the output translation. I carefully checked the source training data, but I did not find the OOV words there. The source language is Latin Mongolian. Its character set additionally includes 0 % _ -.