Re: [Moses-support] Pruning source syntax rule table

2013-07-24 Thread Hieu Hoang
The tree-to-string model is faster for decoding than the hiero model, and much faster than string-to-tree. Therefore, IMO, you don't need to prune, because pruning always causes search errors. If you mean filtering by input sentences to get rid of rules you know will never apply (i.e.

[Moses-support] HTML tags

2013-07-24 Thread Cyrine NASRI
Hello, I am using a training corpus to build my translation system, but I found some HTML tags in this corpus, for instance: "and i&apos;m going to start with this one : if momma ain&apos;t happy , ain&apos;t nobody happy ." Should I eliminate them or keep them? Thank you in advance.

Re: [Moses-support] Pruning source syntax rule table

2013-07-24 Thread Rico Sennrich
Marcin Junczys-Dowmunt junczys@... writes: Hi list, If I am not mistaken, there is currently no functionality to prune tree-to-string rule tables. Do you think it makes sense to use filter-pt for hierarchical rules instead, for instance by replacing category symbols with the generic

Re: [Moses-support] Pruning source syntax rule table

2013-07-24 Thread Marcin Junczys-Dowmunt
Ah, perfect. I somehow firmly believed this only works for the non-syntactic hierarchical models. Thanks! Best, Marcin. On 24.07.2013 15:19, Rico Sennrich wrote: Marcin Junczys-Dowmunt junczys@... writes: Hi list, If I am not mistaken, there is currently no functionality to prune

Re: [Moses-support] Integrating Lucene Phrase Table

2013-07-24 Thread Hieu Hoang
Yes, copy PhraseDictionaryDynSuffixArray, or indeed your own PhraseDictionaryCompact. There are some docs on adding feature functions (http://www.statmt.org/moses/?n=Moses.FeatureFunctions). It should be easier than a year ago, when you added PhraseDictionaryCompact. On 24 July 2013 13:29, Marcin

Re: [Moses-support] Integrating Lucene Phrase Table

2013-07-24 Thread Marcin Junczys-Dowmunt
Thanks, I am mainly interested in reusing most of the score calculation methods in BilingualDynSuffixArray.* and was looking for an easy way to do that. Now that I have inspected the code for a while, I see there is no easy way :) On 24.07.2013 15:29, Hieu Hoang wrote: yes, copy

Re: [Moses-support] HTML tags

2013-07-24 Thread Raphael Payen
You shouldn't keep them: the & and ; would be tokenized and pollute your sentences. There are tools to convert them, at least a Perl module I think; search for HTML decoding. They are called HTML entities, not tags. On Wed, Jul 24, 2013 at 2:16 PM, Cyrine NASRI cyrine.na...@gmail.com wrote:
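Raphael's suggestion amounts to decoding HTML entities back into plain characters before training. He does not name the Perl module, so as a hedged alternative here is a minimal C++ sketch that decodes only the five basic named entities (&amp;, &apos;, &quot;, &lt;, &gt;). `decodeEntities` is a hypothetical helper, not part of Moses; numeric references such as &#39; and the rest of the HTML entity table are deliberately left unhandled.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not Moses code): decodes a small, fixed set of named
// HTML entities in one pass. Numeric references (&#39;) and the full HTML
// entity table are NOT handled; unknown entities are copied through unchanged.
std::string decodeEntities(const std::string& in) {
    static const std::pair<const char*, const char*> kEntities[] = {
        {"&amp;", "&"}, {"&apos;", "'"}, {"&quot;", "\""},
        {"&lt;", "<"},  {"&gt;", ">"},
    };
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        bool matched = false;
        if (in[i] == '&') {
            // Try each known entity at this position.
            for (const auto& [entity, text] : kEntities) {
                const std::string e(entity);
                if (in.compare(i, e.size(), e) == 0) {
                    out += text;    // emit the decoded character
                    i += e.size();  // skip past the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += in[i++];  // ordinary character or unknown '&'
    }
    return out;
}
```

To clean a corpus, a caller would read each line with `std::getline`, pass it through `decodeEntities`, and write the result out. Decoding is single-pass by design, so a double-escaped `&amp;apos;` becomes `&apos;` rather than an apostrophe.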

Re: [Moses-support] Integrating Lucene Phrase Table

2013-07-24 Thread Ondrej Bojar
Hi Marcin, this could be quite useful (although we did not see any improvements in our experiments; see our WMT(12,11?) paper "Selecting Data in EN-CS Translation"). It can be useful to index a different factor than the factors that one eventually wants to use in translation. So the config

Re: [Moses-support] Integrating Lucene Phrase Table

2013-07-24 Thread Marcin Junczys-Dowmunt
Hi Ondrej, I am not aiming that much at quality improvement; I just want to have a very flexible data structure. I am planning a series of experiments where cruel and twisted things will be done to the source language, and thought this might be a good way to speed up experimenting without having

[Moses-support] HTML tags and Sentence Alignment

2013-07-24 Thread Heidi Heweidy
Hello, are there any quick ways to remove HTML tags and align sentences for Arabic-English without having to go through about 45,000 lines manually? ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
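For the first half of Heidi's question (tag removal), a minimal C++ sketch is shown below; sentence alignment is a separate problem and is not addressed here. `stripTags` is a hypothetical helper, and it assumes simple, well-formed markup: it drops everything from `<` to the next `>`, so a `>` inside an attribute value or comment will confuse it, text inside `<script>`/`<style>` is kept, and entities are not decoded.

```cpp
#include <string>

// Hypothetical helper (not Moses code): naive HTML tag stripper.
// Drops every character from '<' up to and including the next '>'.
// Assumes no '>' appears inside attribute values or comments.
std::string stripTags(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    bool insideTag = false;
    for (char c : in) {
        if (insideTag) {
            if (c == '>') insideTag = false;  // end of tag: resume copying
        } else if (c == '<') {
            insideTag = true;                 // start of tag: stop copying
        } else {
            out += c;                         // plain text: keep it
        }
    }
    return out;
}
```

A state flag rather than a regular expression keeps the pass linear and lets a tag that spans the whole line be removed cleanly; anything more robust (comments, CDATA, broken markup) calls for a real HTML parser.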

[Moses-support] Error using Berkeley Word Aligner

2013-07-24 Thread CHATZITHEODOROU Konstantinos
Hi all, I am trying to run training with the Berkeley Word Aligner, but I get errors in the TRAINING_run-berkeley.1 or TRAINING_process-berkeley.1 steps because the system cannot find '/usr/local/share/java/bin/java'. I replaced '/usr/local/share/java/bin/java' with '/usr/bin/java' or 'java' in

Re: [Moses-support] -drop-unknown does not work

2013-07-24 Thread Hieu Hoang
I think you asked this question before. I checked and am pretty sure it works. How exactly are you running Moses? Can you send me your config files and any other info that you think might be useful to debug this issue? On 23 July 2013 07:46, Li Xiang lixiang@gmail.com wrote: At MERT stage,

Re: [Moses-support] Mert using word error rate

2013-07-24 Thread Hieu Hoang
From the comment on line 48 in https://github.com/moses-smt/mosesdecoder/blob/master/mert/ScorerFactory.cpp, it looks like it has already been implemented. Other people may know more about it than me. On 18 July 2013 16:09, Felipe Sánchez Martínez fsanc...@dlsi.ua.es wrote: Hi all, Before going

Re: [Moses-support] pruning phrase tables that have more than one factor

2013-07-24 Thread Česlav Przywara
Hi Andrew, filter-pt prunes phrase tables with multiple factors just fine. Cheers, Ceslav. On 23.7.2013 22:44, Andrew Vine said the following: Hi, I would like to prune some phrase tables following the method described here: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc19 Could

Re: [Moses-support] -drop-unknown does not work

2013-07-24 Thread Xiang Li
I found the following code in moses/TranslationOptionCollection.cpp: isDigit = s.find_first_of("0123456789"); if (isDigit == 1) isDigit = 1; else isDigit == 0; But nearly the same code segment appears in moses/ChartParser.cpp: isDigit = s.find_first_of("0123456789"); if (isDigit ==
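Xiang's message is cut off, so what he concluded is not shown here, but the quoted snippet's C++ semantics can be checked directly. `std::string::find_first_of` returns the index of the first matching character, or `std::string::npos` if there is none, so `isDigit == 1` is true only when the first digit sits at position 1. The `else isDigit == 0;` branch is a comparison whose result is discarded, not an assignment, so it leaves `isDigit` unchanged. The sketch below reproduces the quoted logic as `quotedIsDigit` (assuming `isDigit` is a `std::size_t`, which the quote does not show) next to a hypothetical `containsDigit`, assuming the intended test is "does the word contain a digit". Neither function is Moses code, and what Moses actually intends here is not confirmed by the thread.

```cpp
#include <cstddef>
#include <string>

// Reproduction of the quoted logic, for comparison only. Assumes isDigit is
// a std::size_t (the original declaration is not shown in the thread).
std::size_t quotedIsDigit(const std::string& s) {
    std::size_t isDigit = s.find_first_of("0123456789");  // index or npos
    if (isDigit == 1) isDigit = 1;
    else isDigit == 0;  // comparison, result discarded: isDigit is unchanged
    return isDigit;
}

// Hypothetical helper: true if s contains at least one ASCII digit.
bool containsDigit(const std::string& s) {
    return s.find_first_of("0123456789") != std::string::npos;
}
```

For "ab3" the quoted logic leaves `isDigit` at 2 (not 0), and for "5" it yields 0 even though the word is a digit, which is why comparing against `npos` is the usual way to ask "is there any digit". Compilers typically warn that the `else` branch has no effect.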