Re: [Moses-support] Adding Dictionary

2017-07-07 Thread Matthias Huck
hat's working best for your use case. Also, I'm sure that there would be a couple of other ways of harnessing a dictionary in Moses. Cheers, Matthias > > On Fri, Jul 7, 2017 at 4:26 PM, Matthias Huck <mh...@cis.lmu.de> wrote: > > > > > Hi, > > &

Re: [Moses-support] Adding Dictionary

2017-07-07 Thread Matthias Huck
Hi, A simple solution would be to just append your dictionary to the parallel training data. Or create a second phrase table from the dictionary and do phrase table fillup or something similar. Cheers, Matthias On Fri, 2017-07-07 at 15:02 +0530, Sanjanashree Palanivel wrote: > HI all, > >  

Re: [Moses-support] Advanced Topics documentation

2017-07-06 Thread Matthias Huck
Hi, Philipp Koehn's textbook is a nice introduction to SMT: http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521874157 http://www.statmt.org/book/ For advanced topics, it's best to read the primary literature (i.e., research papers published in conference proceedings and scientific

Re: [Moses-support] Adding new aligned phrases to the existing phrase table

2017-04-11 Thread Matthias Huck
Hi, It might be better to do phrase table fill-up. You would add entries from a second phrase table ("background phrase table") to your first phrase table ("foreground phrase table") only if they're not present yet. You end up with a single table without duplicates. Added background phrases can
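
The fill-up idea described above can be sketched in a few lines of Python. This is only an illustration, not the Moses fill-up tooling: it assumes plain-text phrase table lines of the form `source ||| target ||| scores ...`, the function names `phrase_pair_key` and `fill_up` are hypothetical, and the scores are left untouched.

```python
def phrase_pair_key(line):
    """Return (source, target) from a 'src ||| tgt ||| scores ...' line."""
    fields = [field.strip() for field in line.split("|||")]
    return fields[0], fields[1]


def fill_up(foreground, background):
    """Keep every foreground entry; add a background entry only if its
    (source, target) pair does not occur in the foreground table."""
    seen = {phrase_pair_key(line) for line in foreground}
    merged = list(foreground)
    for line in background:
        key = phrase_pair_key(line)
        if key not in seen:
            seen.add(key)
            merged.append(line)
    return merged


# Toy tables (made-up entries and scores, for illustration only).
foreground = ["das Haus ||| the house ||| 0.8 0.7 0.9 0.6"]
background = [
    "das Haus ||| the house ||| 0.5 0.5 0.5 0.5",
    "das Haus ||| the building ||| 0.2 0.1 0.3 0.2",
]
merged = fill_up(foreground, background)
```

The merged list is not re-sorted here; a real phrase table would presumably need to be sorted by source side again before further processing.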

Re: [Moses-support] Select sentences that maximize BLEU from n-best list

2017-03-28 Thread Matthias Huck
Hi Marcin, If a sentence-level BLEU does the job for you (rather than corpus-level), then check out the `sentence-bleu-nbest` tool in Moses. This tool worked for me a couple of months ago, and I hope that nobody broke it in the meantime. Once you have sentence-level BLEU scores for all the

Re: [Moses-support] Output files of mgiza

2017-02-13 Thread Matthias Huck
Hi, mgiza can be configured to write a Model 1 file to disk. Use the configuration option "model1dumpfrequency". https://web.archive.org/web/20150919195919/http://www.kyloo.net/software/doku.php/mgiza:configure Cheers, Matthias On Mon, 2017-02-13 at 16:50 +, Hieu Hoang wrote: > the slide

Re: [Moses-support] too few factors error in mert

2016-12-06 Thread Matthias Huck
Hi, Maybe your moses.ini lets the decoder expect five input factors, whereas there are only four present in the data? I see this in your log file: input-factors: 0 1 2 3 4 Cheers, Matthias On Tue, 2016-12-06 at 11:18 +0200, Hasan Sait ARSLAN wrote: > Hi, > > I have a factored
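
A quick way to test this hypothesis is to count the factors on each token of the input data. The following is a minimal sketch, assuming the usual Moses factored format (whitespace-separated tokens, factors separated by `|`); the function name and the example sentence are made up for illustration.

```python
def tokens_with_wrong_factor_count(sentence, expected):
    """Return the tokens whose number of '|'-separated factors
    differs from the expected number."""
    return [tok for tok in sentence.split() if len(tok.split("|")) != expected]


# 'input-factors: 0 1 2 3 4' means the decoder expects five factors per token.
line = "das|der|ART|nom Haus|Haus|NN|nom"  # hypothetical data with four factors
bad = tokens_with_wrong_factor_count(line, expected=5)
```

With `expected=5`, both tokens of the example line are reported, since each carries only four factors.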

Re: [Moses-support] Placeholder settings for tune

2016-08-23 Thread Matthias Huck
Hi, In the EMS configuration file, you can specify decoder-settings = "..." under both [TUNING] and [EVALUATION]. Maybe that's all you need? Cheers, Matthias On Tue, 2016-08-23 at 00:40 +0100, Hieu Hoang wrote: > no really sure what you mean. Shouldn't have to dig around mert >

Re: [Moses-support] tuning not working properly in factored model

2016-04-28 Thread Matthias Huck
> during > the tuning process that only the forms appear. > > Best regards, > > Carlos > > 2016-04-28 20:14 GMT+02:00 Matthias Huck <mh...@cis.lmu.de>: > > > Hi, > > > > Moses can be configured to output the target-side factors of your > > c

Re: [Moses-support] tuning not working properly in factored model

2016-04-28 Thread Matthias Huck
Hi, Moses can be configured to output the target-side factors of your choice. Add something like this to your moses.ini: [output-factors] 0 1 2 Cheers, Matthias On Thu, 2016-04-28 at 18:16 +0200, Carlos Escolano wrote: > Hi, > > Thank you for your answer. > > You are right. While the

Re: [Moses-support] Compiling with ./bjam problem

2016-03-11 Thread Matthias Huck
Hi Despina, It seems to me that bjam doesn't use the boost build in your home directory, but some other boost version installed on the system. Maybe you should try ./bjam --with-boost=/home/despina/boost_1_55_0 -j4 -a Cheers, Matthias On Fri, 2016-03-11 at 16:29 +, Hieu Hoang

Re: [Moses-support] RNNLM Integration?

2016-03-08 Thread Matthias Huck
Hi, We once empirically compared two different recombination schemes in a hierarchical phrase-based system (without any kind of neural network language model): Recombination T. The T recombination scheme recombines derivations that produce identical translations. (I.e., hypotheses with the same

Re: [Moses-support] Segmentation Fault

2016-02-20 Thread Matthias Huck
Hi Jasneet, Why don't you use a proper profiling tool, e.g. the one in valgrind [1]? If you visualize its output [2], you'll see quickly where the program spends all the computing time. Cheers, Matthias [1] http://valgrind.org/docs/manual/cl-manual.html [2]

Re: [Moses-support] Segmentation Fault

2016-02-15 Thread Matthias Huck
Hi, You can set a local verbosity level for your feature function, e.g.: CoarseBiLM name=CoarseBiLM100 verbosity= If you use the macros FEATUREVERBOSE(level,str), FEATUREVERBOSE2(level,str), or IFFEATUREVERBOSE(level) in your feature function code, the verbose output will only be

Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Matthias Huck
igured, it should tell you about it. But maybe not with a segmentation fault. :-) > On 29 Jan 2016 9:15 pm, "Matthias Huck" <mh...@inf.ed.ac.uk> wrote: > > Hi, > > > > It seems to me that this toy string-to-tree setup is either > > outdated, > > or it a

Re: [Moses-support] Segmentation fault on hierarchical model with moses in server mode

2016-01-29 Thread Matthias Huck
Hi, It seems to me that this toy string-to-tree setup is either outdated, or it always had issues. It should be replaced. Under real-world conditions, the decoder should always be able to produce some hypothesis. We would therefore usually extract a whole set of glue rules. And we would

Re: [Moses-support] IRSTLM

2016-01-19 Thread Matthias Huck
Hi, I believe that the "~" might be the culprit. Try: ./bjam --with-irstlm=/home/mty2015/Public/MTEngine/Moseshome/mosesdecoder/irstlm (If this is the correct absolute path to your IRSTLM installation.) Cheers, Matthias On Wed, 2016-01-20 at 00:32 +, Hieu Hoang wrote: > it's

Re: [Moses-support] BLEU score becomes different

2016-01-18 Thread Matthias Huck
Hi Liang, mteval-v13a.pl does some internal tokenization and probably splits those "~~" words into " ~ ~ ". If this is happening, it explains your difference in the calculated BLEU scores. Cheers, Matthias On Mon, 2016-01-18 at 17:01 +0800, 姚亮 wrote: > Dear Moses Support Team, > >I

Re: [Moses-support] IRSTLM installation

2016-01-18 Thread Matthias Huck
Hi, Have you tried to use an absolute path? Cheers, Matthias On Mon, 2016-01-18 at 02:52 +0100, Ouafa Benterki wrote: > Hello, > > I installed IRSTLM but when i used the command > ./bjam --with-irstlm=/path to irstlm/ the installation failed > can you advise > > Best -- The University of

Re: [Moses-support] Tuning with no language model

2016-01-13 Thread Matthias Huck
Hi, If you don't need all score components of a phrase table, the easiest way to get rid of them is to set the scaling factors for the undesired phrase table feature function components to 0 before tuning, and ask the optimizer to ignore them. The feature function configuration parameter

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
So, what has been the proper solution? On Fri, 2016-01-08 at 13:20 -0500, Nicholas Ruiz wrote: > Thanks everyone, it's working now. > > zınɹ ʞɔıu -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hi Nick, What you're attempting to do should generally be no problem. There's most likely some issue with your EMS configuration file. Doesn't it tell you something like: BUGGY CONFIG LINE (474): wrapping-frame = $tokenized-input I get this when I put two spaces between

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
Hmm, maybe it can also cause trouble with the reuse of parts from previous steps if the user doesn't proceed with care. You could overwrite steps/1/config.1 on a call of experiment.perl -config config.1 -continue 1 -exec . On Fri, 2016-01-08 at 20:56 +, Matthias Huck wrote: > Hi Phil

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Matthias Huck
d behaviour > down the road... > > > -phi > > On Fri, Jan 8, 2016 at 1:24 PM, Matthias Huck <mh...@inf.ed.ac.uk> > wrote: > Hi Philipp, > > On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote: > > the comma

Re: [Moses-support] Chinese & Arabic Tokenizers

2015-12-18 Thread Matthias Huck
Hi Tom, There used to be a freely available Chinese word segmenter provided by the LDC as well. Unfortunately, things keep disappearing from the web. https://web.archive.org/web/20130907032401/http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm For Arabic, I think that many academic research groups

Re: [Moses-support] Lexical reordering fails with zlib

2015-12-17 Thread Matthias Huck
Hi, It's a problem that apparently occurs very rarely, and as Guy mentioned, we were so far assuming that it's caused by a zlib bug. However, the zlib bug was (to my knowledge) fixed in zlib v1.2.8. This seems to be the bug fix:

Re: [Moses-support] Lexical reordering fails with zlib

2015-12-17 Thread Matthias Huck
Hi, As an addendum: You can try a manual workaround. Run gunzip on extract.o.sorted.gz and do lexical-reordering-score on the resulting plain text file. It might be inconvenient but would hopefully solve the issue. Cheers, Matthias On Thu, 2015-12-17 at 17:44 +, Matthias Huck wrote

Re: [Moses-support] Slides or paper walking through SearchNormal::ProcessOneHypothesis ?

2015-12-15 Thread Matthias Huck
Hi Lane, Well, you can find excellent descriptions of phrase-based decoding algorithms in the literature, though possibly not all details of this specific implementation. I like this description: R. Zens, and H. Ney. Improvements in Dynamic Programming Beam Search for Phrase-based Statistical

Re: [Moses-support] Do debugging in the decoder?

2015-10-05 Thread Matthias Huck
> using this option, which has now been fixed > > https://github.com/moses-smt/mosesdecoder/commit/72bef00781de9821f2cff227ca7417939041d4e1 > > > On 04/10/2015 23:25, Matthias Huck wrote

Re: [Moses-support] Do debugging in the decoder?

2015-10-04 Thread Matthias Huck
Hi Yuqi, You can build a debug compile by calling bjam with: --variant=debug Cheers, Matthias On Sun, 2015-10-04 at 23:05 +0200, Yuqi Zhang wrote: > Hello, > > > How can I debug the decoder? > > > Must I turn off the pre-compile signal "WITH_THREADS"? > Can it be turned off?

Re: [Moses-support] Regarding Parallel Corpus Repository

2015-09-27 Thread Matthias Huck
Hi, The Hindi-English language pair was part of the WMT shared translation task in 2014. See the following website for download links of training data and dev/test sets: http://www.statmt.org/wmt14/translation-task.html Cheers, Matthias On Sun, 2015-09-27 at 20:15 +0530, nakul sharma wrote: >

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent, Pruning the phrase table will discard many bad entries. The decoder is typically configured to load no more than a maximum number of translation options per distinct source side. Use table-limit=20 as a parameter to your translation model feature to limit the number of candidates to
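
The effect of such a limit can be illustrated with a small Python sketch. It is not the decoder's implementation: it assumes that every translation option already comes with a single combined score (higher is better), and `apply_table_limit` is a hypothetical name.

```python
import heapq
from collections import defaultdict


def apply_table_limit(entries, limit=20):
    """entries: iterable of (source, target, score) triples.
    Keep only the `limit` highest-scoring targets per distinct source side."""
    by_source = defaultdict(list)
    for source, target, score in entries:
        by_source[source].append((score, target))
    pruned = {}
    for source, options in by_source.items():
        best = heapq.nlargest(limit, options)  # sorted by descending score
        pruned[source] = [target for _score, target in best]
    return pruned


# Made-up entries with made-up scores.
entries = [
    ("Haus", "house", -0.2),
    ("Haus", "home", -1.1),
    ("Haus", "the", -5.0),
    ("Auto", "car", -0.1),
]
pruned = apply_table_limit(entries, limit=2)
```

With a limit of 2, the low-scoring option `the` for `Haus` is dropped, while source sides with fewer options are unaffected.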

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
ink it works. The decoder does this, not the phrase table binarizer. You could run a simple experiment in order to verify. Add -feature-overwrite 'TranslationModel0 table-limit=20' (or equivalent) to your decoder call. Cheers, Matthias > On 24/09/2015 15:21, Matthias Huck wrote: > > Hi Vi

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent, This is a different topic, and I'm not completely clear about what exactly you did here. Did you decode the source side of the parallel training data, conduct sentence selection by applying a threshold on the decoder score, and extract a new phrase table from the selected fraction of

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Matthias Huck
Hi Vincent, On Thu, 2015-09-24 at 22:37 +0200, Vincent Nguyen wrote: > Thanks Matthias for the detailed explanation. > I think I have most of it in mind except not really understanding how > this one works : > > "Difficult sentences generally have worse model score than easy ones but > may

Re: [Moses-support] How to Develop Parallel Corpus

2015-09-06 Thread Matthias Huck
Hi Asad, You can try Hunalign or the Microsoft Bilingual Sentence Aligner (if it's for non-commercial purposes). Cheers, Matthias On Sun, 2015-09-06 at 10:24 +, Asad A.Malik wrote: > Hi All, > > > I am currently trying to develop the parallel corpus. I wanted to know > is there any tool

Re: [Moses-support] Generating Segment Level BLEU, NIST and METEOR scores

2015-09-02 Thread Matthias Huck
Hi Liling, This tool calculates sentence-level BLEU scores (smoothed via incrementing the n-gram counts by 1): bin/sentence-bleu Make sure that you provide the hypothesis and reference files in an appropriately processed way. The tool doesn't apply any tokenization or remove any markup
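
To make the smoothing idea concrete, here is a minimal Python sketch of a sentence-level BLEU with add-one smoothing. It is one plausible reading of "incrementing the n-gram counts by 1", not a port of `bin/sentence-bleu`: which n-gram orders are smoothed and how the brevity penalty is computed are assumptions, so the numbers need not match the Moses tool. Like the tool, the sketch applies no tokenization of its own.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(hyp, ref, max_n=4):
    """Sentence BLEU against a single reference, adding 1 to the matched and
    total n-gram counts of every order (assumed smoothing variant)."""
    hyp_tok, ref_tok = hyp.split(), ref.split()
    if not hyp_tok:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp_tok, n), ngrams(ref_tok, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        log_precision += math.log((matches + 1) / (total + 1))
    # Brevity penalty in log space; zero if the hypothesis is long enough.
    log_bp = min(0.0, 1.0 - len(ref_tok) / len(hyp_tok))
    return math.exp(log_precision / max_n + log_bp)


score_same = smoothed_sentence_bleu("a b c d", "a b c d")
score_diff = smoothed_sentence_bleu("a b c d", "a b x y")
```

Because of the smoothing, a hypothesis without any matching 4-gram still receives a non-zero score, which is the main reason for smoothing at the sentence level.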

Re: [Moses-support] Domain adaptation

2015-08-14 Thread Matthias Huck
Hi, I found this older tutorial to be very useful as well: Practical Domain Adaptation by Marcello Federico and Nicola Bertoldi http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf (The document formatting is unfortunately slightly messed up.) SMT research survey wiki:

Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-17 Thread Matthias Huck
On Fri, 2015-07-17 at 09:08 +0400, Hieu Hoang wrote: the OnDisk pt can do everything - sparse features, properties, hiero models. it's just slow and big i think the old Binary pt did sparse features but not properties, the Compact pt does neither Ah, I guess that explains why it didn't

Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-16 Thread Matthias Huck
contains sparse features, then this needs to be flagged in the configuration file by adding the word sparse after the phrase table file name.. Did i miss anything? Regards, Jian On Thu, Jul 16, 2015 at 3:23 AM, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi Jian

Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-15 Thread Matthias Huck
functions, I'd like to know are there any difference between these two options, for example, tuning, compute sentence translation scores ... Regards, Jian On Thu, Jul 16, 2015 at 2:06 AM, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi, Are you planning

Re: [Moses-support] Sparse phrase table, is still supported?

2015-07-15 Thread Matthias Huck
Hi, Are you planning to use binary domain indicator features? I'm not sure whether a sparse feature function for this is currently implemented. If you're working with a small set of domains, you can employ dense indicators instead (domain-features = indicator in EMS). You'll have to re-extract

Re: [Moses-support] multiple interpolated LM

2015-06-28 Thread Matthias Huck
Hi Hieu, That should be no problem. Pretty sure I did that a couple of times already. No need to add another [INTERPOLATED-LM] section. Just try! Cheers, Matthias On Sun, 2015-06-28 at 10:55 +0400, Hieu Hoang wrote: in the EMS, is it possible to create interpolated LM for different factors?

Re: [Moses-support] Major bug found in Moses

2015-06-24 Thread Matthias Huck
Hi James, Irrespective of the fact that you need to tune the weights of the log-linear model: Let me provide more references in order to shed light on how well established simple pruning techniques are in our field as well as in related fields (namely, automatic speech recognition). This list

Re: [Moses-support] please help me with the code - getting word index

2015-06-20 Thread Matthias Huck
like to know which terminals (non terminals) are corresponded to which source word's index in the source. Could you guide me how to obtain that? Thanks again On Thu, Jun 18, 2015 at 9:48 PM, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi, You can calculate

Re: [Moses-support] Major bug found in Moses

2015-06-19 Thread Matthias Huck
From: Matthias Huck mh...@inf.ed.ac.uk Sent: Friday, June 19, 2015 5:08 PM To: Read, James C Cc: Hieu Hoang; moses-support@mit.edu; Arnold, Doug Subject: Re: [Moses-support] Major bug found in Moses Hi James, Yes, he just said that. The decoder's job is to find

Re: [Moses-support] Dependencies in EMS/Experiment.perl

2015-06-19 Thread Matthias Huck
Hi Evgeny, If setting TRAINING:config won't help, then it might get a bit tricky. Another thing you can try is setting filtered-config or filtered-dir in the [TUNING] section. The next workaround I can think of is pointing to existing files in all the [CORPUS:*] sections by setting

Re: [Moses-support] When to truecase

2015-05-22 Thread Matthias Huck
Hi, If your system output is lowercase, you could try SRILM's `disambig` tool for predicting the correct casing in a postprocessing step. http://www.speech.sri.com/projects/srilm/manpages/disambig.1.html Cheers, Matthias On Fri, 2015-05-22 at 11:20 +0200, Ondrej Bojar wrote: Hi, we also

Re: [Moses-support] How can I change LM binarization in EMS without re-tuning?

2015-05-20 Thread Matthias Huck
Oh, are there two ways of doing this? I use config-with-reused-weights rather than weight-config. On Wed, 2015-05-20 at 15:11 -0400, Philipp Koehn wrote: Hi, you can point to the previous configuration file with the old weights: [TUNING] ### instead of tuning with this setting, old

Re: [Moses-support] Question About matrix.stamt.org WMT 2014 Test Set

2015-04-27 Thread Matthias Huck
Hi Graham, Did you have a look at the tarballs that were distributed last year? http://www.statmt.org/wmt14/translation-task.html There are three different versions: - Test sets (5.2 MB) These are the source sgm files with extra filler sentences. They were the actual files released for the

[Moses-support] mert-moses.pl -continue

2015-04-27 Thread Matthias Huck
Hi, Is there possibly a problem when continuing interrupted tuning runs with sparse features? It seems to me that mert-moses.pl doesn't add the [weight-file] section to the run*.moses.ini it creates right after resuming the tuning. That would imply that no sparse weights are used in the next

Re: [Moses-support] [decoding-graph-backoff]

2015-04-19 Thread Matthias Huck
2015 at 01:26, Matthias Huck mh...@inf.ed.ac.uk wrote: I think your remark in the mail from January was correct, it has to be ePos-sPos+1 backoff but currently still is ePos-sPos+1 = backoff Are you able to somehow

Re: [Moses-support] Segfaulting with WordTranslationFeature

2015-04-17 Thread Matthias Huck
Hi Lexi, The feature most likely won't be particularly important. But this might be a completely different issue than you think. You should debug this. Can you print the phrase pair that is applied when the error occurs? I recently came across a segfault that seemed to be caused by the OSM

Re: [Moses-support] [decoding-graph-backoff]

2015-04-16 Thread Matthias Huck
Hi Hieu, It seems that [decoding-graph-backoff] doesn't quite behave like last year any more. Can you briefly explain how its behaviour has changed, i.e. what it did before and what it does now? Can you please also let me know whether there's a way to reproduce the old behaviour via configuration

Re: [Moses-support] [decoding-graph-backoff]

2015-04-16 Thread Matthias Huck
. On Thu, 2015-04-16 at 21:34 +0400, Hieu Hoang wrote: Didn't know it has changed. How should it behave and how does it actually behave? On 16 Apr 2015 21:04, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi Hieu, It seems that [decoding-graph-backoff] doesn't quite behave

Re: [Moses-support] swap glue rule

2015-04-14 Thread Matthias Huck
: Matthias Huck, Joern Wuebker, Felix Rietig, and Hermann Ney. A Phrase Orientation Model for Hierarchical Machine Translation. In ACL 2013 Eighth Workshop on Statistical Machine Translation (WMT 2013), pages 452-463, Sofia, Bulgaria, August 2013. I don't know if the usage of the feature

[Moses-support] n-best list reranking

2015-03-27 Thread Matthias Huck
Hi, I'm looking for a tool to rerank n-best lists in Moses' current format, including sparse features. The CSLM toolkit has quite a nice re-ranker implementation, but apparently it doesn't know sparse features yet. If anyone already has an extended version of the existing re-ranker from the CSLM

Re: [Moses-support] n-best list reranking

2015-03-27 Thread Matthias Huck
and Moses on-disk phrase tables (and obviously neural networks). Why not adding more functionality ... - Holger On 03/27/2015 11:42 PM, Matthias Huck wrote: Hi, I'm looking for a tool to rerank n-best lists in Moses' current format, including sparse features. The CSLM toolkit has

Re: [Moses-support] Moses segmentation fault under multi-thread

2015-03-11 Thread Matthias Huck
Hi, I've recently been using these sparse feature functions without any issues in multi-threaded chart-based decoding. There might be a problem with thread safety, but I currently can't tell why you got the segmentation fault. You should investigate this in more detail. Cheers, Matthias On

Re: [Moses-support] Can I buy ready to use phrase-table

2015-03-03 Thread Matthias Huck
Hi, Some pre-trained models for Moses Release 3.0 have been made publicly available anyway: http://www.statmt.org/moses/RELEASE-3.0/models/ http://www.statmt.org/moses/?n=moses.releases http://www.statmt.org/mosescore/uploads/Internal/D1.4_Moses_v3_Release_Notes.pdf I can't tell whether you're

Re: [Moses-support] Documentation describing Moses n-best list extraction

2015-02-28 Thread Matthias Huck
On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote: i've never seen the phrase-based n-best extraction explicitly described. There was a paper on directed graph enumeration (I forget which) that was helpful to me when I was implementing it. Maybe this? Hart, P., Nilsson, N., and Raphael,

Re: [Moses-support] Documentation describing Moses n-best list extraction

2015-02-28 Thread Matthias Huck
On Sat, 2015-02-28 at 17:11 +, Matthias Huck wrote: On Sat, 2015-02-28 at 16:45 +, Hieu Hoang wrote: i've never seen the phrase-based n-best extraction explicitly described. There was a paper on directed graph enumeration (I forget which) that was helpful to me when I

Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck
somewhere between 2.1 and 3.0, the keyword 'distinct' was Oops, that was me. And it wasn't intended. I'm using this for my own setups and apparently copied it to master when I added some other stuff. Hope I didn't mess up other people's experiments. It's been in master since 7 August 2014

Re: [Moses-support] Single score in phrase table

2015-02-24 Thread Matthias Huck
Set a higher weight for UnknownWordPenalty? Maybe the default is not adequate if you do strange things like this. On Tue, 2015-02-24 at 23:49 +0100, Marcin Junczys-Dowmunt wrote: Hi, I have a problem with a single score phrase table. All scores have been combined into one score as a linear

Re: [Moses-support] Tuning with mert-moses.perl error

2015-02-17 Thread Matthias Huck
Hi Hayo, Can you please do two things: 1.) Send me the file filtered/moses.ini so that I can have a look at the feature functions and scaling factors in there. 2.) Tell me the Git commit ID of the Moses version you're working with. A bug was put into master with commit 70e8eb5. It's been fixed

Re: [Moses-support] Untuneable feature score components?

2015-02-16 Thread Matthias Huck
-23 at 18:37 +, Matthias Huck wrote: On Fri, 2015-01-23 at 18:18 +, Hieu Hoang wrote: True, but that complicates the framework, and doesn't deal with sparse features. Why does it complicate the framework? Isn't the trick about tuneable mostly that you don't write those scores

Re: [Moses-support] I can't get any output from my syntactic baseline.

2015-01-24 Thread Matthias Huck
Hi, As Rico pointed out before: the glue rules are missing. Cheers, Matthias On Sun, 2015-01-25 at 03:25 +0800, hxshi wrote: I can't get any output with my syntactic baseline. Will anybody know what maybe wrong? I trained a string2tree baseline. Got a rule-table such like this: %

Re: [Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
, the write another (tuneable) ff which grabs whatever scores it wants from the pt On 23 January 2015 16:55:56 GMT+00:00, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi, Is there any existing functionality to set only specific score components of a feature function

[Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
Hi, Is there any existing functionality to set only specific score components of a feature function as untuneable? Feature functions have a boolean tuneable parameter, but it affects all the scores produced by it. It doesn't help in case I want to switch off individual scores from a phrase

Re: [Moses-support] Untuneable feature score components?

2015-01-23 Thread Matthias Huck
for. And maybe somebody on the mailing list has implemented this and never put it into master? I want it for MIRA, btw. I think it should be added if it doesn't exist somewhere yet. Unless someone has strong objections. On 23 January 2015 18:09:11 GMT+00:00, Matthias Huck mh...@inf.ed.ac.uk wrote

Re: [Moses-support] about the scores in run*.best100.out

2015-01-23 Thread Matthias Huck
PlusEquals() before already, then you don't have to modify anything. Cheers, Matthias Regards On Friday, January 23, 2015 12:11 AM, Matthias Huck mh...@inf.ed.ac.uk wrote: Hi Arefeh, Can you try to run the setup from the cluster on your local desktop system? With the same input

Re: [Moses-support] about the scores in run*.best100.out

2015-01-22 Thread Matthias Huck
Hi Arefeh, Can you try to run the setup from the cluster on your local desktop system? With the same input, a Moses binary compiled from the same sources, and the same command to produce the n-best lists? Normally it should give you the same output. Why would the feature never produce an overall

Re: [Moses-support] Configuration parameter documentaion

2015-01-22 Thread Matthias Huck
Hi Roee, I would be very surprised if each and every Moses feature is described somewhere. But Moses is generally very well documented, and you find all the information you need for building a state-of-the-art baseline system on the website [http://www.statmt.org/moses/] and in the manual

Re: [Moses-support] phrase table

2015-01-15 Thread Matthias Huck
Hi, The data is sentence-segmented. Assume you train your model with a training corpus which contains a single parallel sentence pair. Your training sentence has length L on both source and target side, and it's aligned along the diagonal. If n > L, you cannot extract any phrase of length n
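
The argument can be checked with a tiny Python sketch. It assumes the toy setting from the message (one sentence pair of length L on both sides, aligned along the diagonal) and only enumerates contiguous spans of length n; the function name is hypothetical and this is not the Moses phrase extractor.

```python
def diagonal_phrase_spans(length, n):
    """Spans (start, end), end inclusive, of length n that can be extracted
    from a sentence pair with `length` tokens per side and a diagonal
    alignment. The same span applies to source and target side."""
    return [(start, start + n - 1) for start in range(length - n + 1)]


spans_ok = diagonal_phrase_spans(5, 3)        # n <= L: L - n + 1 spans
spans_too_long = diagonal_phrase_spans(5, 6)  # n > L: nothing to extract
```

For a phrase length up to L there are L - n + 1 spans; as soon as n exceeds L the list is empty.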

Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
We typically try to increase the tuning set in order to obtain more reliable sparse feature weights. But in your case it's rather the test set that seems a bit small for trusting the BLEU scores. Do the sparse features give you any large improvement on the tuning set? On Thu, 2015-01-15 at

Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote: - tune test (based on source) size of overlap set = 624 (based on target) size of overlap set = 386 (tune test have high overlapping parts based on source sentences, but half of them have different target sentences) Does

[Moses-support] Feature score deltas in the chart decoder

2015-01-07 Thread Matthias Huck
Hi, I've just pushed a commit to Moses that brings about a slight change wrt. the way the chart decoder deals with feature scores. The chart decoder now stores deltas of individual feature scores instead of constantly summing everything up. This behaviour is similar to what we have been doing

Re: [Moses-support] Alignment symmetrization without giza2bal

2014-12-14 Thread Matthias Huck
Hi Marcin, I don't quite understand why this is a problem. But if you're looking for alternative implementations for word alignment symmetrization: The Jane toolkit includes a program called `mergeAlignment`. It should be able to read Moses format alignments.

Re: [Moses-support] string of Words + states in feature functions

2014-12-10 Thread Matthias Huck
Hi Amir, The input is passed to the feature functions via InitializeForInput(InputType const& source). This method is called before search and collecting of translation options (cf. moses/FF/FeatureFunction.h). You can set a member variable to have access to the input in your scoring method.

Re: [Moses-support] How to Run experiment.perl

2014-11-29 Thread Matthias Huck
$ your-moses-directory/scripts/ems/experiment.perl -config config.toy -exec On Sat, 2014-11-29 at 15:00 +, Asad A.Malik wrote: Hi All, How can I Run experiment.perl -config config.toy -exec. When I type following command: $ run experiment.perl -config

Re: [Moses-support] sentence is always too short for cleaning

2014-11-28 Thread Matthias Huck
Hi, If this happens in scripts/training/clean-corpus-n.perl then you should check whether a parallel corpus with the same number of lines on source and target side is passed to that script. Maybe there's an issue with your training data or something went wrong in a previous step of the
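
A line-count check of this kind could look as follows in Python. The file names and contents are placeholders created in a temporary directory purely for demonstration; in practice the two paths would point to the source and target side of the corpus that is passed to the cleaning script.

```python
import os
import tempfile


def count_lines(path):
    """Count the lines of a file without loading it into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


# Demonstration with two tiny, deliberately mismatched toy files.
with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "corpus.de")
    target_path = os.path.join(tmp, "corpus.en")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("ein Haus\nein Auto\n")
    with open(target_path, "w", encoding="utf-8") as f:
        f.write("a house\n")
    source_lines = count_lines(source_path)
    target_lines = count_lines(target_path)
    mismatch = source_lines != target_lines
```

If `mismatch` is true, the two sides are not parallel line by line, and the problem lies in an earlier preprocessing step rather than in the cleaning script.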

Re: [Moses-support] WG: Unknown single words that are part of phrases

2014-11-27 Thread Matthias Huck
- From: Vera Aleksic, Linguatec GmbH Sent: Thursday, 27 November 2014 09:42 To: 'Matthias Huck'; Raj Dabre Subject: AW: [Moses-support] Unknown single words that are part of phrases Hi, Thank you for your answers. @Raj, one-word-translations do not exist, I have searched for them

Re: [Moses-support] WG: Unknown single words that are part of phrases

2014-11-27 Thread Matthias Huck
, Matthias Huck wrote: Hi Vera, It's odd that the lexical translation model contains such an entry if the pair is always unaligned. Maybe you used a different word alignment when you extracted the lexicon model? You should manually have a look at your word alignment in order to check

Re: [Moses-support] Unknown single words that are part of phrases

2014-11-26 Thread Matthias Huck
Hi, Supposedly your phrase table does not contain an entry Gitarre ||| guitar because this word pair is always unaligned in your training data. You could try to improve your word alignment quality. Alternatively, you could implement a procedure in the manner of the forced single word heuristic

Re: [Moses-support] Sentence mismatch error!

2014-10-09 Thread Matthias Huck
Hi Arefeh, Have you been able to resolve that issue? Maybe one of your GIZA alignments is flawed, for instance because the GIZA process was terminated before it finished. Did you check that both the standard and the inverse alignment files have the same number of lines? Check it like this: $

Re: [Moses-support] n-best-list diversity

2014-06-05 Thread Matthias Huck
distinct n-best lists with at most 100 items and they seem to be a little bit better filled than with -sd 0. With cube pruning, -cbd some_number does not seem to do anything, I also tried to increase the pop limit with no success. Best, Marcin On 05.06.2014 19:17, Matthias Huck wrote

Re: [Moses-support] Problems with segmentation mismatch and many unknown words for Chinese translation

2014-05-29 Thread Matthias Huck
Hi Gideon, I still tend to believe that there's some issue with your preprocessing. Or maybe there's a mismatch in the way you preprocessed your training and test data? The OOV rates on MT06 and MT08 are very low in the systems built by us at RWTH (cf. the numbers I sent you as a reply to your

Re: [Moses-support] Configuring LMs

2014-05-28 Thread Matthias Huck
:08, Matthias Huck wrote: Hi Lars, The instructions you're looking for are here: http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel You can also create a KenLM binary file instead and use it in the decoder with the KENLM line in the [feature] section of your

Re: [Moses-support] Decode error in EMS

2014-05-27 Thread Matthias Huck
Hi Mauro, The weights for LM1 and LM2 are missing in your config file. You need to add them in the [weight] section.

# core weights
[weight]
Distortion0= 0.3
UnknownWordPenalty0= 1
WordPenalty0= -1
TranslationModel0= 0.2 0.2 0.2 0.2
PhrasePenalty0= 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3
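A mismatch of this kind can be spotted before starting the decoder. The sketch below is an assumption-laden illustration, not part of Moses: the function `missing_weights` is hypothetical, it only looks at features that carry an explicit `name=` argument in the [feature] section, and it treats `#` as the start of a comment. Some feature functions legitimately have no tunable weight, so a reported name is a hint to check, not proof of an error.

```python
def missing_weights(ini_text):
    """List named features that have no line in the [weight] section.

    Hypothetical helper: only features with an explicit name=... argument
    are considered; implicitly named features are ignored.
    """
    section = None
    features = []
    weighted = set()
    for raw in ini_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "feature":
            # The first token is the feature type, the rest are key=value pairs.
            for token in line.split()[1:]:
                if token.startswith("name="):
                    features.append(token[len("name="):])
        elif section == "weight":
            weighted.add(line.split("=", 1)[0].strip())
    return [name for name in features if name not in weighted]
```

Called on the text of a moses.ini, it returns the feature names to look at, for example the names of two language models declared under [feature] without matching entries under [weight].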

Re: [Moses-support] Configuring LMs

2014-05-27 Thread Matthias Huck
Hi Lars, The instructions you're looking for are here: http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel You can also create a KenLM binary file instead and use it in the decoder with the KENLM line in the [feature] section of your moses.ini. $ kenlm/build_binary

Re: [Moses-support] one-to-one alignment

2014-02-15 Thread Matthias Huck
Hi Arefeh, You could intersect the standard and inverse alignments from GIZA instead of applying the grow-diag-final-and heuristic. This will typically impair translation quality, though. Cheers, Matthias On Sat, 2014-02-15 at 14:18 -0800, Arefeh Kazemi wrote: Hello all, I am using Moses
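To make the intersection concrete, here is a small sketch. It assumes both alignments have already been converted to one line per sentence pair in the common `i-j` notation (source index, hyphen, target index), which is not the raw GIZA output format. The names `parse_alignment` and `intersect` and the `inverse_is_flipped` switch are made up for this illustration.

```python
def parse_alignment(line):
    """Parse a line such as '0-0 1-2' into a set of (int, int) pairs."""
    points = set()
    for token in line.split():
        left, right = token.split("-")
        points.add((int(left), int(right)))
    return points


def intersect(forward_line, inverse_line, inverse_is_flipped=True):
    """Keep only the alignment points found in both directions.

    If inverse_is_flipped is True, the inverse line is assumed to be
    written as target-source and is swapped back to source-target first.
    """
    forward = parse_alignment(forward_line)
    inverse = parse_alignment(inverse_line)
    if inverse_is_flipped:
        inverse = {(s, t) for (t, s) in inverse}
    return sorted(forward & inverse)
```

Since a GIZA alignment links each word on one side to at most one word on the other, the intersection of the two directions contains only one-to-one links. Words aligned in just one direction end up unaligned, which is one reason translation quality typically suffers.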

Re: [Moses-support] Get plain text from the output of a translation

2014-02-14 Thread Matthias Huck
Hi Per, The standard workflow is to run a postprocessing step on the output, e.g. with scripts/tokenizer/detokenizer.perl in Moses. Usage ./detokenizer.perl (-l [en|fr|it|cs|...]) < tokenizedfile > detokenizedfile Options: -u ... uppercase the first char in the final sentence. -q ...

Re: [Moses-support] word alignment viewer

2013-12-09 Thread Matthias Huck
It's called Cairo: Cairo: An Alignment Visualization Tool. Noah A. Smith and Michael E. Jahr. In Proceedings of the Language Resources and Evaluation Conference (LREC 2000), pages 549–552, Athens, Greece, May/June 2000. http://www.cs.cmu.edu/~nasmith/papers/smith+jahr.lrec00.pdf

Re: [Moses-support] 10 years of OPUS

2013-11-06 Thread Matthias Huck
Hi James, There has been a vast literature on adaptation techniques for SMT in recent years. Some reading suggestions: http://www.statmt.org/wmt07/pdf/WMT17.pdf http://www.statmt.org/wmt09/pdf/WMT-0932.pdf http://dl.acm.org/citation.cfm?id=1870702

Re: [Moses-support] gappy phrases

2013-11-05 Thread Matthias Huck
Hi James, the Phrasal toolkit is freely available as well [http://nlp.stanford.edu/phrasal/], so why don't you consider extracting discontinuous phrases using Stanford's original implementation? Cheers, Matthias On Tue, 2013-11-05 at 07:29 +, Read, James C wrote: Interesting. This

Re: [Moses-support] gappy phrases

2013-11-05 Thread Matthias Huck
...@mit.edu [moses-support-boun...@mit.edu] on behalf of Matthias Huck [mh...@inf.ed.ac.uk] Sent: 05 November 2013 13:15 To: moses-support@mit.edu Subject: Re: [Moses-support] gappy phrases Hi James, the Phrasal toolkit is freely available as well [http://nlp.stanford.edu/phrasal/], so

Re: [Moses-support] gappy phrases

2013-11-05 Thread Matthias Huck
Hi James, I tried Phrasal Beta2 and Beta3 a couple of months ago. Both worked for me with some minor hassle. You should follow the instructions from http://www-nlp.stanford.edu/wiki/Software/Phrasal#Phrasal in order to set up your environment. I'm also sure that the Phrasal developers are able

Re: [Moses-support] gappy phrases

2013-11-04 Thread Matthias Huck
Hi, RWTH Aachen University implemented extraction of discontinuous phrases and decoding with source-side gaps in the Jane toolkit [www.hltpr.rwth-aachen.de/jane/]. We did not see any clear improvements over standard phrase-based setups in our experiments, though. Some results were published in