From: "Vincent Nguyen"

Re: [Moses-support] NCv12 number of lines mismatch

2018-04-23 Thread Vincent Nguyen

tr -d -c '\r' < news-commentary-v12.de-en.en | wc -c gives 4099, i.e. the .en file contains 4099 carriage returns, so v12 is broken somehow: it reads correctly with some tools/primitives but not with others. Just to let you know. On 14/09/2017 at 08:48, Vincent Nguyen wrote: > okay really weird. > wc gives me the same number
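To see why wc, nano, gedit and scripts can disagree on the same file, here is a minimal Python sketch (helper names are hypothetical, UTF-8 input is assumed). wc -l counts only LF bytes, whereas Python's str.splitlines() also breaks on CR, U+2028 and a few other characters, so every stray CR or U+2028 adds a line for splitlines-based tools:

```python
import sys


def count_lf(data: bytes) -> int:
    """Same number as `wc -l`: count LF bytes only."""
    return data.count(b"\n")


def count_cr(data: bytes) -> int:
    """Same number as `tr -d -c '\\r' < file | wc -c`."""
    return data.count(b"\r")


def count_splitlines(data: bytes) -> int:
    """Line count seen by code that calls str.splitlines()."""
    return len(data.decode("utf-8").splitlines())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(path, "wc -l:", count_lf(data), "CR:", count_cr(data),
              "splitlines:", count_splitlines(data))
```

Run it as python count_lines.py news-commentary-v12.de-en.de news-commentary-v12.de-en.en (the script name is hypothetical); a difference between the first and last number points at extra line-break characters.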

[Moses-support] OpenNMT workshop March 2 2018

2018-02-01 Thread Vincent Nguyen

Dear all, In case one would like a good excuse to visit Paris March 2-3 2018, there will be a workshop on OpenNMT. Here is the registration website. http://workshop-paris-2018.opennmt.net/ Cheers, Vincent ___ Moses-support mailing list Moses-suppo

Re: [Moses-support] NCv12 number of lines mismatch

2017-09-14 Thread Vincent Nguyen

nano also gives the "right" number (270769), but I have a script that finds a difference. On 14/09/2017 at 08:48, Vincent Nguyen wrote: > okay really weird. > wc gives me the same numbers as you, but gedit gives another 2 different > numbers for each file. Must be special c

Re: [Moses-support] NCv12 number of lines mismatch

2017-09-13 Thread Vincent Nguyen

* >> 270769 news-commentary-v12.de-en.de >> 270769 news-commentary-v12.de-en.en >> 541538 total > > What are you running that shows you different line numbers? > > cheers - Barry > > On 12/09/17 10:06, Vincent Nguyen wrote: >> Hi, >> Is there an

[Moses-support] NCv12 number of lines mismatch

2017-09-12 Thread Vincent Nguyen

Hi, Is there an updated version of NCv12 (http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz)? The number of lines for de-en is not the same in the two languages. Cheers, Vincent

[Moses-support] Chinese tokenizer / detokenizer (segmenter / unsegmenter)

2017-05-29 Thread Vincent Nguyen

Hello team, I have read many posts and it looks like most people tend to use the Stanford segmenter. Has anyone had good experience with other tools? Also, which "detokenizer" do you actually use? It seems that it is not just a question of removing spaces, especially when Chinese target cont

Re: [Moses-support] Looking for a tool for training csv delimited and aligned data

2017-04-26 Thread Vincent Nguyen

I think you mixed up input/output, because in your example at the end you would like to get the pronunciation of a given new word. The input is the left-hand side and the output is the pronunciation. If you are able to rework the right-hand side of your data a little (you need to stretch the phones one by one,

Re: [Moses-support] Training backward LM?

2016-10-20 Thread Vincent Nguyen

Hi Michael, I am trying to find out whether your tests on this subject were successful or not; can you follow up? Thanks

Re: [Moses-support] News monolingual corpus question

2016-10-05 Thread Vincent Nguyen

re de-duping, and before we > didn't. > > I would say if you want to compare to recent WMT experiments, take the > most recent version of the data, > > cheers - Barry > > On 04/10/16 21:34, Vincent Nguyen wrote: >> >> ok >> this one http://www.statmt.o

Re: [Moses-support] News monolingual corpus question

2016-10-04 Thread Vincent Nguyen

sed files? > > cheers - Barry > > On 04/10/16 14:40, Vincent Nguyen wrote: >> Hi, >> >> on this link: >> >> http://www.statmt.org/wmt11/translation-task.html >> >> on the download section for monolingual data, there is : >> >> on

[Moses-support] News monolingual corpus question

2016-10-04 Thread Vincent Nguyen

Hi, on this link: http://www.statmt.org/wmt11/translation-task.html in the download section for monolingual data, there is one big file: http://www.statmt.org/wmt11/training-monolingual.tgz and separate files, including the news crawls per year. However, when you take a single file for a specif

Re: [Moses-support] EMS question - no recasing no truecasing

2016-05-30 Thread Vincent Nguyen

2016 at 9:57 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi, I have a basic question on EMS. If I want no recasing and no truecasing, I just put IGNORE next to the 2 sections. However I have the feeling it does not eliminate this step for the EVALUAT

[Moses-support] EMS question - no recasing no truecasing

2016-05-30 Thread Vincent Nguyen

Hi, I have a basic question on EMS. If I want no recasing and no truecasing, I just put IGNORE next to the two sections. However, I have the feeling this does not remove the step from the EVALUATION section, and there is no IGNORE option within that one. Is this the case? Thanks, Vincent

[Moses-support] UN V1.0 corpus / Europarl - first shot... EN=>FR

2016-05-30 Thread Vincent Nguyen

First, many thanks for the huge work. It opens up some new language possibilities that are not in Europarl. I just ran one test comparison: Config 1: corpus UN v1.0; LM: UN v1.0 + News2014FR; DEV+TEST = Newsdiscuss2015; Nist = 29.61. Config 2: corpus Europarl; LM: Europarl + News2014FR; DEV+TEST = Newsdiscuss2015

Re: [Moses-support] loading time for large LMs

2016-04-10 Thread Vincent Nguyen

Is it on an SSD drive? If not, then forget it. Try cat <your model file> > /dev/null first. On 10/04/2016 08:29, Jorg Tiedemann wrote: Hi, I have a large language model from the common crawl data set and it takes forever to load when running moses. My model is a trigram kenlm binarized with quantization, trie structures and poin
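The cat trick works by reading the whole file once so that the operating system keeps it in its page cache, which only helps if there is enough free RAM to hold the model. The same idea as a small Python sketch (the function name is hypothetical; cat file > /dev/null does the same job):

```python
import sys


def warm_page_cache(path: str, chunk_size: int = 1 << 20) -> int:
    """Read `path` once, sequentially, and return the number of bytes read.

    The data is discarded; the point is that the kernel normally keeps the
    pages cached (memory permitting), so a later load of the same file
    reads from RAM instead of disk.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    for model_path in sys.argv[1:]:
        print(model_path, warm_page_cache(model_path), "bytes read")
```

Sequential reads are what make this fast on a spinning disk: the decoder's own load may touch the file in a much less disk-friendly order.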

Re: [Moses-support] language models options

2016-04-06 Thread Vincent Nguyen

size of phrase tables and language models matter, too, but not as much, and it seems that in your scenario you are just wondering about splitting up a fixed pool of data. -phi On Wed, Apr 6, 2016 at 6:50 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi, What are (in t

[Moses-support] language models options

2016-04-06 Thread Vincent Nguyen

Hi, What is the difference (in terms of performance) between the following three solutions: (1) 2 corpora, 2 LMs, 2 weights calculated at tuning time; (2) 2 corpora merged into one, 1 LM; (3) 2 corpora, 2 LMs interpolated into 1 LM with tuning. Will the results be different in the end? Thanks.
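To make the difference between options 1 and 3 concrete, here is a sketch for a single n-gram probability (function names and probabilities are hypothetical; real systems apply this per word and tune the weights on a dev set). With two LM features the decoder combines them log-linearly, w1 log p1 + w2 log p2, while an interpolated LM mixes the probabilities first, log(lam p1 + (1 - lam) p2). Option 2, one LM on the merged corpus, changes the n-gram counts themselves and is not modelled here.

```python
import math


def log_linear_score(p1: float, p2: float, w1: float, w2: float) -> float:
    """Option 1: two LM features, combined log-linearly with tuned weights."""
    return w1 * math.log(p1) + w2 * math.log(p2)


def interpolated_score(p1: float, p2: float, lam: float) -> float:
    """Option 3: one interpolated LM, mixing the probabilities linearly first."""
    return math.log(lam * p1 + (1.0 - lam) * p2)


if __name__ == "__main__":
    p_in_domain, p_out_domain = 0.01, 1e-6  # hypothetical probabilities for one n-gram
    print(log_linear_score(p_in_domain, p_out_domain, 0.5, 0.5))  # log(1e-4)
    print(interpolated_score(p_in_domain, p_out_domain, 0.5))     # log(0.0050005)
```

With equal weights the log-linear score is the log of the geometric mean (1e-4), while the interpolated score is the log of the arithmetic mean (0.0050005): the log-linear combination punishes an n-gram that either model finds unlikely much more strongly, which is one reason the two set-ups can give different results.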

Re: [Moses-support] Translating words with apostrophies

2016-04-03 Thread Vincent Nguyen

Apostrophes are tricky to handle properly. The tokenizer is language-sensitive (see the -l option). In French: l'été => l' été (escaped as l&apos; été, hence the space between ; and é). In English: today's story => today 's story. But the problem is that corpora sometimes contain misplaced spaces before or after the apostr
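A simplified Python sketch of the two language-specific rules described above. It is an approximation, not a port of tokenizer.perl: it only handles an apostrophe between two letters and skips the &apos; escaping and the other special cases the real script performs (the function name is hypothetical).

```python
import re

# A "letter" for this sketch: a word character that is not a digit or underscore.
_LETTER = r"[^\W\d_]"
# An apostrophe directly between two letters, e.g. in l'été or today's.
_INNER_APOSTROPHE = re.compile(rf"({_LETTER})'(?={_LETTER})")


def split_apostrophes(text: str, lang: str) -> str:
    """Simplified apostrophe splitting; tokenizer.perl does much more."""
    if lang == "fr":
        # French elision: the apostrophe stays on the left word (l'été -> l' été).
        return _INNER_APOSTROPHE.sub(r"\1' ", text)
    if lang == "en":
        # English clitics: the apostrophe goes with the right part (today's -> today 's).
        return _INNER_APOSTROPHE.sub(r"\1 '", text)
    return text
```

A corpus line with a misplaced space, such as l 'été, is left untouched by this rule because no letter directly precedes the apostrophe, which is how that kind of noise survives tokenization.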

Re: [Moses-support] Maximum Phrase Table length

2016-04-01 Thread Vincent Nguyen

hu, Mar 31, 2016 at 2:58 PM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hello, Does anyone have evidence supporting this (found in the doc): Maximum Phrase Length The maximum length of phrases is limited to 7 words. The maximum phrase length impacts the

[Moses-support] Maximum Phrase Table length

2016-03-31 Thread Vincent Nguyen

Hello, Does anyone have evidence supporting this (found in the doc): "Maximum Phrase Length: The maximum length of phrases is limited to 7 words. The maximum phrase length impacts the size of the phrase translation table, so shorter limits may be desirable, if phrase table size is an issue. Previo

[Moses-support] reordering issue

2016-03-21 Thread Vincent Nguyen

Hi, I have been struggling with some reordering issues. I have tried both LM interpolation and OSM, but with no luck. Here is an example. Source (English): Canada remains very active within the Working Group, and our law enforcement officials also participate in the Working Group’s informal law enf

[Moses-support] Job Opportunity

2016-03-19 Thread Vincent Nguyen

Ubiqus is a leading transcription/translation company with offices in Paris, NY, London, Brussels, Montreal, Ottawa... We are looking for a machine learning system builder who has worked with Kaldi, Moses, or any DNN framework for NLP. See the full story here: https://www.linkedin.c

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-14 Thread Vincent Nguyen

After a full re-train, I can confirm what I was saying. For those who use French as one of the languages, the adjustment to normalize-punctuation.perl is really needed. On 14/03/2016 10:01, Vincent Nguyen wrote: I think I found the culprit. This is very tricky. it's

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-14 Thread Vincent Nguyen

". You can check the raw output of the decoder, and see how it is changed by the detokenizer. -phi On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi, I got the following situation: This group age is translated sometimes in: ce gr

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-10 Thread Vincent Nguyen

raw output of the decoder, and see how it is changed by the detokenizer. -phi On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi, I got the following situation: This group age is translated sometimes in: ce groupe d'âge (corr

[Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-09 Thread Vincent Nguyen

Hi, I have the following situation: "This group age" is sometimes translated as: ce groupe d'âge (correct); ce groupe d" âge (incorrect); ce groupe d "âge (incorrect). I am wondering whether this is more of a detokenizer issue or a corpus issue, or both. Technically, in French there shouldn't be any space b

[Moses-support] philosophical question ....NMT/SMT

2016-03-08 Thread Vincent Nguyen

Guys, I have a question for the mathematicians you all are :) I have been working with and testing Moses as well as Groundhog for months now. When I compare results (where comparison is possible: same corpus, in-domain, etc.), I do not see much difference between the two systems. So whe

Re: [Moses-support] bleu-annotation / analysis.perl

2016-03-05 Thread Vincent Nguyen

is is still not right for unigram sentences. ____ From: "Vincent Nguyen" Date: 26 Feb 2016 22:21:59 To: moses-support@mit.edu Subject: Re: [Moses-support] bleu-annotation / analysis.perl Am I correct sa

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-28 Thread Vincent Nguyen

threads running. On 28/02/2016 09:57, Marcin Junczys-Dowmunt wrote: You are right, that seems to be a mistake. "-threads" should not be specified twice. Does anyone speak EMS? On 28.02.2016 at 09:51, Vincent Nguyen wrote: Marcin, (or others since it relates to EMS..

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-28 Thread Vincent Nguyen

t should however be somewhat faster than only a single thread. On 17.02.2016 22:44, Vincent Nguyen wrote: I have the feeling it's not.

Re: [Moses-support] bleu-annotation / analysis.perl

2016-02-26 Thread Vincent Nguyen

Am I correct in saying that when a sentence is 4 tokens or shorter, its BLEU score should be 1 for an exact match and 0 otherwise? (by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf) On 26/02/2016 10:02, Vincent Nguyen wrote: > Hi, > > I woul
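The question can be checked with a minimal sketch of unsmoothed, single-reference sentence BLEU (modified n-gram precisions up to 4, geometric mean, brevity penalty). Whether analysis.perl computes exactly this, or applies some smoothing, is an assumption not verified here; the function names are hypothetical, and the lowercase flag gives the uncased variant.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4, lowercase=False):
    """Unsmoothed single-reference BLEU for one sentence pair."""
    if lowercase:  # uncased variant
        hyp, ref = hyp.lower(), ref.lower()
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngram_counts(h, n), ngram_counts(r, n)
        total = sum(hyp_ngrams.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if total == 0 or matched == 0:
            # No n-grams of this order, or none matched: the geometric mean is 0.
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: 1 if the hypothesis is longer than the reference.
    bp = 1.0 if len(h) > len(r) else math.exp(1.0 - len(r) / len(h))
    return bp * math.exp(log_precision_sum / max_n)
```

Under this definition a 4-token sentence scores 1.0 on an exact match and 0.0 otherwise, but a sentence shorter than 4 tokens has no 4-grams at all and scores 0.0 even when it matches exactly, so the claim only holds for exactly 4 tokens unless some smoothing is applied.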

[Moses-support] bleu-annotation / analysis.perl

2016-02-26 Thread Vincent Nguyen

Hi, I would like to better understand the analysis.perl script that generates the bleu-annotation file. Is there an easy way to get the uncased BLEU score of each line instead of the cased one? Am I right that this script recomputes its own BLEU score without calling either NIST-BLEU or Mul

Re: [Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-18 Thread Vincent Nguyen

3:07, Marcin Junczys-Dowmunt wrote: >> It is, just not very well done. It generally does not make sense to have >> more than 8-10 threads. That should however be somewhat faster than only >> a single thread. >> >> On 17.02.2016 22

[Moses-support] Is ProcessLexicalTableMin multi threads ?

2016-02-17 Thread Vincent Nguyen

I have the feeling it's not.

[Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Vincent Nguyen

Did you add -exec at the end (after -continue 1)? On 08/01/2016 18:16, Nicholas Ruiz wrote: > Thanks, Tomasz. Unfortunately modifying the config file in the steps > directory didn't work for me. My block looks something like this: > > [EVALUATION:test4] > > tokenized-input = /path/to/test4.

Re: [Moses-support] How much tuning data?

2015-12-28 Thread Vincent Nguyen

This is fine for tuning. If you want to make it quicker, cut it down to 1000 sentences. On 28/12/2015 16:37, Read, James C wrote: Hi, I'm setting up some Moses baseline systems for various language pairs to compare the systems against my own work. I've largely been following the base

Re: [Moses-support] easy steps for beginners

2015-12-11 Thread Vincent Nguyen

You managed to install it, so you will need a little effort to learn the basics by yourself. Here is the starting point: http://www.statmt.org/moses/?n=Moses.Baseline On 10/12/2015 19:03, Shaimaa Marzouk wrote: > Dear support team, > > I would be extremely grateful, if you could help me with th

Re: [Moses-support] decoder question

2015-12-05 Thread Vincent Nguyen

either CRLF or LF, which we have extensively > using across Windows and Posix systems. > > Tom > > > On 12/5/2015 6:13 AM, moses-support-requ...@mit.edu wrote: >> Date: Fri, 4 Dec 2015 23:13:10 + >> From: Ulrich Germann >> Subject: Re: [Moses-support] decoder qu

Re: [Moses-support] decoder question

2015-12-04 Thread Vincent Nguyen

n I have the feeling that we really need to "sentence-tokenize" before word-tokenizing. On 04/12/2015 13:52, John D Burger wrote: > I think you're asking if Moses translates one sentence at a time. The answer > is yes. > > - John Burger > MITRE

[Moses-support] decoder question

2015-12-04 Thread Vincent Nguyen

Actually, I don't know whether this is a decoder question or something else. Here is my issue: let's say I have a text string with two sentences, with a period ending the first sentence but no CR+LF, just a space before the second sentence. When I pass the full string to the pipeline: tokenizer + truecaser + moses
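One workaround is to split the text into one sentence per line before tokenizing, since Moses translates each input line as one sentence. Below is a naive Python sketch (the function name is hypothetical) that breaks after ., ! or ? when whitespace and an uppercase letter follow. It will wrongly split abbreviations such as "M. Dupont", which is why a splitter with nonbreaking-prefix lists, such as the split-sentences.perl script shipped with Moses, is the more robust choice.

```python
import re
import sys

# Candidate boundary: sentence-final punctuation followed by whitespace.
_CANDIDATE = re.compile(r"[.!?]\s+")


def split_sentences(text: str) -> list:
    """Naively split a paragraph into sentences, one per output item."""
    sentences, start = [], 0
    for match in _CANDIDATE.finditer(text):
        next_char = text[match.end():match.end() + 1]
        if next_char.isupper():  # only break before an uppercase letter
            sentences.append(text[start:match.start() + 1].strip())
            start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for paragraph in sys.stdin:
        for sentence in split_sentences(paragraph):
            print(sentence)
```

Piping the text through this (or through a proper sentence splitter) before the tokenizer + truecaser + moses steps makes each sentence its own input line.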

[Moses-support] normalize punctuation

2015-12-01 Thread Vincent Nguyen

Hieu, here: http://www.statmt.org/moses/RELEASE-3.0/models/fr-en/config.pb.recase I read: input-tokenizer = "$moses-script-dir/tokenizer/normalize-punctuation.perl $input-extension | $moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension" output-tokenizer = "$moses-script-dir/toke

[Moses-support] Language model question

2015-11-26 Thread Vincent Nguyen

Hi all, I have a question regarding LMs. Let's take the example of news.2014.shuffle.en. When we process it through punctuation normalization for English, it will for instance put a space before an apostrophe: "it is'nt" => "it is 'nt". But it contains some noise; for instance, there is so

Re: [Moses-support] Moses on SGE clarification

2015-11-04 Thread Vincent Nguyen

relative paths. And of course, the binaries need to be executable on all nodes as well. -phi On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: OK guys, this is not easy stuff... I fought to get the prerequisites working, but now at least j

[Moses-support] EMS suggestion

2015-11-04 Thread Vincent Nguyen

Hi, Since this option, Online Translation Model Combination (Multimodel phrase table type), is available (cf. http://www.statmt.org/moses/?n=Advanced.Domain), why doesn't EMS treat translation models the same way as language models? When we keep running EMS, it re-runs a lot of stuff that could

Re: [Moses-support] Moses on SGE clarification

2015-10-30 Thread Vincent Nguyen

igure out: how do the Moses steps deal with "Nb of Jobs submitted" versus -threads in the various steps? On 29/10/2015 17:45, Vincent Nguyen wrote: > Ken, > > I just did some further testing on the master node that HAS everything installed. > same error as is. > > /net

Re: [Moses-support] Moses on SAMBA filesystem

2015-10-29 Thread Vincent Nguyen

tuning now, so working fine so far. By the way, on SMB there was another issue with the split command during extraction. On 29/10/2015 21:44, Vincent Nguyen wrote: > I'll mount NFS instead and will confirm if it works. > thanks > > On 29/10/2015 21:31, Kenneth Heafield wrote: >>

Re: [Moses-support] Moses on SAMBA filesystem

2015-10-29 Thread Vincent Nguyen

t temporary files on SAMBA is pretty low > priority. However, if you can provide a backtrace (after compiling with > "debug" added to the command) I can try to turn that segfault into an > error message. > > Kenneth > > On 10/29/2015 08:15 PM, Vincent Nguyen wrote

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen

're clear, it runs correctly on the local machine but not when you > run it through SGE? In that case, I suspect it's library version > differences. > > On 10/29/2015 03:09 PM, Vincent Nguyen wrote: >> I get this error : >> >> moses@sgenode1:/netshr/working-e

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen

. -phi On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi there, I need some clarification before screwing up some files. I just set up an SGE cluster with a master + 2 nodes. To make it clear, let's say my cluster name is "default"

Re: [Moses-support] Moses on SGE clarification

2015-10-29 Thread Vincent Nguyen

) On 29/10/2015 15:18, Philipp Koehn wrote: Hi, make sure that all the paths are valid on all the nodes --- so definitely no relative paths. And of course, the binaries need to be executable on all nodes as well. -phi On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <mailto:vngu...@neuf

[Moses-support] Moses on SGE clarification

2015-10-28 Thread Vincent Nguyen

Hi there, I need some clarification before screwing up some files. I just set up an SGE cluster with a master + 2 nodes. To make it clear, let's say my cluster name is "default", my master head node is "master", and my 2 other nodes are "node1" and "node2". For EMS: I opened the default experiment.mac

[Moses-support] NPLM with Europarl ?

2015-10-21 Thread Vincent Nguyen

Hi, Before spending what will obviously be a lot of machine time on this, I would like to know whether someone has run EMS with NPLM on Europarl (i.e. European languages, duh...) and, if so, what the potential BLEU improvements are. Alternatively, I spent some time on ASR and saw some major improvements WER

[Moses-support] tokenizer / detokenizer

2015-10-12 Thread Vincent Nguyen

Hello, Pretty sure this has no academic importance, but: for the tokenizer we have the -x option to skip XML/HTML tags, whereas the detokenizer WILL skip them whatever you do. Cf.: while(<STDIN>) { if (/^<.+>$/ || /^\s*$/) { #don't try to detokenize XML/HTML tag lines
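The quoted condition can be reproduced in Python to see exactly which lines the detokenizer copies through untouched (a sketch of that one test only; the function name is hypothetical). Only lines that start with < and end with >, or are blank, are skipped: a tag in the middle of a text line does not trigger the skip, while a whole line such as <p>Bonjour</p> is skipped even though it contains text.

```python
import re

# The two tests from the quoted Perl loop: /^<.+>$/ and /^\s*$/
TAG_LINE = re.compile(r"^<.+>$")
BLANK_LINE = re.compile(r"^\s*$")


def is_passed_through(line: str) -> bool:
    """True if the quoted detokenizer test would copy this line unchanged."""
    line = line.rstrip("\n")
    return bool(TAG_LINE.match(line) or BLANK_LINE.match(line))
```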

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Vincent Nguyen

o BLEU/TER/Meteor but this is just one data point and a fairly simple system. I would be curious to see how things work out in other users' systems. Best, Michael On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen <vngu...@neuf.fr> wrote: out of curiosity, what gain do y

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-08 Thread Vincent Nguyen

Michael, which score-settings do you use to achieve these results? If search algorithm = 1, which cube pruning pop limit? On 08/10/2015 19:05, Michael Denkowski wrote: Hi all, I extended the multi_moses.py script to support multi-threaded moses instances for cases where memory limits the number of dec

Re: [Moses-support] Faster decoding with multiple moses instances

2015-10-05 Thread Vincent Nguyen

After many tests, as mentioned before, I made these changes in EMS: score-settings = "--GoodTuring --MinScore 2:0.001" and a cube pruning pop limit of 400 (instead of 5000 in EMS). Speed is much, much higher (without impact on translation). On 05/10/2015 17:20, Philipp Koehn wrote: Hi, w
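As I understand the Moses training documentation, --MinScore FIELD:THRESHOLD makes the phrase scorer drop phrase pairs whose score number FIELD (counted from 0) is below THRESHOLD; with the usual four scores, index 2 is the direct phrase translation probability. Treat that reading as an assumption. The hypothetical Python filter below applies the same rule to an existing plain-text phrase table, which is a cheap way to see how many entries a given threshold would remove:

```python
import sys


def parse_min_score(spec: str) -> dict:
    """Parse "2:0.001" or "0:0.0001,2:0.001" into {score_index: threshold}."""
    thresholds = {}
    for item in spec.split(","):
        index, value = item.split(":")
        thresholds[int(index)] = float(value)
    return thresholds


def keep_phrase_pair(line: str, thresholds: dict) -> bool:
    """True if every selected score of a text phrase-table line meets its threshold.

    Assumes the usual "source ||| target ||| scores ||| ..." layout with
    0-based score indices.
    """
    fields = [field.strip() for field in line.split("|||")]
    scores = [float(s) for s in fields[2].split()]
    return all(scores[i] >= t for i, t in thresholds.items())


if __name__ == "__main__":
    limits = parse_min_score(sys.argv[1])  # e.g. "2:0.001"
    for entry in sys.stdin:
        if keep_phrase_pair(entry, limits):
            sys.stdout.write(entry)
```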

Re: [Moses-support] truecase.perl

2015-09-26 Thread Vincent Nguyen

Actually, a space is always inserted after >, but never before <. On 26/09/2015 16:37, Vincent Nguyen wrote: > Hello, > > Quick question regarding this script behavior. > > Les Banques de la zone Euro sont soumises à : > > becomes > > les banque

[Moses-support] truecase.perl

2015-09-26 Thread Vincent Nguyen

Hello, Quick question regarding this script's behavior. Les Banques de la zone Euro sont soumises à : becomes les banques de la zone euro sont soumises à : Lowercasing is fine, and the space between > and Les is fine, but it did not insert a space between the : and the following <. Any clue? Vincent

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen

a, you can > try modified Moore-Lewis filtering for data selection. > https://aclweb.org/anthology/D/D11/D11-1033.pdf > > > Cheers, > Matthias > > > On Thu, 2015-09-24 at 18:19 +0200, Vincent Nguyen wrote: >> This is an interesting subject .. >> >&g

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen

ta it'll be other bad translation > options which pop up. > > On Thu, 2015-09-24 at 16:08 +0200, Vincent Nguyen wrote: >> Matthias, >> >> Pruning : >> I use the cube pop limit at 400 instead of default values (1000 or 5000) >> I use the MinScore 0.001 &g

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen

't want to be used: >> " 1 ||| One Million Roofs >> >> oui ||| no >> >> To use this list, add the following to your moses.ini file >> >> [feature] >> DeleteRules path=/path/to/list >> >> Not tested. >> >>
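If the DeleteRules feature quoted here does not fit (it is marked as not tested), the same list can be applied offline. A hypothetical Python sketch for a plain-text phrase table, assuming the list holds one "source ||| target" pair per line; a binarized table would have to be rebuilt from the filtered text afterwards:

```python
import sys


def load_delete_list(lines) -> set:
    """Read "source ||| target" lines into a set of (source, target) pairs."""
    banned = set()
    for line in lines:
        if "|||" in line:
            source, target = (part.strip() for part in line.split("|||", 1))
            banned.add((source, target))
    return banned


def filter_phrase_table(table_lines, banned: set):
    """Yield the phrase-table lines whose (source, target) pair is not banned."""
    for line in table_lines:
        fields = [field.strip() for field in line.split("|||")]
        if (fields[0], fields[1]) not in banned:
            yield line


if __name__ == "__main__":
    # Usage (hypothetical): python delete_pairs.py list.txt < phrase-table > filtered
    with open(sys.argv[1], encoding="utf-8") as f:
        banned_pairs = load_delete_list(f)
    sys.stdout.writelines(filter_phrase_table(sys.stdin, banned_pairs))
```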

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-24 Thread Vincent Nguyen

ros ||| by EUR 1.1 billion ||| 0.0345062 6.98053e-05 0.0517593 0.000791519 ||| 3-1 4-1 1-2 2-2 2-3 ||| 3 2 1 ||| ||| On 24/09/2015 09:54, Felipe Sánchez Martínez wrote: Hi, This is quite common. If you look at the scores, they are pretty low when they do not make sense, so, even though

Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-23 Thread Vincent Nguyen

tries, isn't it better to address the root of the problem and prepare your training corpus better? On 9/23/2015 6:46 PM, moses-support-requ...@mit.edu wrote: Date: Tue, 22 Sep 2015 20:24:02 +0200 From: Philipp Koehn Subject: Re: [Moses-support] is there a way to remove a bad entry in

[Moses-support] is there a way to remove a bad entry in the phrase table ?

2015-09-22 Thread Vincent Nguyen

Hi, I was wondering: if, after analysing the BLEU-Annotation file, we realize that there must be a bad entry in the phrase table, can we remove it manually or in some other way? Thanks. V.

Re: [Moses-support] Help on pipeline ....

2015-09-17 Thread Vincent Nguyen

aware... big debate? On 16/09/2015 17:30, Vincent Nguyen wrote: I am struggling with a pipeline. Here is the text1.txt file I would like to translate from FR to EN: Les banques de la zone euro sont soumises : au ratio de capital lié à la détention d’actifs risqués (nous nous in

[Moses-support] Help on pipeline ....

2015-09-16 Thread Vincent Nguyen

I am struggling with a pipeline. Here is the text1.txt file I would like to translate from FR to EN: Les banques de la zone euro sont soumises : au ratio de capital lié à la détention d’actifs risqués (nous nous intéressons ici au crédit) ; au ratio de levier, qui détermine le capital règle

[Moses-support] analysis.perl / mteval-v13a.pl / BLEU-annotation

2015-09-14 Thread Vincent Nguyen

Guys, while running EMS with a big test file, I realized that analysis.perl ran very quickly while the actual NIST-BLEU scoring took much, much longer. Also, the "BLEU-Annotation" file generated during analysis does not contain the right line numbering: it takes 0 as the firs

Re: [Moses-support] sgm generation for personalized test sets

2015-09-13 Thread Vincent Nguyen

dle these cases? > > > > On 9/13/2015 11:01 PM, moses-support-requ...@mit.edu wrote: >> Date: Sun, 13 Sep 2015 10:44:02 +0200 >> From: Vincent Nguyen >> Subject: Re: [Moses-support] sgm generation for personalized test sets >> To: moses-support >> Message

Re: [Moses-support] sgm generation for personalized test sets

2015-09-13 Thread Vincent Nguyen

In order to use makemteval.py, we need to remove the bytes 0D (CR) and E2 80 A8 (U+2028 LINE SEPARATOR in UTF-8) from the txt files: Python treats them as additional line breaks. On 12/09/2015 22:07, Vincent Nguyen wrote: > Hi, > > What script do you guys use to generate sgm sets based on txt file ? > > I have tried makemteva
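A Python sketch of that cleanup (names are hypothetical). It deletes the characters, as suggested above, rather than turning them into spaces; which is right depends on how they got into the corpus, and for a parallel corpus both sides must end up with the same number of lines. Beyond CR and U+2028 it also removes the other characters that str.splitlines() treats as line breaks:

```python
import sys

# Characters, other than "\n", that str.splitlines() treats as line boundaries.
# CR (0D) and U+2028 (E2 80 A8) are the two reported above; the rest are
# included so that splitlines() only sees breaks where `wc -l` does.
EXTRA_LINE_BREAKS = "\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"
_DELETE_TABLE = str.maketrans("", "", EXTRA_LINE_BREAKS)


def clean_line_breaks(text: str) -> str:
    """Delete every line-break character except LF."""
    return text.translate(_DELETE_TABLE)


def clean_file(src_path: str, dst_path: str) -> None:
    # newline="" disables universal-newline translation, so a lone CR is
    # seen (and deleted) instead of silently becoming "\n" on read.
    with open(src_path, encoding="utf-8", newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        for chunk in iter(lambda: fin.read(1 << 20), ""):
            fout.write(clean_line_breaks(chunk))


if __name__ == "__main__":
    clean_file(sys.argv[1], sys.argv[2])
```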

[Moses-support] sgm generation for personalized test sets

2015-09-12 Thread Vincent Nguyen

Hi, What script do you guys use to generate sgm sets from a txt file? I have tried makemteval.py in contrib, but there are a few issues. I think these lines: lines = [l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&') for l in filein.read().splitlines()

[Moses-support] Incremental / combination theory question

2015-09-07 Thread Vincent Nguyen

Hi experts, I have a question about phrase table theory. Suppose we take a corpus A to create a translation model TMA and a language model LMA, and consider a corpus B. Method 1: we add corpus B to A => corpus AB => TM-AB and LM-AB. Method 2: we process corpus B => TMB and LMB, then we combine TMA + TMB and

Re: [Moses-support] Decoding Speed performance - suggestion and question

2015-09-04 Thread Vincent Nguyen

ug 31, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Is there any benchmark on which value has what impact? What should I start with as a test, 0.001? The standard value 0.0001 seems really, really low to me; maybe I am not getting what this probability

[Moses-support] Several Issues with Baseline and EMS

2015-09-02 Thread Vincent Nguyen

If you're new to Linux, you will struggle forever. I would go for Slate instead, for sure. On 02/09/2015 17:34, Anita Pal wrote: For the time being, I'm trying to finish building the baseline system. I've just been following the commands as listed on the Moses website. It's still not

Re: [Moses-support] really weird phrase table crash .....

2015-09-02 Thread Vincent Nguyen

On 01/09/2015 17:41, Christophe Servan wrote: > Hello Vincent, > Did you check whether you have enough disk space? > > Best, > > Christophe > > > -Original Message- > From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf > of

[Moses-support] Translation Model binarizing step in EMS - multicore ?

2015-09-02 Thread Vincent Nguyen

Hi, Unless I am mistaken, the TM binarization step in EMS is not multi-core. ttable-binarizer = "$moses-bin-dir/processPhraseTableMin" [training] training-options = "-mgiza -mgiza-cpus 8 -sort-compress gzip -sort-parallel 4 -cores 4" binarize-all = $moses-script-dir/training/bina

Re: [Moses-support] clarification CBPT vs MMSAPT

2015-09-01 Thread Vincent Nguyen

n, uncompressed text files. - Uli On Tue, Sep 1, 2015 at 1:11 PM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi Uli, Regarding your point 3, here is what I would like to do/understand: I have an LM and a TM built with EMS, but with the alignment done by FastAlign. So th

Re: [Moses-support] really weird phrase table crash .....

2015-09-01 Thread Vincent Nguyen

Yes, plenty. On 01/09/2015 17:41, Christophe Servan wrote: > Hello Vincent, > Did you check whether you have enough disk space? > > Best, > > Christophe > > > -Original Message- > From: moses-support-boun...@mit.edu [mailto:moses-support-boun.

[Moses-support] really weird phrase table crash .....

2015-09-01 Thread Vincent Nguyen

Hi, I don't know what is happening, but during phrase table building (inverse part), the ../model/tmp.23625 directory contains plenty of files, yet 4 .coc files are missing (numbers 14, 15, 23 and 24; I don't know why), and then when putting everything together it crashes because it can't find these 4 phra

Re: [Moses-support] clarification CBPT vs MMSAPT

2015-09-01 Thread Vincent Nguyen

ell. 3. Can you briefly explain what you are trying to accomplish? I don't think I understand what you are actually trying to do. Best regards - Uli On Sat, Aug 22, 2015 at 10:45 PM, Vincent Nguyen <vngu...@neuf.fr> wrote: I kept reading again and again this h

Re: [Moses-support] Decoding Speed performance - suggestion and question

2015-09-01 Thread Vincent Nguyen

emove-orphan-phrase-pairs-from-reordering-table.perl -phi On Mon, Aug 31, 2015 at 10:50 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Thanks, will try and post results. Just to be clear: I can re-use the previous extract file I have to rebuild the phrase table with

Re: [Moses-support] Decoding Speed performance - suggestion and question

2015-08-31 Thread Vincent Nguyen

: Hi, 0.0001 should have no impact on translation quality, 0.001 will have some impact, and 0.01 is probably a bit too drastic. But that's the range you should explore. -phi On Mon, Aug 31, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: is there any benchmark

Re: [Moses-support] Decoding Speed performance - suggestion and question

2015-08-31 Thread Vincent Nguyen

nScore 2:0.0001" in EMS. -phi On Mon, Aug 31, 2015 at 3:03 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Hi, Here are some results with several values with cube pruning pop limit : (pop limit / decoding time for 3000 sentences / BLEU score) 5000 -

[Moses-support] Decoding Speed performance - suggestion and question

2015-08-31 Thread Vincent Nguyen

Hi, here are some results for several values of the cube pruning pop limit (pop limit / decoding time for 3000 sentences / BLEU score):
5000 - 15m45 - 29.59
1000 - 4m27 - 29.59
500 - 3m35 - 29.59
200 - 3m15 - 29.51
100 - 3m00 - 29.40
Therefore I took 400 - 3m19 - 29.58. If I am not mistaken the
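To make the speed/quality trade-off in these numbers explicit, here is a small Python sketch that converts the reported times to seconds and compares each pop limit with the 5000 setting (function names are hypothetical; the figures are exactly the ones reported, nothing new is measured):

```python
# (pop limit, decoding time for 3000 sentences, BLEU) as reported in the post.
RESULTS = [
    (5000, "15m45", 29.59),
    (1000, "4m27", 29.59),
    (500, "3m35", 29.59),
    (400, "3m19", 29.58),
    (200, "3m15", 29.51),
    (100, "3m00", 29.40),
]


def to_seconds(duration: str) -> int:
    """Convert a duration written like "15m45" into seconds."""
    minutes, seconds = duration.split("m")
    return int(minutes) * 60 + int(seconds)


def summarize(results, baseline_limit=5000):
    """Return (pop limit, seconds, speed-up vs baseline, BLEU change vs baseline) rows."""
    base_secs, base_bleu = {
        limit: (to_seconds(t), bleu) for limit, t, bleu in results
    }[baseline_limit]
    rows = []
    for limit, t, bleu in results:
        secs = to_seconds(t)
        rows.append((limit, secs, base_secs / secs, round(bleu - base_bleu, 2)))
    return rows


if __name__ == "__main__":
    for limit, secs, speedup, delta in summarize(RESULTS):
        print(f"{limit:>5}  {secs:>4}s  x{speedup:.2f}  BLEU {delta:+.2f}")
```

On these numbers, 400 is roughly 4.7 times faster than 5000 (199 s versus 945 s) for a 0.01 BLEU difference, while going below 200 starts to cost noticeably more BLEU for little extra speed.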

Re: [Moses-support] MMSAPT in EMS questions

2015-08-27 Thread Vincent Nguyen

ng using MMSapt: - EMS includes the mmsapt option to train and binarize the arrays - EMS does NOT include the part that incrementally adds the new data in an automated way; this has to be done manually. Am I understanding things properly? On 23/08/2015 09:06, Vincent Nguyen wrote: > Hello

Re: [Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen

la can tell you more about it. I am not familiar with the other parts of code. —Prashant On Aug 25, 2015, at 11:02 AM, Vincent Nguyen <vngu...@neuf.fr> wrote: Well, 2 things: - I still don't see any of the OutputPassthroughInformation methods in the previous versi

Re: [Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen

10:35, Prashant Mathur wrote: Hi Vincent, Forgot to tell you that the adaptive MT server works with Moses Release 1.0. There is another version on GitHub which works with the latest version. Try this out. https://github.com/hlt-mt/adaptiveMT —Prashant On Aug 25, 2015, at 9:39 AM, Vi

[Moses-support] removed OutputPassthroughInformation by mistake ?

2015-08-25 Thread Vincent Nguyen

Guys, I tried the adaptive MT server package from Matecat and have been struggling with it for the past 3 days, but I think I now know why. The adaptive MT application uses an undocumented "-print-passthrough" option in Moses. Then I saw some functions that actually output the passthrough info to STDOUT in I

[Moses-support] MMSAPT in EMS questions

2015-08-23 Thread Vincent Nguyen

Hello, I have a few questions on running MMSAPT within EMS. I am referring to the doc here: http://www.statmt.org/moses/?n=Advanced.Incremental and to the sections of the config.basic file of EMS. 1) The doc says: initial training: run EMS as usual but use a modified version of Giza++ and add trai

Re: [Moses-support] clarification CBPT vs MMSAPT

2015-08-20 Thread Vincent Nguyen

[1] http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
[2] http://mt4cat.org/software/adaptive-mt-server
On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen

[Moses-support] clarification CBPT vs MMSAPT

2015-08-19 Thread Vincent Nguyen

Hello support, Moving into the advanced features of Moses, I am a bit confused about the differences between the two features CBPT and MMSAPT, and therefore about which path to follow. I have the feeling that both have the same ultimate goal, but maybe I am wrong. Can someone explain the actual difference?

Re: [Moses-support] sigtest filtering reordering

2015-08-19 Thread Vincent Nguyen

-entries.perl (something like that; I am writing this from memory). You give the pruned phrase table and the unpruned reordering model to the script, and the script makes sure that their contents match. The good thing is that it hardly requires any RAM. Best, Marcin On 2015-08-19 13:44, Vincent Nguyen
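The low-memory matching described here can be done in a single streaming pass if the pruned phrase table keeps its surviving entries in the same order as the reordering table (pruning only deletes lines). A minimal Python sketch under that assumption; `pair_key` and `filter_reordering` are hypothetical names, not the Moses script Marcin refers to, and the file names in the usage comment are made up.

```python
SEP = " ||| "


def pair_key(line):
    """Return "source ||| target" from a phrase-table or reordering-table line."""
    source, target, _rest = line.split(SEP, 2)
    return source + SEP + target


def filter_reordering(phrase_lines, reordering_lines):
    """Yield the reordering-table lines whose phrase pair survives in the
    pruned phrase table.

    Assumes the pruned table's phrase pairs are an ordered subsequence of
    the reordering table's, so one pass over each input suffices and only
    one line of each is held in memory at a time.
    """
    phrases = iter(phrase_lines)
    wanted = next(phrases, None)
    for line in reordering_lines:
        if wanted is None:
            break  # every surviving phrase pair has been matched
        if pair_key(line) == pair_key(wanted):
            yield line
            wanted = next(phrases, None)


# Usage with hypothetical gzipped tables:
# import gzip
# with gzip.open("phrase-table.pruned.gz", "rt", encoding="utf-8") as pt, \
#      gzip.open("reordering-table.gz", "rt", encoding="utf-8") as rt, \
#      gzip.open("reordering-table.pruned.gz", "wt", encoding="utf-8") as out:
#     out.writelines(filter_reordering(pt, rt))
```

If the two tables were sorted differently, this equality-only walk would silently drop entries, so it is worth checking that the output has as many lines as the pruned phrase table.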

[Moses-support] sigtest filtering reordering

2015-08-19 Thread Vincent Nguyen

Hi, it crashed (whereas the sigtest filtering of the ttable continues ...) with no message about disk space or running out of memory, just a simple "killed" at the end of stderr. Any clue?
-l = a+e P(f|e) filter limit: 50 Loading Vocabulary... Loading existing vocabulary file: /home/moses/working/train
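A bare "killed" usually means the process received SIGKILL, most often from the Linux kernel's out-of-memory killer, which records the event in the kernel log (`dmesg`) rather than in the program's own stderr. A hedged Python sketch (hypothetical helper `run_and_explain`, POSIX only) that distinguishes a signal kill from an ordinary non-zero exit:

```python
import signal
import subprocess


def run_and_explain(cmd):
    """Run `cmd` (a list of arguments) and describe how it ended."""
    result = subprocess.run(cmd)
    if result.returncode < 0:
        # subprocess reports "terminated by signal N" as return code -N.
        sig = signal.Signals(-result.returncode)
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (check dmesg for an out-of-memory kill)"
        return f"terminated by {sig.name}"
    if result.returncode == 128 + signal.SIGKILL:
        # A shell wrapper reports a child killed by SIGKILL as status 137.
        return "exited with status 137 (a shell reporting a SIGKILL'd child)"
    return f"exited with status {result.returncode}"


print(run_and_explain(["sh", "-c", "kill -9 $$"]))
```

SIGKILL cannot be caught, which is why the killed program itself never gets a chance to print an out-of-memory message.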

Re: [Moses-support] OSM in EMS error

2015-08-17 Thread Vincent Nguyen

Actually, that's my fault. Fixed: https://github.com/moses-smt/mosesdecoder/commit/3a261c9fc95667eb43311c61ea9b7de3b293af6f On 16/08/2015 20:02, Vincent Nguyen wrote: Right, but the config file is config.basic, from which I uncommented the 3 lines for OSM. So I guess the parameters are redu

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen

16/08/2015 20:02, Vincent Nguyen wrote: Right, but the config file is config.basic, from which I uncommented the 3 lines for OSM. So I guess the parameters are redundant with what is in the Perl script. Which one should be kept? Either way, there is something to correct on GitHub. On 16/08/20

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen

's a double declaration of -S when running lmplz. That's a mistake either in the config file or in the script. On 16/08/2015 14:11, Vincent Nguyen wrote: The build-osm step crashes in EMS with the following error, any clue? 23396000 23397000 23398000 23399000 2340Converting Bilingual Sentence
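The problem identified in this reply, the same option passed twice to lmplz, can be caught before a long EMS step runs by scanning the generated command line for repeated flags. A rough Python sketch; `duplicate_flags` is a hypothetical helper, and the lmplz command in the example is made up for illustration (`-o`, `-S` and `-T` are lmplz's order, memory and temporary-directory options).

```python
import shlex
from collections import Counter


def duplicate_flags(command):
    """Return the option names that appear more than once in a shell command."""
    tokens = shlex.split(command)[1:]  # skip the program name
    # Treat "-X" / "--name" tokens as options; skip negative numbers like "-1".
    options = [t for t in tokens if t.startswith("-") and len(t) > 1 and not t[1].isdigit()]
    counts = Counter(options)
    return sorted(option for option, n in counts.items() if n > 1)


print(duplicate_flags("lmplz -o 5 -S 20% -T /tmp -S 10%"))  # prints ['-S']
```

This only looks at option names, so it will not notice the same setting passed once as `-S` and once under a long-form alias.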

Re: [Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen

had to guess, you ran out of disk space. Can you find the stderr
> of lmplz?
>
> Kenneth
>
> On 08/16/2015 11:11 AM, Vincent Nguyen wrote:
>> the build-osm crashes in EMS with following error
>> any clue ?
>>
>> 23396000 23397000 23398000 23399000 2340

[Moses-support] OSM in EMS error

2015-08-16 Thread Vincent Nguyen

The build-osm step crashes in EMS with the following error. Any clue?
23396000 23397000 23398000 23399000 2340Converting Bilingual Sentence Pair into Operation Corpus Executing: /home/moses/mosesdecoder/bin/generateSequences /home/moses/working/model/OSM.2//e /home/moses/working/model/OSM.2//f /ho

Re: [Moses-support] Domain adaptation

2015-08-14 Thread Vincent Nguyen

han just concatenating all the data you have.
>
> best wishes,
> Rico
>
>
> On 14/08/15 16:22, Vincent Nguyen wrote:
>> Hi,
>>
>> I can't find a sort of "tutorial " on domain adaptation path to follow.
>> I read this in the doc :
>> The l

[Moses-support] Domain adaptation

2015-08-14 Thread Vincent Nguyen

Hi, I can't find any sort of "tutorial" on which domain adaptation path to follow. I read this in the doc: The language model should be trained on a corpus that is suitable to the domain. If the translation model is trained on a parallel corpus, then the language model should be trained on the output

[Moses-support] Easiest way to tune with several data sets ?

2015-08-12 Thread Vincent Nguyen

Hi, I am wondering whether I could get better results with a larger tuning data set. Is there a way in EMS to combine several data set files, or do I need to concatenate the sets? If the latter, how can I do this easily? Just concatenate the sgm files? Thanks, Vincent
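Since it is not obvious how the SGM wrappers of several files would combine, a safer sketch works on plain-text versions of the tuning sets (one sentence per line, an assumption here) and concatenates source and reference files pair by pair, refusing any pair whose line counts differ. Files are split on "\n" only, the separator `wc -l` counts, so a stray "\r" inside a sentence does not silently create an extra line. `read_lines` and `concat_parallel` are hypothetical helpers, not EMS features.

```python
def read_lines(path):
    """Split a file on "\n" only; a stray "\r" stays inside its line."""
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def concat_parallel(pairs, out_src, out_ref):
    """Concatenate (source, reference) file pairs into one tuning set.

    Each pair is checked for equal line counts before anything is written
    for it, so a misaligned set fails loudly. Returns the total line count.
    """
    total = 0
    with open(out_src, "w", encoding="utf-8", newline="") as fs, \
         open(out_ref, "w", encoding="utf-8", newline="") as fr:
        for src, ref in pairs:
            src_lines, ref_lines = read_lines(src), read_lines(ref)
            if len(src_lines) != len(ref_lines):
                raise ValueError(
                    f"{src}: {len(src_lines)} lines, {ref}: {len(ref_lines)} lines")
            for line in src_lines:
                fs.write(line + "\n")
            for line in ref_lines:
                fr.write(line + "\n")
            total += len(src_lines)
    return total
```

The combined files can then be given to EMS as the tuning input and reference in place of a single set; which EMS settings accept plain text should be checked against config.basic.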


1 - 100 of 141 matches
