tr -d -c '\r' < news-commentary-v12.de-en.en | wc -c
4099
So the v12 file contains 4099 carriage returns: it looks broken when read with some tools / primitives,
but it works with others.
Just to let you know.
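A minimal sketch of why the tools disagree, assuming the mismatch comes from lone carriage returns. The sample bytes below are made up for illustration and are not taken from the corpus.

```python
def count_lines(data: bytes) -> dict:
    """Count lines in the same bytes under two conventions."""
    text = data.decode("utf-8")
    return {
        # what `wc -l` reports: the number of newline characters
        "newline_only": data.count(b"\n"),
        # what tools with universal line breaks report (a lone \r also ends a line)
        "universal": len(text.splitlines()),
        # what `tr -d -c '\r' | wc -c` reports
        "carriage_returns": data.count(b"\r"),
    }


# Hypothetical sample: one lone \r hidden inside the second line.
sample = b"first line\nsecond\rstill second for wc\nthird\n"
print(count_lines(sample))
```

When the two counts differ for one side of a parallel corpus, any script that splits on universal line breaks will see the two languages as misaligned even though wc reports equal numbers.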
On 14/09/2017 at 08:48, Vincent Nguyen wrote:
> okay really weird.
> wc gives me the same number
Dear all,
In case one would like a good excuse to visit Paris March 2-3 2018,
there will be a workshop on OpenNMT.
Here is the registration website.
http://workshop-paris-2018.opennmt.net/
Cheers,
Vincent
___
Moses-support mailing list
Moses-suppo
nano also gives the "right" number, 270769, but I have a script which
finds a difference.
On 14/09/2017 at 08:48, Vincent Nguyen wrote:
> okay really weird.
> wc gives me the same numbers as you, but gedit gives another 2 different
> numbers for each file. Must be special c
*
>> 270769 news-commentary-v12.de-en.de
>> 270769 news-commentary-v12.de-en.en
>> 541538 total
>
> What are you running that shows you different line numbers?
>
> cheers - Barry
>
> On 12/09/17 10:06, Vincent Nguyen wrote:
>> Hi,
>> Is there an
Hi,
Is there an updated version of NCv12 for this
http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
the number of lines for de-en is not the same in the 2 languages.
Cheers,
Vincent
Hello team,
I have read many posts and it looks like most people tend to use the
Stanford segmenter.
Do you have good experience with other tools?
Also, what "detokenizer" do you actually use? It seems that it is not
just a question of removing spaces, especially when Chinese target
cont
I think you mixed up input/output because in your example at the end, you
would like to get the pronunciation of a given new word.
The input is the left-hand side and the output is the pron.
If you are able to rework a little bit the right hand side of your data
(you need to stretch the phones one by one,
Hi Michael,
Trying to check if your tests on this subject were successful or not;
can you follow up?
thanks
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
re de-duping, and before we
> didn't.
>
> I would say if you want to compare to recent WMT experiments, take the
> most recent version of the data,
>
> cheers - Barry
>
> On 04/10/16 21:34, Vincent Nguyen wrote:
>>
>> ok
>> this one http://www.statmt.o
sed files?
>
> cheers - Barry
>
> On 04/10/16 14:40, Vincent Nguyen wrote:
>> Hi,
>>
>> on this link:
>>
>> http://www.statmt.org/wmt11/translation-task.html
>>
>> on the download section for monolingual data, there is :
>>
>> on
Hi,
on this link:
http://www.statmt.org/wmt11/translation-task.html
on the download section for monolingual data, there is :
one big file : http://www.statmt.org/wmt11/training-monolingual.tgz
And separate files, of which news crawls per year.
However, when you take a single file for a specif
2016 at 9:57 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi,
I have a basic question on EMS.
If I want no recasing and no truecasing, I just put IGNORE next to
the 2
sections.
However I have the feeling it does not eliminate this step for the
EVALUAT
Hi,
I have a basic question on EMS.
If I want no recasing and no truecasing, I just put IGNORE next to the 2
sections.
However I have the feeling it does not eliminate this step for the
EVALUATION step, and there is no ignore within this one.
Is this the case ?
Thanks,
Vincent
First, many thanks for the huge work. It opens up possibilities for some new
languages not in Europarl.
I just made one test comparison :
Config 1:
Corpus UN v1.0
LM : UN V1.0 + News2014FR
DEV+TEST=Newsdiscuss2015
Nist=29.61
Config 2:
Corpus Europarl
LM : Europarl + News2014FR
DEV+TEST=Newsdiscuss2015
SSD drive ? if not, then forget it.
try cat > NULL
On 10/04/2016 at 08:29, Jorg Tiedemann wrote:
Hi,
I have a large language model from the common crawl data set and it
takes forever to load when running moses.
My model is a trigram kenlm binarized with quantization, trie
structures and poin
size of phrase tables and language models matter, too, but not
as much, and it seems that in your scenario you are just wondering
about splitting up a fixed pool of data.
-phi
On Wed, Apr 6, 2016 at 6:50 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi,
What are (in t
Hi,
What are (in terms of performance) the difference between the 3
following solutions :
2 corpus, 2 LM, 2 weights calculated at tuning time
2 corpus merged into one, 1 LM
2 corpus, 2 LM interpolated into 1 LM with tuning
Will the results be different in the end ?
thanks.
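A small sketch of how the first and third options combine two LMs differently, as far as I understand it: separate LM features are weighted on log-probabilities at tuning time, while interpolation mixes the probabilities into a single model. The probabilities and weights below are toy values chosen for illustration, not measurements.

```python
import math


def linear_interpolation(p_a: float, p_b: float, lam: float) -> float:
    """One interpolated LM: a mixture of the two probabilities."""
    return lam * p_a + (1.0 - lam) * p_b


def log_linear_score(p_a: float, p_b: float, w_a: float, w_b: float) -> float:
    """Two LM features: a weighted sum of log-probabilities."""
    return w_a * math.log(p_a) + w_b * math.log(p_b)


# Hypothetical case: LM A knows the word well, LM B barely knows it.
p_a, p_b = 0.02, 0.0001
print(linear_interpolation(p_a, p_b, 0.5))   # dominated by the larger probability
print(log_linear_score(p_a, p_b, 0.5, 0.5))  # pulled down by the smaller one
```

With a mixture, one confident LM can rescue a word the other one has hardly seen; with two separate features, a very low probability from either LM still penalizes the hypothesis. So the results need not be identical, although how much they differ depends on the data.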
Apostrophe is tricky to handle properly
the tokenizer is language sensitive (see -l option)
in French :
l'été => l' été [with a space between ' and é]
in English :
today's story => today 's story
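The two conventions above can be sketched like this. It is only an illustration of the language-dependent split, not the code of the Moses tokenizer, and the function name is made up for the example.

```python
import re


def split_apostrophes(text: str, lang: str) -> str:
    """Split at an apostrophe between two word characters.

    French keeps the apostrophe with the left part (l' été);
    English keeps it with the right part (today 's).
    """
    if lang == "fr":
        return re.sub(r"(\w)'(\w)", r"\1' \2", text)
    return re.sub(r"(\w)'(\w)", r"\1 '\2", text)


print(split_apostrophes("l'été", "fr"))
print(split_apostrophes("today's story", "en"))
```

The sketch only handles the plain ASCII apostrophe; a typographic apostrophe (’) or a space already sitting next to the apostrophe falls outside the pattern.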
BUT
the issue is that sometimes in corpora you will find some misplaced spaces
before or after the apostr
hu, Mar 31, 2016 at 2:58 PM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hello,
Does someone have some support to this (found in the doc) :
Maximum Phrase Length
The maximum length of phrases is limited to 7 words. The maximum
phrase
length impacts the
Hello,
Does someone have some support to this (found in the doc) :
Maximum Phrase Length
The maximum length of phrases is limited to 7 words. The maximum phrase
length impacts the size of the phrase translation table, so shorter
limits may be desirable, if phrase table size is an issue. Previo
Hi,
I have been fighting with some reordering issues.
I have tried both LM interpolation and OSM but with no luck.
Here is an example
Source English :
Canada remains very active within the Working Group, and our law
enforcement officials also participate in the Working Group’s informal
law enf
Ubiqus is a leading Transcription / Translation company with offices in
Paris, NY, London, Brussels, Montreal, Ottawa ...
We are looking for a Machine Learning system builder having worked
either with Kaldi, Moses, or any DNN framework for NLP.
See the full story here :
https://www.linkedin.c
After a full re-train I can confirm what I was saying. For those who need to
use French as one of the languages, the adjustment in
normalize-punctuation.perl is really needed.
On 14/03/2016 at 10:01, Vincent Nguyen wrote:
I think I found the culprit.
this is very tricky . it's
t;.
You can check the raw output of the decoder, and see how it is
changed by the detokenizer.
-phi
On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi,
I got the following situation:
This group age
is translated sometimes in:
ce gr
raw output of the decoder, and see how it is
changed by the detokenizer.
-phi
On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi,
I got the following situation:
This group age
is translated sometimes in:
ce groupe d'âge (corr
Hi,
I got the following situation:
This group age
is translated sometimes in:
ce groupe d'âge (correct)
ce groupe d" âge (incorrect)
ce groupe d "âge (incorrect)
I am wondering if this is more a detokenizer issue or a corpus issue, or
both.
Technically in French, there shouldn't be any space b
Guys,
I got a question to the mathematicians that you all are :)
I have been working and testing Moses as well as Groundhog for months now.
When I compare results (when comparability is possible, using same
corpus, in-domain, blablabla, ...) I do not see much difference in both
systems.
So whe
is is still not right for unigram sentences.
____
From: "Vincent Nguyen"
Date: 26 Feb 2016 22:21:59
To: moses-support@mit.edu
Subject: Re: [Moses-support] bleu-annotation / analysis.perl
Am I correct sa
threads running
On 28/02/2016 at 09:57, Marcin Junczys-Dowmunt wrote:
You are right, that seems to be a mistake. "-threads" should not be
specified twice. Does anyone speak EMS?
On 28.02.2016 at 09:51, Vincent Nguyen wrote:
Marcin, (or others since it relates to EMS..
t should however be somewhat faster than only
a single thread.
On 17.02.2016 22:44, Vincent Nguyen wrote:
I have the feeling it's not.
Am I correct in saying that when the sentence length is less than or equal to 4
tokens, the BLEU score should be 1 for exact matches and 0 when not an
exact match?
(by definition of http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)
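To check this, here is a minimal unsmoothed sentence-level BLEU with n-grams up to 4. It is a sketch written for this question, not the code of analysis.pl or of any Moses scorer, and it assumes the score is 0 whenever an n-gram order has no match or no n-grams at all (tools differ on that last case).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one hypothesis against one reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # clipped matches: each n-gram counts at most as often as in the reference
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # assumed convention: no 4-gram (or no match) gives 0
        log_precisions.append(math.log(matched / total))
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1.0 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("a b c d", "a b c d"))  # exact match, 4 tokens
print(sentence_bleu("a b c x", "a b c d"))  # one wrong token, 4 tokens
print(sentence_bleu("a b c", "a b c"))      # exact match, only 3 tokens
```

Under these assumptions the statement holds for sentences of exactly 4 tokens. For fewer than 4 tokens there is no 4-gram to match, so even an exact match scores 0 here unless some smoothing is applied.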
On 26/02/2016 at 10:02, Vincent Nguyen wrote:
> Hi,
>
> I woul
Hi,
I would like to understand better the analysis.perl script that
generates the bleu-annotation file.
Is there an easy way to get the uncased bleu score of each line instead
of the cased calculation ?
Am I right that this script recomputes its own BLEU score without calling
the Nist-Bleu nor Mul
3:07, Marcin Junczys-Dowmunt wrote:
>> It is, just not very well done. It generally does not make sense to have
>> more than 8-10 threads. That should however be somewhat faster than only
>> a single thread.
>>
>> On 17.02.2016 22
I have the feeling it's not.
did you add -exec at the end (behind -continue 1) ?
On 08/01/2016 at 18:16, Nicholas Ruiz wrote:
> Thanks, Tomasz. Unfortunately modifying the config file in the steps
> directory didn't work for me. My block looks something like this:
>
> [EVALUATION:test4]
>
> tokenized-input = /path/to/test4.
This is fine for tuning. If you want to make it quicker, cut it down to
1000 sentences.
On 28/12/2015 at 16:37, Read, James C wrote:
Hi,
I'm setting up some Moses baseline systems for various language pairs
to compare the systems against my own work. I've largely been
following the base
You managed to install it, so it will take a little effort to learn
the basics by yourself.
Here is the starting point:
http://www.statmt.org/moses/?n=Moses.Baseline
On 10/12/2015 at 19:03, Shaimaa Marzouk wrote:
> Dear support team,
>
> I would be extremely grateful, if you could help me with th
either CRLF or LF, which we have extensively
> using across Windows and Posix systems.
>
> Tom
>
>
> On 12/5/2015 6:13 AM, moses-support-requ...@mit.edu wrote:
>> Date: Fri, 4 Dec 2015 23:13:10 +
>> From: Ulrich Germann
>> Subject: Re: [Moses-support] decoder qu
n I have the feeling that we really need to
"sentence-tokenize" first before word-tokenizing.
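A naive sketch of such a "sentence-tokenize first" step, so that each sentence reaches the tokenizer + truecaser + moses pipe on its own line. The regular expression is an assumption made for illustration and is far simpler than a real sentence splitter.

```python
import re


def split_sentences(text: str) -> list:
    """Split after ., ! or ? when followed by whitespace and an ASCII capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


paragraph = "The first sentence ends here. The second one follows."
for sentence in split_sentences(paragraph):
    print(sentence)  # one sentence per line, ready for the pipeline
```

Abbreviations such as "Mr. Smith" and sentences starting with an accented capital are not handled; a proper splitter needs per-language rules for those.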
On 04/12/2015 at 13:52, John D Burger wrote:
> I think you're asking if Moses translates one sentence at a time. The answer
> is yes.
>
> - John Burger
>MITRE
>
Actually I don't know if this is a decoder question or such.
Here is my issue
Let's say I have a text string with 2 sentences, with a period ending
the first sentence, but no CR+LF, just a space before the second sentence.
When I pass the full string to the pipe :
tokenizer + truecaser + moses
Hieu,
here :
http://www.statmt.org/moses/RELEASE-3.0/models/fr-en/config.pb.recase
I read :
input-tokenizer = "$moses-script-dir/tokenizer/normalize-punctuation.perl
$input-extension | $moses-script-dir/tokenizer/tokenizer.perl -a -l
$input-extension"
output-tokenizer = "$moses-script-dir/toke
Hi all,
I have a question regarding LMs.
Let's take the example of news.2014.shuffle.en
When we process it through punctuation normalization for the English
language, it will for instance put a " " before an apostrophe:
"it is'nt" => "it is 'nt"
BUT it contains some noise, for instance there is so
relative paths.
And of course, the binaries need to be executable on all nodes as well.
-phi
On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
OK guys, not easy stuff ...
I fought to get the prerequisites working but now at least
j
Hi,
Since this option : Online Translation Model Combination (Multimodel
phrase table type) is available cf :
http://www.statmt.org/moses/?n=Advanced.Domain
Why wouldn't EMS treat Translation Models the same way as Language Models?
When we keep running EMS, it re-runs a lot of stuff that could
igure out :
How do the Moses steps deal with "Nb of Jobs submitted" versus -threads in
the various steps?
On 29/10/2015 at 17:45, Vincent Nguyen wrote:
> Ken,
>
> I just did some further testing on the master node that HAS all installed.
> same error as is.
>
> /net
tuning now
so working fine so far
btw, in SMB there was another issue with the split command in extraction.
On 29/10/2015 at 21:44, Vincent Nguyen wrote:
> I'll mount NFS instead and will confirm if working.
> thanks
>
> On 29/10/2015 at 21:31, Kenneth Heafield wrote:
>>
t temporary files on SAMBA is pretty low
> priority. However, if you can provide a backtrace (after compiling with
> "debug" added to the command) I can try to turn that segfault into an
> error message.
>
> Kenneth
>
> On 10/29/2015 08:15 PM, Vincent Nguyen wrote
're clear, it runs correctly on the local machine but not when you
> run it through SGE? In that case, I suspect it's library version
> differences.
>
> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>> I get this error :
>>
>> moses@sgenode1:/netshr/working-e
.
-phi
On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi there,
I need some clarification before screwing up some files.
I just set up an SGE cluster with a Master + 2 Nodes.
To make it clear, let's say my cluster name is "default"
)
On 29/10/2015 at 15:18, Philipp Koehn wrote:
Hi,
make sure that all the paths are valid on all the nodes --- so
definitely no relative paths.
And of course, the binaries need to be executable on all nodes as well.
-phi
On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <mailto:vngu...@neuf
Hi there,
I need some clarification before screwing up some files.
I just set up an SGE cluster with a Master + 2 Nodes.
To make it clear, let's say my cluster name is "default", my master
head node is "master", and my 2 other nodes are "node1" and "node2".
for EMS :
I opened the default experiment.mac
Hi,
Before spending obviously a lot of machine time in this, I would like to
know if someone ran EMS with NPLM on Europarl
(ie European languages duh ...)
and if so, what are the results in potential BLEU improvements.
alternatively, I spent some time in ASR and saw some major improvements
WER
Hello,
Pretty sure there is no academic importance to this, but :
For the tokenizer we have the -x option to skip XML/HTML tags
For the detokenizer it WILL SKIP whatever.
cf :
while(<STDIN>) {
if (/^<.+>$/ || /^\s*$/) {
#don't try to detokenize XML/HTML tag lines
o BLEU/TER/Meteor but this is just one
data point and a fairly simple system. I would be curious to see how
things work out in other users' systems.
Best,
Michael
On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen <vngu...@neuf.fr> wrote:
out of curiosity, what gain do y
Michael,
what score-setting do you use to achieve these results ?
if search algo= 1 what cube pruning number ?
On 08/10/2015 at 19:05, Michael Denkowski wrote:
Hi all,
I extended the multi_moses.py script to support multi-threaded moses
instances for cases where memory limits the number of dec
After many tests, as mentioned before I had made these changes in EMS
score-settings = "--GoodTuring --MinScore 2:0.001"
and
pop limit cube pruning at 400 (instead of 5000 in EMS )
speed is much much higher (without impact on translation)
On 05/10/2015 at 17:20, Philipp Koehn wrote:
Hi,
w
actually after > space is always inserted, but before < never inserted.
On 26/09/2015 at 16:37, Vincent Nguyen wrote:
> Hello,
>
> Quick question regarding this script behavior.
>
> Les Banques de la zone Euro sont soumises à :
>
> becomes
>
> les banque
Hello,
Quick question regarding this script behavior.
Les Banques de la zone Euro sont soumises à :
becomes
les banques de la zone euro sont soumises à :
lowercasing is fine
the space between >Les is fine
but it did not insert a space after the : and before the closing tag
any clue ?
Vincent
a, you can
> try modified Moore-Lewis filtering for data selection.
> https://aclweb.org/anthology/D/D11/D11-1033.pdf
>
>
> Cheers,
> Matthias
>
>
> On Thu, 2015-09-24 at 18:19 +0200, Vincent Nguyen wrote:
>> This is an interesting subject ..
>>
>>
ta it'll be other bad translation
> options which pop up.
>
> On Thu, 2015-09-24 at 16:08 +0200, Vincent Nguyen wrote:
>> Matthias,
>>
>> Pruning :
>> I use the cube pop limit at 400 instead of default values (1000 or 5000)
>> I use the MinScore 0.001
>
't want to be used:
>> " 1 ||| One Million Roofs
>>
>> oui ||| no
>>
>> To use this list, add the following to your moses.ini file
>>
>> [feature]
>> DeleteRules path=/path/to/list
>>
>> Not tested.
>>
>>
ros ||| by EUR 1.1 billion ||| 0.0345062
6.98053e-05 0.0517593 0.000791519 ||| 3-1 4-1 1-2 2-2 2-3 ||| 3 2 1 ||| |||
On 24/09/2015 at 09:54, Felipe Sánchez Martínez wrote:
Hi,
This is quite common. If you look at the scores, they are pretty low
when they do not make sense, so, even though
tries, isn't it better to address the root of the
problem and prepare your training corpus better?
On 9/23/2015 6:46 PM, moses-support-requ...@mit.edu wrote:
Date: Tue, 22 Sep 2015 20:24:02 +0200
From: Philipp Koehn
Subject: Re: [Moses-support] is there a way to remove a bad entry in
Hi,
I was wondering if after an analysis of the BLEU-Annotation file we
realize that there must be a bad entry in the phrase table,
we could remove it manually or in some other ways ?
Gracias.
V.
aware .....
big debate ?
On 16/09/2015 at 17:30, Vincent Nguyen wrote:
I am struggling with a pipeline .
Here is the text1.txt file I would like to translate from FR to EN
Les banques de la zone euro sont soumises :
au ratio de capital lié à la détention d’actifs risqués
(nous nous in
I am struggling with a pipeline .
Here is the text1.txt file I would like to translate from FR to EN
Les banques de la zone euro sont soumises :
au ratio de capital lié à la détention d’actifs risqués (nous
nous intéressons ici au crédit) ;
au ratio de levier, qui détermine le capital règle
Guys,
While running EMS with a big test file I realized that the analysis.perl
was executed very quickly while the actual Nist-Bleu was much much longer.
Also one thing is that the file "BLEU-Annotation" generated during
analysis does not contain the right line numbering.
it takes 0 as the firs
dle these cases?
>
>
>
> On 9/13/2015 11:01 PM, moses-support-requ...@mit.edu wrote:
>> Date: Sun, 13 Sep 2015 10:44:02 +0200
>> From: Vincent Nguyen
>> Subject: Re: [Moses-support] sgm generation for personalized test sets
>> To: moses-support
>> Message
in order to use makemteval.py we need to remove 0D and E2 80 A8 from txt
files.
python handles them as additional line breakers.
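A short demonstration of the behaviour described above: str.splitlines() treats a carriage return (0D) and U+2028 LINE SEPARATOR (E2 80 A8 in UTF-8) as line breaks. The sample text and the helper name are made up for the example.

```python
def strip_extra_breaks(text: str) -> str:
    """Remove carriage returns and U+2028 so only \\n ends a line."""
    return text.replace("\r", "").replace("\u2028", "")


# Hypothetical sample: 3 lines for wc -l, with a U+2028 inside the second line.
raw = "first\r\nsecond\u2028still second\nthird\n"

print(raw.count("\n"))          # lines as wc -l sees them
print(len(raw.splitlines()))    # lines as splitlines() sees them

clean = strip_extra_breaks(raw)
print(len(clean.splitlines()))  # agrees with wc -l after cleaning
```

Without this cleaning, a source file and its reference can end up with different segment counts once each is read with splitlines(), even though wc -l reports the same number of lines for both.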
On 12/09/2015 at 22:07, Vincent Nguyen wrote:
> Hi,
>
> What script do you guys use to generate sgm sets based on txt file ?
>
> I have tried makemteva
Hi,
What script do you guys use to generate sgm sets based on txt file ?
I have tried makemteval.py in contrib
but there are a few issues.
I think these lines:
lines =
[l.replace('&quot;','\"').replace('&apos;','\'').replace('&gt;','>').replace('&lt;','<').replace('&amp;','&')
for l in filein.read().splitlines()
Hi experts,
I have a question about the phrase table theory.
If we take a corpus A to create a TM model TMA and a LM model LMA.
if we consider a corpus B.
Method 1 :
We add corpus B to A => corpus AB => TM-AB and LM-AB
Method 2:
We process corpus B => TMB and LMB
then we combine TMA + TMB and
ug 31, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
is there any benchmark on what value / what impact ?
what should I start with as a test 0.001 ?
the standard value 0.0001 seems really really low to me
maybe I am not getting what this probability
If you're new to Linux you will struggle forever.
I would probably go to Slate instead for sure.
On 02/09/2015 at 17:34, Anita Pal wrote:
For the time being, I'm trying to finish building the baseline system.
I've just been following the commands as listed on the Moses website.
It's still not
On 01/09/2015 at 17:41, Christophe Servan wrote:
> Hello Vincent,
> Did you check whether you have enough disk space?
>
> Best,
>
> Christophe
>
>
> -Original Message-
> From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] De
> l
Hi,
Unless I am mistaken, it seems that the TM binarization step in EMS is not
multi-core.
ttable-binarizer = "$moses-bin-dir/processPhraseTableMin"
[training]
training-options = "-mgiza -mgiza-cpus 8 -sort-compress gzip
-sort-parallel 4 -cores 4"
binarize-all = $moses-script-dir/training/bina
n, uncompressed text files.
- Uli
On Tue, Sep 1, 2015 at 1:11 PM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi Uli,
For your point 3, here is what I would like to do / understand:
I have an LM and a TM built with EMS but alignment being done by
FastAlign. So th
yes plenty.
On 01/09/2015 at 17:41, Christophe Servan wrote:
> Hello Vincent,
> Did you check whether you have enough disk space?
>
> Best,
>
> Christophe
>
>
> -Original Message-
> From: moses-support-boun...@mit.edu [mailto:moses-support-boun.
Hi,
I don't know what is happening, but during the phrase table building
(inverse part)
in the ../model/tmp.23625 directory I have plenty of files :
but 4 .coc files are missing (number 14 , 15 , 23, 24 don't know why)
and then when putting all together it crashes because can't find these 4
phra
ell.
3. Can you briefly explain what you are trying to accomplish? I don't
think I understand what you are actually trying to do.
Best regards - Uli
On Sat, Aug 22, 2015 at 10:45 PM, Vincent Nguyen <vngu...@neuf.fr> wrote:
I kept reading again and again this
h
emove-orphan-phrase-pairs-from-reordering-table.perl
-phi
On Mon, Aug 31, 2015 at 10:50 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
thanks, will try and post results.
just to be clear:
I can re-use the previous extract file
I have to rebuild the phrase-table with
:
Hi,
0.0001 should have no impact on translation quality,
0.001 will have some impact
0.01 is probably a bit too drastic.
But that's the range you should explore.
-phi
On Mon, Aug 31, 2015 at 10:33 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
is there any benchmark
nScore 2:0.0001"
in EMS.
-phi
On Mon, Aug 31, 2015 at 3:03 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
Hi,
Here are some results with several values with cube pruning pop
limit :
(pop limit / decoding time for 3000 sentences / BLEU score)
5000 -
Hi,
Here are some results with several values with cube pruning pop limit :
(pop limit / decoding time for 3000 sentences / BLEU score)
5000 - 15m45 - 29.59
1000 - 4m27 - 29.59
500 - 3m35 - 29.59
200 - 3m15 - 29.51
100 - 3m00 - 29.40
Therefore I took 400 - 3m19 - 29.58
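To make the trade-off easier to read, here is a small sketch that turns the figures above into a speedup and a BLEU difference relative to the pop limit of 5000. It only reuses the numbers listed in this message; the helper name is made up.

```python
# (pop limit, decoding time for 3000 sentences, BLEU) as reported above
results = [
    (5000, "15m45", 29.59),
    (1000, "4m27", 29.59),
    (500, "3m35", 29.59),
    (400, "3m19", 29.58),
    (200, "3m15", 29.51),
    (100, "3m00", 29.40),
]


def to_seconds(duration: str) -> int:
    """Convert a string such as '15m45' into seconds."""
    minutes, seconds = duration.split("m")
    return int(minutes) * 60 + int(seconds)


baseline_time = to_seconds(results[0][1])
baseline_bleu = results[0][2]

for pop_limit, duration, bleu in results:
    speedup = baseline_time / to_seconds(duration)
    print(f"{pop_limit:>5}  {speedup:4.1f}x  {bleu - baseline_bleu:+.2f} BLEU")
```

Most of the time saving comes from going from 5000 down to 1000; below that the decoding time changes little while BLEU starts to drop, which fits the choice of 400.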
If I am not mistaken the
ng using MMSapt:
- EMS includes the mmsapt option to train and binarize the arrays
- EMS does NOT include the part of incrementally adding the new data in
an automated way. Has to be done manually.
Am I understanding things properly ?
On 23/08/2015 at 09:06, Vincent Nguyen wrote:
> Hello
la can tell you more
about it. I am not familiar with the other parts of code.
—Prashant
On Aug 25, 2015, at 11:02 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:
well 2 things :
- I still don't see any of the methods OutputPassthroughInformation
in the previous versi
10:35, Prashant Mathur wrote:
Hi Vincent,
Forgot to tell you that the adaptive MT server works with Moses
Release 1.0
There is another version on github which works with the latest
version. Try this out.
https://github.com/hlt-mt/adaptiveMT
—Prashant
On Aug 25, 2015, at 9:39 AM, Vi
Guys,
I tried the mt adaptive server package from Matecat and I have been fighting
with it for the past 3 days, but I think I now know why.
the mt adaptive application uses some undocumented "-print-passthrough"
option in moses.
then I saw some functions to actually Output the passthrough info to
STDOUT in I
Hello,
I have a few questions on running MMSAPT within EMS. I am referring to
the doc here : http://www.statmt.org/moses/?n=Advanced.Incremental
and to the sections of the config.basic file of EMS.
1) the doc says
initial training run EMS as usual but use modified version of Giza++ and
add trai
[1]
http://www.cl.uni-heidelberg.de/~riezler/publications/papers/MTJOURNAL2014.pdf
<http://www.cl.uni-heidelberg.de/%7Eriezler/publications/papers/MTJOURNAL2014.pdf>
[2] http://mt4cat.org/software/adaptive-mt-server
On Wed, Aug 19, 2015 at 6:53 PM, Vincent Nguyen
Hello support,
Going into advanced features of Moses, I am a bit confused by the
differences and therefore which path to follow, regarding the 2 features
CBPT and MMSAPT.
I have the feeling the ultimate goal of both is the same but maybe I am
wrong.
Can someone explain the actual difference ?
-entries.perl (something like that, I am
writing this from memory.). You give the pruned phrase-table and the
unpruned reordering model to the script, and the script takes care
that the contents match. The good thing is, it hardly requires any RAM.
Best,
Marcin
On 2015-08-19 13:44, Vincent Nguyen
Hi,
it crashed (whereas the sigtest filtering of the ttable continues ...) and there is no
message about disk space or out of memory.
Just a simple "killed" at the end of the stderr; any clue?
-l = a+e
P(f|e) filter limit: 50
Loading Vocabulary...
Loading existing vocabulary file:
/home/moses/working/train
actually, that's my fault. Fixed
https://github.com/moses-smt/mosesdecoder/commit/3a261c9fc95667eb43311c61ea9b7de3b293af6f
On 16/08/2015 20:02, Vincent Nguyen wrote:
right but the config file is the config.basic from which I
uncommented the 3 lines for OSM.
So I guess the parameters are redu
16/08/2015 20:02, Vincent Nguyen wrote:
right but the config file is the config.basic from which I
uncommented the 3 lines for OSM.
So I guess the parameters are redundant with what is in the perl script.
which one to keep ? either way there is something to correct in the
github.
On 16/08/20
's a double declaration of -S when running lmplz. That's either a
mistake in the config file or in the script
On 16/08/2015 14:11, Vincent Nguyen wrote:
the build-osm crashes in EMS with following error
any clue ?
23396000 23397000 23398000 23399000 2340Converting Bilingual
Sentence
had to guess, you ran out of disk space. Can you find the stderr
> of lmplz?
>
> Kenneth
>
> On 08/16/2015 11:11 AM, Vincent Nguyen wrote:
>> the build-osm crashes in EMS with following error
>> any clue ?
>>
>> 23396000 23397000 23398000 23399000 2340
the build-osm crashes in EMS with following error
any clue ?
23396000 23397000 23398000 23399000 2340Converting Bilingual
Sentence Pair into Operation Corpus
Executing: /home/moses/mosesdecoder/bin/generateSequences
/home/moses/working/model/OSM.2//e /home/moses/working/model/OSM.2//f
/ho
han just concatenating all the data you have.
>
> best wishes,
> Rico
>
>
> On 14/08/15 16:22, Vincent Nguyen wrote:
>> Hi,
>>
>> I can't find a sort of "tutorial " on domain adaptation path to follow.
>> I read this in the doc :
>> The l
Hi,
I can't find any sort of "tutorial" on which domain adaptation path to follow.
I read this in the doc :
The language model should be trained on a corpus that is suitable to the
domain. If the translation model is trained on a parallel corpus, then
the language model should be trained on the output
Hi,
I am wondering if I could get better results with a larger tuning data set.
Is there a way in EMS to combine several data set files, or do I need to
concatenate the sets?
If the latter, how can I do this easily? Just concatenate the sgm files?
thanks,
Vincent