opp [mailto:achi...@gmail.com]
Sent: Friday, March 08, 2013 5:27 PM
To: Tomas Hudik; 'Barry Haddow'
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] training process and special unicode characters
Hi Tomas,
Did you mean filtering out \p{Z} (Separator) and \p{C} (Other)?
http://perldo
ag it. It would be difficult
to trace all changes in Unicode processing across the various Perl versions, the Boost
library, and potentially other components involved in the Moses workflow.
Cheers, t.
From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk]
Sent: Friday, March 08, 2013 11:30 AM
To: Tomas
Hi,
How does Moses treat special Unicode control or whitespace characters,
e.g. http://www.fileformat.info/info/unicode/char/2028/index.htm, or
http://en.wikipedia.org/wiki/Unicode_control_characters?
Are they excluded, or do they become part of the phrase table?
Not sure if this question wouldn't be bette
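One way to keep such characters out of the training data is to normalize each line before tokenization. The following is only a minimal Python sketch of the \p{Z} / \p{C} filtering mentioned above, not something Moses itself does; the function name clean_line is an assumption made for illustration.

```python
import unicodedata


def clean_line(text):
    """Replace Unicode separators with a space and drop "Other" characters."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("Z") or ch in "\t\n\r":
            # Zs, Zl, Zp (e.g. U+2028) and ASCII whitespace become a plain space.
            out.append(" ")
        elif category.startswith("C"):
            # Cc, Cf, Cs, Co, Cn: control, format and similar characters are dropped.
            continue
        else:
            out.append(ch)
    # Collapse the runs of spaces that the replacements may have produced.
    return " ".join("".join(out).split())
```

Note that dropping the whole Cf category also removes zero-width joiners (U+200C, U+200D), which some scripts need, so the filter may have to be narrowed for those languages.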
Hi Ken,
You have a typo in your command:
...-root-dir train - corpus ~/corpus/news-commentary-v7.fr-en.clean...
It should be -corpus (no whitespace between - and corpus). The same
applies to -alignment grow-diag- final-and (stray space before the word final).
Also, Moses in general prefers absolute pat
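Stray spaces of this kind are easy to miss in a long command line. A small Python sketch that flags them is shown here; the helper name find_split_options and the heuristic (a lone "-" or a token ending in "-") are assumptions for illustration, not part of Moses.

```python
import shlex


def find_split_options(command):
    """Return (token, next_token) pairs that look like an option split by a space."""
    tokens = shlex.split(command)
    suspicious = []
    for current, following in zip(tokens, tokens[1:]):
        # A lone "-" or a token ending in "-" usually means a stray space
        # was typed inside an option name or value.
        if current == "-" or (len(current) > 1 and current.endswith("-")):
            suspicious.append((current, following))
    return suspicious
```

Applied to the fragment quoted above, it reports the pairs ("-", "corpus") and ("grow-diag-", "final-and").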
Hi Tom,
I don't know much about the mosesserver XML-RPC (we installed it once but
didn't use it afterwards), but we find the daemon (contrib/web/daemon.sh) very
useful. It is a simple Perl script that runs Moses and lets you talk to and
listen to Moses via a TCP port. Together with apache server it
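Talking to such a daemon from a client could look roughly like the Python sketch below. It assumes a line-based protocol (one sentence sent, one translated line returned); the host, the port 7070 and the function name translate_line are hypothetical and must be adapted to the actual daemon setup.

```python
import socket


def translate_line(sentence, host="localhost", port=7070, timeout=10.0):
    """Send one sentence to a TCP service and return the first line it answers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumed protocol: one UTF-8 line in, one UTF-8 line out.
        sock.sendall((sentence.strip() + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline().rstrip("\n")
```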
Hi there,
I've trained a Moses engine and built compact phrase and reordering (distortion)
tables:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc6
and changed moses.ini accordingly. The Moses decoder is working correctly (loading
Hi there,
I have probably found a bug in the Compact Phrase Table documentation:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5
In the Options subsection
there is a line:
-alignment-info -- include alignment info in the binary phrase table
However, in fact it should be:
-no-alignment-info -- do
couldn't find any info about how the release was
produced.
If I take a look at https://github.com/moses-smt/mosesdecoder/tree/RELEASE-1.0
it seems v1.0 was created from older but stable sources. Is that correct?
Is there a changelog of what has been changed or updated since v1.0?
Thanks!
Tomas Hudik
Hi there,
train-model.perl, line 1533, contains
my $cmd = "$PHRASE_SCORE $extract $lexical_file.$direction'
$ttable_file.half.$direction.gz $inverse";
The "'" character looks useless. It causes a problem:
(6.1) creating table half
/home/moses/engines/cs/IBM/itm0/train2/model/phrase-table.hal
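The effect of such a stray apostrophe can be reproduced outside Moses: a shell-style parser treats it as the start of a quoted string that never closes. A minimal Python sketch follows; the function name and the placeholder command strings used with it are assumptions for illustration.

```python
import shlex


def has_unbalanced_quote(command):
    """Return True if the command contains a quote that is never closed."""
    try:
        shlex.split(command)
    except ValueError:
        # shlex raises ValueError ("No closing quotation") for an unmatched quote.
        return True
    return False
```

For example, has_unbalanced_quote("score extract lex.f2e' table.half.f2e.gz") returns True, while the same string without the apostrophe returns False.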
Hi Henry,
This answer probably comes late, but:
We have developed a small piece of software for placeholder translation.
It is under the same license as Moses.
http://code.google.com/p/m4loc/
If you want to try it, download the sources (it is mostly Perl, so you do not need
to compile it). The input should be tm
Hi there,
I've tried to compile Moses with the new gcc 4.7.0.
Two updates are needed:
In the files util/file.cc and util/mmap.cc, the line #include <unistd.h> should be
added; otherwise an error will occur:
util/mmap.cc:33:18: error: '_SC_PAGE_SIZE' was not declared in this scope
cheers, Tomas
Hi Joerg,
If Hieu was able to compile it with gcc 4.6.3 and you were not, and your error
message looks like a problem with the STL library, I'd check whether the problem
persists after installing compatibility packages (e.g. compat-gcc-34).
(Check which gcc was used in hunalign development.)
Che
Hi there,
The command invoking IRSTLM includes the option -t /tmp (train_recaser.perl 97),
which sets the temporary directory to use.
IRSTLM (build-lm.sh) tries to remove it (rmdir) at the end. However, this
would either delete the whole /tmp (if run as root), or it fails and the whole build-lm.sh
fails as well (return
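A common way around this kind of problem is to hand the tool a private, freshly created temporary directory instead of the shared /tmp. The Python sketch below only builds the argument list; the helper name with_private_tmp and the "irstlm-" prefix are assumptions, and only the -t option is taken from the message above.

```python
import tempfile


def with_private_tmp(base_command):
    """Append "-t <fresh dir>" to a command and return (command, directory)."""
    # mkdtemp creates a new, uniquely named directory readable only by this user,
    # so removing it at the end cannot touch the shared temporary directory.
    tmp_dir = tempfile.mkdtemp(prefix="irstlm-")
    return list(base_command) + ["-t", tmp_dir], tmp_dir
```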
Hi,
The discussion was moved to:
https://list.fbk.eu/sympa/arc/user-irstlm/2012-05/msg1.html since it is
related to IRSTLM
tomas
-Original Message-
From: Nicola Bertoldi [mailto:berto...@fbk.eu]
Sent: Wednesday, May 30, 2012 8:47 AM
To: Tomas Hudik
Cc: moses-support@mit.edu
Hi Dimitris,
Post the error log you get when you translate a sentence; try e.g.:
echo "translate some sentence" | ./moses -f your_moses.ini
cheers, Tomas
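To collect the decoder's error log separately from its output, the same call can be wrapped in a short script. This is a minimal Python sketch; the function name run_and_capture is an assumption, and the command list is whatever you would otherwise type in the shell.

```python
import subprocess


def run_and_capture(command, sentence):
    """Feed one sentence to a command on stdin; return (exit code, stdout, stderr)."""
    result = subprocess.run(
        command,
        input=sentence + "\n",
        capture_output=True,  # keep stdout and stderr apart
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

For the command above this would be called as run_and_capture(["./moses", "-f", "your_moses.ini"], "translate some sentence"); the third returned value is the log to post to the list.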
-Original Message-
From: Δημήτρης Μπαμπανιώτης [mailto:dimbabanio...@gmail.com]
Sent: Tuesday, May 29, 2012 11:41 PM
To: Philipp Koehn
Cc: mos
Hi there,
It seems newer versions of IRSTLM (build-lm.sh) end with exit code 1. But
if build-lm.sh exits with anything other than 0, train-recaser fails.
This is due to the command:
system($cmd) == 0 || die("Language model training failed with error " . ($? >>
8) . "\n");
which should be c
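The check quoted above treats every non-zero exit code as a failure. For illustration only, here is a Python sketch of the same check with a configurable set of accepted exit codes; the function name run_checked is hypothetical, and which codes to accept for build-lm.sh is an assumption the caller has to make.

```python
import subprocess


def run_checked(command, accepted_codes=(0,)):
    """Run a command and raise unless its exit code is in accepted_codes."""
    completed = subprocess.run(command)
    # For a process that exits normally, returncode is the value Perl
    # obtains with $? >> 8 after system().
    if completed.returncode not in accepted_codes:
        raise RuntimeError(
            f"Language model training failed with error {completed.returncode}"
        )
    return completed.returncode
```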
Hi Dimitris,
First make sure your Moses compiled correctly (no errors during bjam). If
it did, make sure your binaries run without a segmentation fault. If they do, make
sure your paths are absolute and valid.
There is a 90% likelihood that the problems will be solved after these checks.
Cheers, To
ct-parallel and
then to ln.
The conclusion: if a problem with paths arises, always try using absolute
paths for all possible parameters :)
Cheers, Tomas
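The advice about absolute paths can be automated before a training script is launched. A minimal Python sketch, assuming that any argument naming an existing file or directory should be made absolute; the helper name absolutize_paths is hypothetical.

```python
import os


def absolutize_paths(args, base_dir=None):
    """Rewrite arguments that name existing files or directories as absolute paths."""
    base_dir = base_dir or os.getcwd()
    result = []
    for arg in args:
        candidate = os.path.join(base_dir, os.path.expanduser(arg))
        # Switches and plain values (numbers, model names) are left untouched.
        if arg and not arg.startswith("-") and os.path.exists(candidate):
            result.append(os.path.abspath(candidate))
        else:
            result.append(arg)
    return result
```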
From: Tomas Hudik [mailto:thu...@moraviaworldwide.com]
Sent: Saturday, May 26, 2012 5:27 PM
To: Hieu Hoang; moses-support@mit.edu
Subject: Re:
on are the links created? Wouldn't it be
better to work with the original files?
From: Hieu Hoang [mailto:fishandfrol...@gmail.com]
Sent: Saturday, May 26, 2012 2:50 PM
To: moses-support@mit.edu; Tomas Hudik
Subject: Re: [Moses-support] new PhraseExtractor -- segmentation faul; the old
one run smoo
Hi Hieu,
Thanks for your reply. Yes, I downloaded the latest git version
(de8a2e7667fe2bde9df0ef5a32b3b85b6469eb0f) and found the following problems:
1. During compilation, gcc version 4.6.3 produces an error in:
gcc.compile.c++
scripts/training/phrase-extract/pcfg-common/bin/gcc-4.6.3/r
Hi there,
It looks like the phrase extractor has a bug.
If I run the command:
training/phrase-extract/extract lotus.tok.rem.clean.lw.cs
lotus.tok.rem.clean.lw.en ./model/aligned.grow-diag-final-and ./model/extract 7
orientation --model wbe-msd
Within Moses from Git, version: 9b5a4278b7693ec361c1
Hi Jun,
Maybe there is no bug at all. This happens quite often if you try
to compile a new program on an old system (or vice versa).
Fedora 12 has a pretty old gcc, which can produce a lot of strange messages when
you compile recent sources.
Update Fedora, or try to install RandLM
t any bugs
cheers
Lefteris
On 13/01/12 08:48, Tomas Hudik wrote:
> Hi Lefteris,
>
> This is a good point. I was also looking for METEOR support, as a somewhat
> better evaluation technique, but without success. I was thinking about adding it, but I
> didn't due to lack of time.
>
> Cheer
Hi Lefteris,
This is a good point. I was also looking for METEOR support, as a somewhat
better evaluation technique, but without success. I was thinking about adding it, but I
didn't due to lack of time.
Cheers, Tomas
-Original Message-
From: Eleftherios Avramidis [mailto:eleftherios.avrami...@df
Hi Somayeh ,
Moses passes to LM precisely:
19
by default. However, this can be changed via the special Moses feature -xml-input:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4
Be careful: Moses can process only text, and this option doesn't support arbitrary
valid XML code. It is a bit tri
Hi,
I've noticed that the script sentence-by-sentence.pl (Moses scripts/analysis)
produces valid UTF output (HTML), but the generated HTML header is invalid.
Line 125:
print "\n";
should be:
print "\n";
Since I don't have write permission to the Moses svn, please correct it.
Thanks, Tomas
Hi Cyrine,
I think the problem is with the architecture. I had the same problem. It is a
bit tricky, but you need to tell the installation procedure that you have an x64
architecture instead of i386. I think it was during make --configure, if I
remember correctly ... Or, download IRSTLM binaries al
Hi there,
When I run the command:
moses -threads 10 -v 4 -f moses.ini < oo > aa
Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6
/home/moses/reordering-table.wbe-msd-bidirectional-fe.gz
distortion-limit: 6
Hi all,
I found a possible bug in tokenizer.perl.
If I run:
echo "Don't put a space after the opening parenthesis" |
./tokenizer.perl -l en
The output is correct: Don 't put a space after the opening parenthesis
But if I run:
echo "Don't put a space after the opening parenthesis" |
./tokenizer.
Hi all,
I'd like to ask why the Moses scripts tokenizer.perl and detokenizer.perl
are based on different approaches. While tokenizer.perl reads the
rules for a specific language from a special file stored in the
nonbreaking_prefixes directory, detokenizer.perl has these rules
hardcoded inside and just
Hi all,
I executed the command:
/home/moses/moses-scripts/mscripts/training/filter-model-given-input.pl
hh3 ../train/model/moses.ini tok.rem.low
The output was:
Executing: mkdir -p /home/m/ll/tuning/hh3
Considering factor 0
Considering factor 0
Use of uninitialized value in concatenation (.) or string
Hi all,
a few days ago, I checked out the svn repository and I didn't find
mert-moses-new.pl in scripts/training (old mert-moses.pl is still
there). Was the new version deleted from the svn?
Thanks, Tomas
___
Moses-support mailing list
Moses-support@mit.
Hi there!
I'm wondering whether there is a list of "special characters" for Moses.
I mean characters like "|", ... which can influence the translation process.
It would be nice to have such a list, where each special character has
a short description of what can happen if the character is included in
t
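Until such a list exists, one defensive option is to escape suspicious characters before the text reaches Moses. The Python sketch below is illustrative only: apart from "|", which is named above, the characters in the table and the entity replacements are assumptions, not an official Moses list.

```python
# Illustrative table; "&" must come first so that the entities
# produced by the later replacements are not escaped again.
ESCAPES = [("&", "&amp;"), ("|", "&#124;"), ("<", "&lt;"), (">", "&gt;")]


def escape_special(line):
    """Replace each listed character with its entity, in table order."""
    for raw, escaped in ESCAPES:
        line = line.replace(raw, escaped)
    return line
```

The translated output then needs the inverse mapping applied, in reverse order, to restore the original characters.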
Hi,
Well, I also don't think "|" is a good choice of delimiter. And I agree that
0x00 (or any other special character) is the road to hell. However, I'd like to
see Moses move toward XML. What about (in some future major release) making the
delimiter an XML tag and, generally, also other things
Hi Jasleen,
As far as I can see, the Boost library is missing (http://www.boost.org); however,
some other warnings (errors) are listed as well.
Regards, Tomas
From: Jasleen Sidhu [mailto:sidhuru...@yahoo.com]
Sent: Monday, October 11, 2010 3:45 AM
To: Moses-support@mit.edu
Subject: [
Hi there!
I'd like to know what the HW requirements are for a corpus with
30-40M words.
I've read http://www.statmt.org/moses/?n=Moses.FAQ#ntoc10, where it
says that 2000 sentences take some 1-2 days with 15 CPUs. That seems
too much to me. I built a training engine on 1M words (100 000 sentences) an
Hi,
I've got a question about the script tokenizer.perl.
I'm wondering whether it is possible to get
nonbreaking_prefix.* files for various languages somewhere. Does such a place exist?
Or, how can I tokenize a text file if I don't have enough knowledge
about the particular language?
Thanks, Tomas
Hi there,
I've found a small bug in the configure script.
./configure -help says:
--with-xmlrpc-c=PATH Enable XMLRPC-C support.
...
In fact, it doesn't want a PATH but the string yes or no (the path is found
through xmlrpc-c-config).
Regards, T. Hudik