Re: [Moses-support] training process and special unicode characters

2013-03-08 Thread Tomas Hudik
opp [mailto:achi...@gmail.com] Sent: Friday, March 08, 2013 5:27 PM To: Tomas Hudik; 'Barry Haddow' Cc: moses-support@mit.edu Subject: Re: [Moses-support] training process and special unicode characters Hi Tomas, Did you mean filtering out \p{Z} (Separator) and \p{C} (Other)? http://perldo

Re: [Moses-support] training process and special unicode characters

2013-03-08 Thread Tomas Hudik
ag it. Would be difficult to trace all changes in processing of Unicode by various perl versions, boost library and potentially some other components involved in moses workflow. Cheers, t. From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk] Sent: Friday, March 08, 2013 11:30 AM To: Tomas

[Moses-support] training process and special unicode characters

2013-03-07 Thread Tomas Hudik
Hi, How does Moses treat special Unicode control or whitespace characters? E.g. http://www.fileformat.info/info/unicode/char/2028/index.htm, or http://en.wikipedia.org/wiki/Unicode_control_characters Are they excluded, or do they become part of the phrase table? Not sure if this question wouldn't be bette
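
A minimal sketch of one way to normalise such characters in a corpus line before training. It is not taken from Moses; the function name clean_line and the policy (whitespace and separators become one plain space, control and format characters are dropped) are assumptions made for illustration.

```python
import unicodedata


def clean_line(line: str) -> str:
    """Normalise separators to a plain space and drop control/format characters.

    Hypothetical helper, not part of Moses. The policy is an assumption:
    anything Python treats as whitespace, or any Unicode category Z*
    (e.g. U+2028 LINE SEPARATOR), becomes a space; remaining category C*
    characters (control, format, unassigned, ...) are removed.
    """
    out = []
    for ch in line:
        cat = unicodedata.category(ch)
        if ch.isspace() or cat.startswith("Z"):
            out.append(" ")
        elif cat.startswith("C"):
            continue
        else:
            out.append(ch)
    # Collapse the runs of spaces the replacement may have produced.
    return " ".join("".join(out).split())
```

Running a check like this over both sides of a parallel corpus would show whether such characters are present at all before asking how the training pipeline handles them.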

Re: [Moses-support] Snt2cooc error - post re-formatted

2013-02-27 Thread Tomas Hudik
Hi Ken, You have a typo in your command: ...-root-dir train - corpus ~/corpus/news-commentary-v7.fr-en.clean... It should be -corpus (no whitespace between - and corpus). The same applies to: -alignment grow-diag- final-and (before the word final) Also, Moses in general prefers absolute pat

Re: [Moses-support] REST API?

2013-02-11 Thread Tomas Hudik
Hi Tom, I don’t know much about mosesserver XML-RPC (we installed it once, but didn’t use it afterwards), but we find the daemon (contrib/web/daemon.sh) very useful: it is a simple Perl script that runs Moses and lets you talk to and listen to Moses via a TCP port. Together with apache server it

[Moses-support] tuning, problem with compact phrase and distortion tables

2013-02-08 Thread Tomas Hudik
Hi there, I've trained a Moses engine and created compact phrase and reordering (distortion) tables: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5 http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc6 and changed moses.ini accordingly. Moses decoder is working correctly (loading

[Moses-support] Compact Phrase Table -- documentation bug

2013-02-06 Thread Tomas Hudik
Hi there, I have likely found a bug in the Compact Phrase Table documentation: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5 In the subsection Options there is a line: -alignment-info -- include alignment info in the binary phrase table However, in fact it should be: -no-alignment-info -- do

[Moses-support] moses v 1.0 vs. source codes

2013-01-30 Thread Tomas Hudik
couldn't find any info about how the release was produced. If I take a look at https://github.com/moses-smt/mosesdecoder/tree/RELEASE-1.0 it seems v 1.0 was created from older but stable sources, correct? Is there a changelog of what was changed/updated since v 1.0? Thanks! Tomas Hudik

[Moses-support] train-model.perl an useless char

2012-07-19 Thread Tomas Hudik
Hi there, train-model.perl, line 1533, contains my $cmd = "$PHRASE_SCORE $extract $lexical_file.$direction' $ttable_file.half.$direction.gz $inverse"; The character "'" looks superfluous. It causes a problem: (6.1) creating table half /home/moses/engines/cs/IBM/itm0/train2/model/phrase-table.hal
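
A stray quote like this can be caught before the command ever reaches the shell. The sketch below is an illustration in Python, not part of train-model.perl; the helper name has_balanced_quotes and the file names in the usage note are hypothetical. It relies on shlex.split, which follows POSIX-like quoting rules and raises ValueError when a quotation is never closed.

```python
import shlex


def has_balanced_quotes(cmd: str) -> bool:
    """Return False if the command line contains an unterminated quote.

    Hypothetical helper for illustration. shlex.split raises ValueError
    ("No closing quotation") when a quote is opened but never closed.
    """
    try:
        shlex.split(cmd)
    except ValueError:
        return False
    return True
```

For example, has_balanced_quotes("score extract lex.f2e' table.half.f2e.gz") is False, while the same string without the apostrophe passes.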

Re: [Moses-support] Placeholders missed

2012-07-10 Thread Tomas Hudik
Hi Henry, This answer is probably coming late, but: we have developed a small piece of software for placeholder translation. It is under the same license as Moses. http://code.google.com/p/m4loc/ If you want to try it, download the sources (it is mostly Perl, so you do not need to compile it). The input should be tm

[Moses-support] Moses compiled on gcc-4.7.0

2012-06-28 Thread Tomas Hudik
Hi there, I've tried to compile Moses with the new gcc 4.7.0. Two updates are needed: In files util/file.cc and util/mmap.cc, a line #include should be added, otherwise an error will occur: util/mmap.cc:33:18: error: '_SC_PAGE_SIZE' was not declared in this scope cheers, Tomas

Re: [Moses-support] compiling hunalign

2012-06-14 Thread Tomas Hudik
Hi Joerg, If Hieu was able to compile it on gcc 4.6.3 and you were not, and your error message looks like a problem with the STL library, I'd check whether the problem persists after installing compatibility packages (e.g. compat-gcc-34). (Make sure which gcc was used in hunalign development.) Che

[Moses-support] cooperation IRSTLM and train_recaser.perl

2012-05-31 Thread Tomas Hudik
Hi there, The command invoking IRSTLM includes the option -t /tmp (train_recaser.perl 97), which sets the temporary directory to use. IRSTLM (build-lm.sh) tries to remove it (rmdir) at the end. However, this would delete the whole /tmp (if run as root), or it fails and the whole build-lm.sh fails as well (return
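
One possible way around this, sketched under assumptions: give the tool a dedicated, freshly created directory instead of the shared /tmp itself, so that a final rmdir only touches that directory. The helper name make_private_tmp and the prefix are hypothetical, and this is not the fix adopted in train_recaser.perl or IRSTLM.

```python
import os
import tempfile


def make_private_tmp(parent: str = tempfile.gettempdir()) -> str:
    """Create and return a fresh directory below `parent`.

    Hypothetical helper: the returned path, rather than /tmp itself,
    would be handed to the -t option, so removing it afterwards
    cannot affect the shared temporary directory.
    """
    return tempfile.mkdtemp(prefix="recaser-lm-", dir=parent)
```

A caller would pass the returned path as the temporary directory and may remove it afterwards with os.rmdir, which only succeeds on an empty directory.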

Re: [Moses-support] train-recaser.perl and new IRSTLM

2012-05-30 Thread Tomas Hudik
Hi, The discussion was moved to: https://list.fbk.eu/sympa/arc/user-irstlm/2012-05/msg1.html since it is related to IRSTLM tomas -Original Message- From: Nicola Bertoldi [mailto:berto...@fbk.eu] Sent: Wednesday, May 30, 2012 8:47 AM To: Tomas Hudik Cc: moses-support@mit.edu

Re: [Moses-support] EMS fails on tuning

2012-05-29 Thread Tomas Hudik
Hi Dimitris, Post the error log you get when you translate a sentence; try e.g.: echo "translate some sentence" | ./moses -f your_moses.ini cheers, Tomas -Original Message- From: Δημήτρης Μπαμπανιώτης [mailto:dimbabanio...@gmail.com] Sent: Tuesday, May 29, 2012 11:41 PM To: Philipp Koehn Cc: mos

[Moses-support] train-recaser.perl and new IRSTLM

2012-05-29 Thread Tomas Hudik
Hi there, It seems newer versions of IRSTLM (build-lm.sh) end with exit code 1. But if build-lm.sh exits with anything other than 0, train-recaser fails. This is due to the command: system($cmd) == 0 || die("Language model training failed with error " . ($? >> 8) . "\n"); which should be c
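
For readers unfamiliar with the ($? >> 8) idiom: Perl's $? holds the raw wait status, and the child's exit code sits in its high byte. The Python sketch below illustrates the same arithmetic and the equivalent check with subprocess; the function names are hypothetical, and it does not reproduce the change proposed in the message, which is cut off above.

```python
import subprocess
import sys


def exit_code_from_wait_status(status: int) -> int:
    """Extract the exit code from a raw wait status, as Perl's ($? >> 8) does."""
    return status >> 8


def run_and_report(argv):
    """Run a command and return (exit_code, succeeded).

    subprocess.run already exposes the decoded exit code as returncode,
    so no shifting is needed on the Python side.
    """
    result = subprocess.run(argv)
    return result.returncode, result.returncode == 0
```

A wrapper that treats any non-zero code as fatal, as the quoted Perl line does, will stop even when the tool produced usable output, which is the situation described here.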

Re: [Moses-support] EMS fails on tuning

2012-05-28 Thread Tomas Hudik
Hi Dimitris, Make sure your Moses is compiled correctly (no errors during bjam). If so, make sure your binaries run without a segmentation fault; if so, make sure your paths are absolute and valid. There is a 90% likelihood that the problems will be solved after these checks. Cheers, To

Re: [Moses-support] new PhraseExtractor -- segmentation faul; the old one run smooth

2012-05-26 Thread Tomas Hudik
ct-parallel and then to ln. The conclusion is: if a problem with paths arises, always try to use absolute paths for all possible parameters :) Cheers, Tomas From: Tomas Hudik [mailto:thu...@moraviaworldwide.com] Sent: Saturday, May 26, 2012 5:27 PM To: Hieu Hoang; moses-support@mit.edu Subject: Re:
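
The advice about absolute paths can be applied mechanically when a command line is assembled by a wrapper script. A minimal sketch, assuming the caller knows which flags take a path; the helper name absolutize is hypothetical and not part of Moses.

```python
import os


def absolutize(args, path_flags):
    """Return a copy of args where the value following each flag in
    path_flags is replaced by its absolute path.

    Hypothetical helper: which flags carry paths is the caller's
    assumption, e.g. {"-root-dir", "-corpus"}.
    """
    out = []
    expect_path = False
    for arg in args:
        if expect_path:
            out.append(os.path.abspath(os.path.expanduser(arg)))
            expect_path = False
        else:
            out.append(arg)
            expect_path = arg in path_flags
    return out
```

For example, absolutize(["-root-dir", "train"], {"-root-dir"}) keeps the flag and expands train relative to the current working directory.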

Re: [Moses-support] new PhraseExtractor -- segmentation faul; the old one run smooth

2012-05-26 Thread Tomas Hudik
on are the links created? Wouldn't be better to work with original files? From: Hieu Hoang [mailto:fishandfrol...@gmail.com] Sent: Saturday, May 26, 2012 2:50 PM To: moses-support@mit.edu; Tomas Hudik Subject: Re: [Moses-support] new PhraseExtractor -- segmentation faul; the old one run smoo

Re: [Moses-support] new PhraseExtractor -- segmentation faul; the old one run smooth

2012-05-26 Thread Tomas Hudik
Hi Hieu, Thanks for your reply. Yep, I downloaded the latest git version (de8a2e7667fe2bde9df0ef5a32b3b85b6469eb0f) and found the following problems: 1. During compilation, gcc version 4.6.3 produces an error in: gcc.compile.c++ scripts/training/phrase-extract/pcfg-common/bin/gcc-4.6.3/r

[Moses-support] new PhraseExtractor -- segmentation faul; the old one run smooth

2012-05-25 Thread Tomas Hudik
Hi there, It looks like the phrase extractor has a bug: If I run the command: training/phrase-extract/extract lotus.tok.rem.clean.lw.cs lotus.tok.rem.clean.lw.en ./model/aligned.grow-diag-final-and ./model/extract 7 orientation --model wbe-msd Within Moses from Git, version: 9b5a4278b7693ec361c1

Re: [Moses-support] Failed to install RandLM-0.2.5 on Fedora 12

2012-05-25 Thread Tomas Hudik
Hi Jun, Maybe there is no bug at all. This happens quite often if you try to compile a new program on an old system (or vice versa). Fedora 12 has a pretty old gcc, which can cause a lot of strange messages if you compile current sources. Update Fedora, or try to install RandLM

Re: [Moses-support] Meteor and Experiment.perl (ems)

2012-01-16 Thread Tomas Hudik
t any bugs cheers Lefteris On 13/01/12 08:48, Tomas Hudik wrote: > Hi Lefteris, > > This is a good point. I was also looking for meteor support, as some a bit > better eval technique, but I failed. I was thinking about adding it, but I > didn't due to lack of time. > > Cheer

Re: [Moses-support] Meteor and Experiment.perl (ems)

2012-01-12 Thread Tomas Hudik
Hi Lefteris, This is a good point. I was also looking for Meteor support, as a somewhat better evaluation technique, but I failed. I was thinking about adding it, but I didn't due to lack of time. Cheers, Tomas -Original Message- From: Eleftherios Avramidis [mailto:eleftherios.avrami...@df

Re: [Moses-support] XML Markup

2011-12-21 Thread Tomas Hudik
Hi Somayeh , Moses passes to LM precisely: 19 By default. However, this can be changed via special Moses feature -xml-input http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4 Be careful, Moses can process only text and this option doesn't support any valid XML code. It is a bit tri

[Moses-support] incorrectly created html page by sentence-by-sentence.pl script

2011-10-12 Thread Tomas Hudik
Hi, I've noticed that the script sentence-by-sentence.pl (moses scripts/analysis) produces valid UTF output (HTML), but the HTML header it creates is invalid. Line 125: print "\n"; should be: print "\n"; Since I don't have write permission to the Moses svn, please correct it. Thanks, Tomas

[Moses-support] Fwd: build 5gram with irstlm

2011-09-06 Thread Tomas Hudik
Hi Cyrine, I think the problem is with the architecture. I had the same problem, it is a bit tricky but you need to tell the installation procedure that you have x64 architecture instead of i386. I think it was during make --configure, if I remember correctly ... Or, download IRSTLM binaries al

[Moses-support] -threads option and segmentation fail

2011-04-01 Thread Tomas Hudik
Hi there, When I run command : moses -threads 10 -v 4 -f moses.ini < oo > aa Defined parameters (per moses.ini or switch): config: moses.ini distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/moses/reordering-table.wbe-msd-bidirectional-fe.gz distortion-limit: 6

[Moses-support] tokenizer.perl - fall-back to English version

2011-03-07 Thread Tomas Hudik
Hi all, I found a possible bug in tokenizer.perl. If I run: echo "Don't put a space after the opening parenthesis" | ./tokenizer.perl -l en The output is correct: Don 't put a space after the opening parenthesis But if I run: echo "Don't put a space after the opening parenthesis" | ./tokenizer.

[Moses-support] tokenizer.perl vs. detokenizer.perl

2011-03-07 Thread Tomas Hudik
Hi all, I'd like to ask why the Moses scripts tokenizer.perl and detokenizer.perl are based on different approaches. While tokenizer.perl reads the rules for a specific language from a special file stored in the nonbreaking_prefixes directory, detokenizer.perl has these rules hardcoded inside and just

[Moses-support] filter-model-given-input.pl Use of uninitialized value, 191

2011-02-27 Thread Tomas Hudik
Hi all, I executed the command: /home/moses/moses-scripts/mscripts/training/filter-model-given-input.pl hh3 ../train/model/moses.ini tok.rem.low The output was: Executing: mkdir -p /home/m/ll/tuning/hh3 Considering factor 0 Considering factor 0 Use of uninitialized value in concatenation (.) or string

[Moses-support] svn: mert-moses-new.pl is missing

2011-02-26 Thread Tomas Hudik
Hi all, a few days ago, I checked out the svn repository and I didn't find mert-moses-new.pl in scripts/training (old mert-moses.pl is still there). Was the new version deleted from the svn? Thanks, Tomas

[Moses-support] list of special characters

2010-11-24 Thread Tomas Hudik
Hi there! I'm wondering whether there is a list of "special characters" for Moses. I mean characters like "|", ... which can influence the translation process. It would be nice to have such a list where each special character has a short description of what can happen if the character is included in t

Re: [Moses-support] Proposal to replace vertical bar as factor delimeter

2010-11-16 Thread Tomas Hudik
Hi, Well, I also don't think "|" is a good choice for a delimiter. And I agree that 0x00 (or any other special character) is a way to hell. However, I'd like to see Moses move toward XML. What about (in some next bigger release) making the delimiter an XML tag and, generally, also other things

Re: [Moses-support] moses support script

2010-10-11 Thread Tomas Hudik
Hi Jasleen, As far as I can see, the Boost library is missing (http://www.boost.org); however, some other warnings (errors) are listed as well. Regards, Tomas From: Jasleen Sidhu [mailto:sidhuru...@yahoo.com] Sent: Monday, October 11, 2010 3:45 AM To: Moses-support@mit.edu Subject: [

[Moses-support] HW requirements

2010-09-29 Thread Tomas Hudik
Hi there! I'd like to know what the HW requirements are for a corpus with 30-40M words. I've read http://www.statmt.org/moses/?n=Moses.FAQ#ntoc10 where it is written that 2000 sentences take some 1-2 days with 15 CPUs. That seems too much to me. I built training engine 1M words (100 000 sentences) an

[Moses-support] tokenizer for different languages

2010-09-15 Thread Tomas Hudik
Hi, I’ve got a question about the script tokenizer.perl. I’m wondering whether it is possible to get nonbreaking_prefix.* files for various languages somewhere. Does such a place exist? Or, how can I tokenize a text file if I don’t have enough knowledge about the particular language? Thanks, Tomas

[Moses-support] configure bug

2010-08-10 Thread Tomas Hudik
Hi there, I've found a small bug in the configure script. ./configure -help says: --with-xmlrpc-c=PATH Enable XMLRPC-C support. ... In fact, it doesn't expect a PATH but the string yes or no (the path is found through xmlrpc-c-config). Regards, T. Hudik