[Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread Asad A.Malik
Hi all, I am trying to develop an Urdu SMT system using Moses. I have an Urdu parallel corpus, and the first step in the manual is to tokenize the corpus, but when I enter the following command: ~/SMT/mosesdecoder/scripts/tokenizer/tokenizer.perl -l ur < ~/SMT/corpus/training/mycorpus.ur-en.ur > ~/SMT/corpus/mycorpus

[Moses-support] Does Moses support C++11 compilation?

2013-12-27 Thread Li Xiang
Hi, Does Moses support C++11 compilation? I want to integrate my code, which is based on C++11, into Moses. How do I modify the bjam config file to compile Moses using C++11? Thanks. -- Xiang Li ___ Moses-support mailing list Moses-support@mit.edu h

[Moses-support] Call for Papers: 9th SaLTMiL workshop on “Free/open-source language resources for the machine translation of less-resourced languages” at LREC 2014.

2013-12-27 Thread Mikel Forcada
Call for Papers: 9th SaLTMiL workshop on “Free/open-source language resources for the machine translation of less-resourced languages” at LREC 2014. A full-day workshop at LREC 2014 Tuesday, 27 May 2014. Reykjavik (Iceland) SALTMIL: http://ixa2.si.ehu.es/saltmil/ LREC 2014: http://lrec2014.lrec

[Moses-support] 1st CfP: LREC 2014 Workshop on Free/Open-Source Arabic Corpora and Corpora Processing Tools

2013-12-27 Thread OSACT
We apologize for multiple postings. Please distribute to interested colleagues. -- 1st Call for Papers WORKSHOP ON Free/Open-Source Arabic Corpora and Corpora Processing Tools http://www.kacstac.org.sa/osact/index.html May 27, 2014 Co-located wi

Re: [Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread John D. Burger
The default tokenizer script only knows specific rules for a few languages. The fallback (English) rules may suffice for your purposes: they do the obvious thing with spaces and English punctuation, and also handle some special cases for abbreviations like "Mr." and "Mrs.". I'd suggest you eye

Re: [Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread Hieu Hoang
The output will be tokenized, but probably very badly. If you know Urdu and can create a better tokenizer, please add it to Moses. You can start by looking at the configuration file for the English tokenizer in scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en. You can copy that and chang
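
To illustrate the idea discussed in this thread (whitespace splitting, detaching punctuation, and keeping listed abbreviations such as "Mr." intact), here is a minimal Python sketch. It is not the Moses tokenizer and does not read the nonbreaking_prefix files; the function name `simple_tokenize` and the prefix set are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical prefix set for illustration only; the real lists live in the
# nonbreaking_prefix.* files and are not reproduced here.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr"}


def simple_tokenize(text, prefixes=NONBREAKING_PREFIXES):
    """Split on whitespace, detach punctuation, keep listed abbreviations whole."""
    tokens = []
    for word in text.split():
        # Keep "Mr." as one token when "Mr" is a listed prefix.
        if word.endswith(".") and word[:-1] in prefixes:
            tokens.append(word)
            continue
        # Otherwise emit runs of word characters and single punctuation marks.
        tokens.extend(re.findall(r"\w+|[^\w\s]", word))
    return tokens


if __name__ == "__main__":
    print(simple_tokenize("Mr. Smith arrived, finally."))
```

Because `\w` in Python's `re` module is Unicode-aware for `str` patterns, the same sketch separates the Urdu full stop (U+06D4) from the preceding word. Combining marks such as Arabic-script diacritics are not matched by `\w`, so this sketch would split them off; a real Urdu tokenizer would need to handle them, along with an Urdu-specific prefix list.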

Re: [Moses-support] Moses-support Digest, Vol 86, Issue 76

2013-12-27 Thread Arththika Paramanathan
support C++11 compilation? > Because I want to integrate my code which is based on C++11 into Moses. > How to modify the bjam config file to compile Moses using C++11? > Thanks. > > -- > Xiang Li > -- next part -- > An HTML attachment was scrubbed... >

Re: [Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread Asad
And what about the truecaser and cleaning? Will I have to create those for Urdu as well? Regards Asad A.Malik Sent from my iPod On Dec 27, 2013, at 9:07 PM, Hieu Hoang wrote: > The output will be tokenized, but probably very badly. If you know Urdu and > can create a better tokenizer, please add i

Re: [Moses-support] Warning during tokenizing Urdu Corpus

2013-12-27 Thread Hieu Hoang
Nope, just the tokenizer. On 27 December 2013 18:21, Asad wrote: > And what about truecaser and cleaning??? Will I have to create that also > for urdu? > > Regards > Asad A.Malik > > Sent from my iPod > > On Dec 27, 2013, at 9:07 PM, Hieu Hoang wrote: > > The output will be tokenized, but proba

Re: [Moses-support] Does Moses support C++11 compilation?

2013-12-27 Thread Hieu Hoang
From Stack Overflow: http://stackoverflow.com/questions/2887707/how-to-build-boost-with-c0x-support http://stackoverflow.com/questions/18452723/change-boost-build-jamfile-for-c11-support ./bjam ... cxxflags=-std=gnu++0x or bjam ... cxxflags="-std=c++11" On 27 December 2013 09:46, Li Xiang