OpenReview
<https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS>
ORGANIZERS
Geeticka Chauhan, Massachusetts Institute of Technology
Dmitrijs Milajevs, Grayscale AI
Elijah Rippeth, University of Maryland
Jeremy Gwinnup, Air Force Research Laboratory
Liling Tan, Amazon
___
Dear Moses Community and Devs,
This is an announcement on a related library that started as a Python port
of the tokenizer code in Moses decoder's script.
Initially, the Sacremoses library (which forked off NLTK's code)
inherited the LGPL license from Mosesdecoder. But I think it is
beneficial
Dear Moses Dev and Community,
Are there any example protected regexes, and related text containing those
patterns, that one can test with
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl
?
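For anyone testing this: the usual mechanism is to mask spans matching a protected pattern before tokenizing and restore them afterwards. A minimal, self-contained sketch of that idea (the URL regex, the placeholder token, and the naive tokenizer are illustrative stand-ins, not the Moses ones; tokenizer.perl reads its patterns from a file):

```python
import re

# Illustrative protected pattern (a URL); not taken from tokenizer.perl.
PROTECTED = [re.compile(r"https?://\S+")]

def tokenize_with_protection(text):
    masks = []
    for pat in PROTECTED:
        for match in pat.findall(text):
            key = "THISISPROTECTED%03d" % len(masks)  # placeholder token
            text = text.replace(match, key, 1)
            masks.append((key, match))
    # naive stand-in for the real tokenizer
    tokens = re.findall(r"\w+|[^\w\s]", text)
    restore = dict(masks)
    return [restore.get(tok, tok) for tok in tokens]

print(tokenize_with_protection("see http://www.statmt.org/moses today"))
# → ['see', 'http://www.statmt.org/moses', 'today']
```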
Regards,
Liling
___
sure that they don't mix it up with more permissive
license when redistributing code.
Regards,
Liling
On Wed, May 30, 2018 at 3:59 AM, liling tan wrote:
> Hi All,
>
> Multiple licenses for software components aren't impossible. If anyone tries
> to ship/package code with LGPL Moses tokenizer
kenizer and detokenizer as part of
> NLTK.
>
> Lane
>
>
> On Fri, Apr 20, 2018 at 12:38 AM, liling tan wrote:
>
>> Dear Moses Devs and Community,
>>
>> Sorry for the delayed response.
>>
>> We've repackaged the MosesTokenizer Python code as
,
Liling
On Wed, Apr 11, 2018 at 1:41 AM, Matt Post <p...@cs.jhu.edu> wrote:
> Seems worth a shot. I suggest contacting each of them with individual
> emails until (and if) you get a “no”.
>
> matt (from my phone)
>
> On 10 Apr. 2018 at 19:26, liling tan <alvati...@gmail.c
lvations] alvations <https://github.com/alvations>
Not sure if everyone agrees though.
Regards,
Liling
On Wed, Apr 11, 2018 at 12:39 AM, Matt Post <p...@cs.jhu.edu> wrote:
> Liling—Would it work to get the permission of just those people who are in
> the commit log of the specif
the same problem - everyone owns Moses so you need everyone's
> permission, not just mine. So no
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 10 April 2018 at 17:13, liling tan <alvati...@gmail.com> wrote:
>
>> I understand.
>>
>> Could we
, or dual license it, without the agreement of
> everyone who's contributed to Moses. Too much work
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 10 April 2018 at 15:47, liling tan <alvati...@gmail.com> wrote:
>
>> Dear Moses Dev,
>>
>> NLTK has a
Dear Moses Dev,
NLTK has a Python port of the word tokenizer in Moses. The tokenizer works
well in Python and creates good synergy, bridging Python users to the code
that Moses developers have spent years honing.
But it seems to have hit a wall with some licensing issues.
Dear Moses/Marian community and developers,
Sorry for cross-posting the issue.
We're hitting a wall in understanding what the mteval-v13.pl script is
doing as we try to reimplement it in NLTK.
https://github.com/nltk/nltk/pull/1840
We'd be glad if anyone could help us explain
Hoang <hieuho...@gmail.com> wrote:
> cool, I was expecting only single digits improvements. If the pt in Moses1
> hadn't been pruned, the speedup is a lot to do with the pruning i think
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 14 December 2017 at 07:41, liling t
=/home/ltan/momo/lm.ja.kenlm order=5
On Thu, Dec 14, 2017 at 8:58 AM, liling tan <alvati...@gmail.com> wrote:
> I don't have a comparison between moses vs moses2. I'll give some moses
> numbers once the full dataset is decoded. And I can repeat the decoding for
> moses on the same m
for moses v moses2? I never managed to get
> reliable info for more than 32 cores
>
> config/moses.ini files would be good too
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 13 December 2017 at 06:10, liling tan <alvati...@gmail.com> wrote:
>
>> Ah, tha
eTable from Moses (which bizarrely needs to load a
>> phrase table in order to parse the config file, last time I looked). You
>> could also apply Johnson / entropic pruning, whatever works for you,
>>
>> cheers - Barry
>>
>>
>> On
org/moses/?n=Site.Moses2
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 11 December 2017 at 09:20, liling tan <alvati...@gmail.com> wrote:
>
>> Dear Moses community/developers,
>>
>> I have a question on how to handle large models cre
Dear Moses community/developers,
I have a question on how to handle large models created using moses.
I have a vanilla phrase-based model with
- PhraseDictionary num-features=4 input-factor=0 output-factor=0
- LexicalReordering num-features=6 input-factor=0 output-factor=0
- KENLM
Dear Moses devs/community,
We are currently trying to reimplement NIST in Python, and we found that our
re-implementation and the version in mteval-v14.pl are very different.
We would be glad if anyone who's familiar with MT evaluation or the NIST
script could help us out in determining how the "information
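For anyone picking this thread up: as we read mteval, the "information" of an n-gram is the log ratio of its (n-1)-gram prefix count to its own count, taken over the reference corpus. A self-contained sketch of that reading (the definition here is our interpretation, not copied from mteval-v14.pl):

```python
import math
from collections import Counter

def nist_info_weights(references, max_n=5):
    """Info(w1..wn) = log2(count(w1..w_{n-1}) / count(w1..wn)), with the
    prefix count falling back to the total token count for unigrams.
    Counts are over the reference corpus, per our reading of mteval."""
    counts, total_unigrams = Counter(), 0
    for ref in references:
        total_unigrams += len(ref)
        for n in range(1, max_n + 1):
            for i in range(len(ref) - n + 1):
                counts[tuple(ref[i:i + n])] += 1
    return {g: math.log2((total_unigrams if len(g) == 1 else counts[g[:-1]]) / c)
            for g, c in counts.items()}

info = nist_info_weights([["a", "a", "b"]], max_n=2)
print(info[("a",)], info[("b",)])  # rarer n-grams carry more information
```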
Dear Moses community,
When decoding, it's easy to use the moses.ini if the paths are fixed. But
if the model has to be moved across different machines/users, it's a
little messy to replace the paths with the correct ones.
Is it possible to set the paths in moses.ini when calling the moses binary
n)
Kenneth
---
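On the path-portability question above, one low-tech approach is to ship a template moses.ini with the model and rewrite the path prefix per machine into a local copy. The file contents and paths below are invented for illustration:

```python
from pathlib import Path

# Toy moses.ini with a placeholder path prefix (illustrative only).
Path("moses.example.ini").write_text(
    "[feature]\n"
    "PhraseDictionary path=/old/prefix/phrase-table.gz\n"
    "KENLM path=/old/prefix/lm.arpa order=5\n")

# Rewrite the prefix for this machine into a local copy, so the
# template ini shipped with the model stays untouched.
new_prefix = "/home/user/models"
ini = Path("moses.example.ini").read_text()
Path("moses.local.ini").write_text(ini.replace("/old/prefix", new_prefix))
print(Path("moses.local.ini").read_text())
```

The rewritten copy can then be passed to the decoder with `-f moses.local.ini`.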
On Mon, May 8, 2017 at 2:37 PM, liling tan <alvati...@gmail.com> wrote:
> Dear Moses Community,
>
> Does anyone know how to compute sentence perplexity with a KenLM model?
>
> Let's say we build a model on this:
>
> $ wget
> ht
Dear Moses Community,
Does anyone know how to compute sentence perplexity with a KenLM model?
Let's say we build a model on this:
$ wget
https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt
$ lmplz -o 5 <
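The snippet truncates here, but the arithmetic behind sentence perplexity is simple: KenLM reports log10 probabilities, and perplexity is 10 raised to the negative average of them. A toy sketch with invented per-token scores:

```python
# KenLM scores are log10 probabilities (the Python wrapper's model.score
# returns the sentence's total log10 prob). Perplexity is
# 10 ** (-average log10 prob). The per-token scores below are invented.
log10_probs = [-1.2, -0.8, -2.1, -0.5]  # one per token, incl. </s>
avg = sum(log10_probs) / len(log10_probs)
perplexity = 10 ** (-avg)
print(perplexity)  # ≈ 14.13
```

With the pip `kenlm` module this is roughly `10 ** (-model.score(sent) / (len(sent.split()) + 1))`; the module also exposes `model.perplexity(sent)` directly.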
Thanks Kenneth for the answer!!
On Mon, Apr 24, 2017 at 2:48 PM, Kenneth Heafield <m...@kheafield.com> wrote:
> Yes. Though formally I would say in range.
>
> On April 24, 2017 4:18:41 AM GMT+01:00, liling tan <alvati...@gmail.com>
> wrote:
>>
>> Dear
Dear Moses community,
Is it correct that when using --discount_fallback, if the discount is
computable from Kneser-Ney, the fallback will not be used?
Regards,
Liling
For context:
$ cat test.zh
一 二 三 四
~$ time ~/mosesdecoder/bin/lmplz -o 3 --discount_fallback=0 < test.zh >
test.zh.arpa
Dear Marcin and Moses community,
Thanks for the tips!
Yeah, g2.8xlarge is painfully expensive... Training on separate instances
sounds more reasonable. Now I have to explain to the devs why I need 2
instances ;P
Regards,
Liling
On Tue, Apr 4, 2017 at 4:25 PM, liling tan <alvati...@gmail.
Dear Marcin and Moses community,
Are you running on g2.8xlarge on AWS?
I think I went with the cheap g2.2xlarge, and 15GB RAM is a little too low
for MGIZA++, which is taking forever... I think I've got to recreate a new,
larger instance.
Regards,
Liling
___
Dear Moses community,
Amittai has written a nice package and setup guide for Moses on AWS. But to
do some NMT on GPU, the instances usually wouldn't have enough RAM for
Moses.
Does anyone have experience deploying Moses on GPU instances on AWS or
any other cloud servers?
Regards,
Liling
whether
it's just that the newer perl version handles the \p{Line_Break}
automatically while the older perl doesn't
On Wed, Mar 29, 2017 at 12:36 AM, liling tan <alvati...@gmail.com> wrote:
> Hi Dingyuan, Hieu,
>
> Thanks for highlighting the issue.
>
> The deprecation wa
Hi Dingyuan, Hieu,
Thanks for highlighting the issue.
The deprecation warning from mteval has been there since early 2015 on
https://www.mail-archive.com/moses-support@mit.edu/msg12057.html
The fix at https://github.com/moses-smt/mosesdecoder/pull/170 was adhering
to the Unicode annex on
Dear Linguist / Machine Translation Experts / Machine Learning
Practitioners,
If you don't mind, we're polling on the question of "Which shade of BLEU
are you using?" for translation studies.
It would be great if you also cast a vote on it:
https://twitter.com/alvations/status/717647857768144896
Dear Matthias and Kenneth,
Thank you for the note on the --static options!
Regards,
Liling
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Dear Moses devs/users,
The filter tool in KenLM is able to filter an LM based on a dev set
(https://kheafield.com/code/kenlm/filter/), but it only accepts raw|arpa files.
Is there another tool that filters binarized LMs? Given a binarized LM, is
there a way to "debinarize" the LM?
Thanks in
Dear Moses devs/users,
I'm running into the same problem as
http://comments.gmane.org/gmane.comp.nlp.moses.user/14270 when installing
on a fresh instance of Ubuntu 14.04 LTS.
Following the installation steps on
http://www.statmt.org/moses/?n=Development.GetStarted:
$ sudo apt-get install
dy got 2 fully trained MT systems, and tunes a
> multi-model system that uses 2 TM, 2LM (n.b. but only one reordering model).
>
> Cheers,
> ~amittai
> www.boxresear.ch
>
>
> liling tan on February 29, 2016 at 10:33am wrote:
>
> Dear Moses devs/users,
>
> Is
Dear Moses devs/users,
Is there some documentation / tutorial on how to run "alternative decoding
paths" from Birch, Osborne, & Koehn (2007), i.e. from
http://homepages.inf.ed.ac.uk/abmayne/publications/acl2007ccg.pdf and
http://arxiv.org/pdf/1401.6876.pdf ?
After having two phrase-tables and two
Whoops, found it at http://matrix.statmt.org/test_sets/list
Thanks!
Regards,
Liling
On Mon, Jan 25, 2016 at 4:11 PM, liling tan <alvati...@gmail.com> wrote:
> Dear Moses devs/users,
>
> We're looking for the WMT07 test data (source + reference). On the
> http://ww
Dear Moses devs/users,
We're looking for the WMT07 test data (source + reference). On the
http://www.statmt.org/wmt07/results.html page, we could only find the
submissions and the judgment scores. And there's no link on the task page
either: http://www.statmt.org/wmt07/shared-task.html
We've also
Dear Moses / MT Marathon organizers,
I'm not sure whether this is the right place to report this.
I was trying to retrieve a page from MT Marathon 2010, and it seems a
Russian hacker has hacked the page and taken it over:
http://www.mtmarathon2010.info/ (see the lower right corner).
And it's
ine link to lecture material from the MT Marathon.
>>
>>
>> Cheers,
>>
>> Ventzi
>>
>> –––
>> *Dr. Ventsislav Zhechev*
>> Computational Linguist, Certified ScrumMaster®
>>
>> *http://VentsislavZhechev.eu <http://ventsislavz
Dear Adam and Moses devs/users,
@Adam, thank you for the explanation of line 6 of the pseudocode. I
understand it better now.
I have a few more short questions about the pseudocode for the Powell
search on slide 37 of http://mt-class.org/jhu/slides/lecture-tuning.pdf,
On line 6 does the
Dear Moses devs/users,
We are going through the slides for MT tuning on
http://mt-class.org/jhu/slides/lecture-tuning.pdf and we couldn't figure
out what "λai + bi" on slide 31 refers to.
What are the values for "ai" and "bi"? Are they numbers from the nbest
list?
According to the
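Our reading of that slide: along MERT's line search, each n-best entry i scores λ·ai + bi, where ai is the entry's feature value in the search direction and bi is its score under the fixed remaining weights, so both do come from the n-best list's feature values. A toy sketch with invented numbers:

```python
# Each n-best hypothesis h contributes a line score_h(λ) = a_h*λ + b_h
# along the search direction. The numbers below are invented.
lines = {"hyp1": (2.0, -1.0), "hyp2": (-1.0, 2.0), "hyp3": (0.5, 0.0)}

def best_hyp(lam):
    """Hypothesis on the upper envelope at this λ (the decoder's 1-best)."""
    return max(lines, key=lambda h: lines[h][0] * lam + lines[h][1])

# Sweeping λ shows where the 1-best changes; those crossing points are
# the interval boundaries MERT's line search enumerates.
print([best_hyp(l) for l in (-1.0, 0.5, 2.0)])  # → ['hyp2', 'hyp2', 'hyp1']
```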
Dear Asad,
There are several ways to add a dictionary to the MT process.
You can either (i) add it passively as additional training data or (ii) use
the XML input (http://www.statmt.org/moses/?n=Advanced.Hybrid) during
decoding. I've done some experiments on that; you can take a look here for
a
Dear Moses devs/users,
Is there a way to control how many threads and how many cores Moses uses?
`moses/scripts/training/train-model.perl` has a --cores option while
`moses/scripts/training/mert-moses.pl` has a --threads option.
Does the mert-moses.pl --threads option refer to the no. of
Dear Moses devs/users,
Is there any script in Moses or other MT libraries that can generate
segment-level BLEU, NIST and METEOR scores for each sentence in the test
set?
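NLTK's nltk.translate.bleu_score.sentence_bleu gives per-sentence BLEU. For a dependency-free illustration of what such a script computes, here is a smoothed sentence BLEU sketch (add-one smoothing is one choice among several; the toy sentences are invented):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothed modified precisions."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append((clipped + 1) / (total + 1))  # add-one smoothing
    # brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
hyp = "the cat sat on the mat".split()
print(round(sentence_bleu(ref, hyp), 4))
```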
Best Regards,
Liling
Dear Hieu,
Thanks for the info on the KenLM,
Regards,
Liling
On Tue, Aug 11, 2015 at 5:57 PM, Hieu Hoang hieuho...@gmail.com wrote:
On 10/08/2015 17:22, liling tan wrote:
Dear Moses devs/users,
@Marcin, @Ken, thanks for the tips on the -S option for build_binary and RAM
estimation
to arpa dumping mechanism?
Regards,
Liling
On Fri, Aug 7, 2015 at 9:31 PM, liling tan alvati...@gmail.com wrote:
Dear Moses dev/users,
On a related note, without multithreading, can anyone give a gauge of how
much RAM is required to binarize an 80GB (compressed .gz) 6-gram ARPA file
Dear Moses dev/users,
Is there a multithread option for KenLM's build_binary?
Regards,
Liling
Dear James and Moses devs,
I guess everyone's hunch would be whether you've done the tuning correctly
before getting the awkward results you've reported.
Read on for a sob story about me and Moses, and some guidance to soothe the
tone, try as I may.
Please skip/delete if you would not like to read the sob
I think Joerg Tiedemann's OPUS has cleaned, truecased, tokenized texts
ready for moses: http://opus.lingfil.uu.se/Europarl.php
You can also try this script to automatically download and process
Europarl:
https://github.com/alvations/usaarhat-repo/blob/master/Europarl-MT.md
Or directly download
(int, const void*, std::size_t)
threw FDException because `ret 1'.
No space left on device in /tmp/lmjYruEU (deleted) while writing 339929828
bytes
Regards,
Liling
On Sun, May 3, 2015 at 7:44 PM, liling tan alvati...@gmail.com wrote:
Dear Moses devs/users,
For now, I only know that it takes
@Marcin, thank you for the helpful insight. I guess I'll need to ask for
more HDD space from my supervisor =)
Dear Moses devs/users,
Does anyone have an idea how big a 12-gram language model ARPA file
trained on 16GB of text would become?
Any hints on what the resulting size of the ARPA file would be?
Is there a way to estimate how much space a language model takes given the
training corpus size and the order
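There's no exact answer before counting, but a crude ballpark is possible: lmplz prints the number of unique n-grams per order during its counting step, and each ARPA line holds roughly a log-prob, the tokens, a backoff weight, and separators. The bytes-per-line figure below is entirely an assumption, as are the counts:

```python
def estimate_arpa_bytes(ngrams_per_order, bytes_per_line=40):
    """Ballpark ARPA size: one text line per n-gram. bytes_per_line is a
    rough guess (log-prob + tokens + backoff + separators), not measured."""
    return sum(ngrams_per_order.values()) * bytes_per_line

# Invented per-order counts, just to show the arithmetic.
counts = {1: 500_000, 2: 8_000_000, 3: 30_000_000}
print(estimate_arpa_bytes(counts) / 1e9, "GB (very rough)")
```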
Dear Moses devs/users,
For now, I only know that it takes more than 250GB. I have 250GB of free
space and KenLM got poisoned by insufficient space...
Does anyone have an idea how big a 12-gram language model ARPA file
trained on 16GB of text would become?
STDERR:
=== 1/5 Counting and sorting
Dear Moses devs/users,
I want to use the XML-Input to add constraints when decoding (
http://www.statmt.org/moses/?n=Advanced.Hybrid#ntoc7)
The example on the Moses page shows only an example with one xml input. I
have 700,000 of those in a dictionary that I can search and replace using a
python
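A sketch of that search-and-replace step: per the Advanced.Hybrid page, a span can be wrapped in an XML tag whose `translation` attribute constrains the decoder. The dictionary entry and tag name here are illustrative stand-ins for the 700,000-entry dictionary:

```python
import re

# Illustrative single entry; in practice this would be the 700k-entry dict.
DICTIONARY = {"Bundeskanzlerin": "Federal Chancellor"}

def mark_up(sentence, dictionary):
    """Wrap dictionary hits in Moses XML-input markup. The tag name is
    arbitrary; the decoder reads the `translation` attribute."""
    for src, trg in dictionary.items():
        sentence = re.sub(
            r"\b%s\b" % re.escape(src),
            '<np translation="%s">%s</np>' % (trg, src),
            sentence)
    return sentence

print(mark_up("die Bundeskanzlerin sprach", DICTIONARY))
# → die <np translation="Federal Chancellor">Bundeskanzlerin</np> sprach
```

The decoder then has to be run with -xml-input exclusive (or inclusive) so the markup is respected rather than treated as text.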
@Marcin, I've just made a merge from my branch into the master.
Could you point me to the fix and maybe i can try to merge it?
Regards,
Liling
On Mon, Apr 27, 2015 at 6:16 PM, moses-support-requ...@mit.edu wrote:
Send Moses-support mailing list submissions to
moses-support@mit.edu
@Marcin, oops, wrong script file. Sorry about my previous post. Someone
else more appropriate should patch from your branch. I wasn't committing
mert-moses.pl; it was the filter script before MERT.
On Mon, Apr 27, 2015 at 7:09 PM, liling tan alvati...@gmail.com wrote:
@Marcin, I've just made
Dear Moses devs/users,
@Ken, I'm working with 128GB RAM; the default binarized LM works but it's
kind of slow when tuning.
I've tried the trie and it's wonderful!! Effectively, it brought down the
size of the LM:
Text: 16GB
ARPA: 38G
Binary (no trie): 71GB
Trie Binary: 17GB
*Does the small trie
is the usual way to tune on a large LM file?*
@Marcin, how did you deal with the large LM file when tuning?
Regards,
Liling
On Tue, Apr 21, 2015 at 7:48 PM, liling tan alvati...@gmail.com wrote:
Dear Moses dev/users,
@Marcin, the bigger than usual reordering-table is due to our allowance
for high
Dear Moses devs/users,
@Marcin, thanks for the tip on the trie, I'll try out the trie.
About the 100 MERT iterations: when I tried to run mert-moses.pl on that
target language with a 71GB binarized language model on a 3000-line dev
set, it took more than one day to tune using 10 threads.
*Is
Dear Moses devs/users
@Ken, thanks for the link to the vocab extraction code and the explanation.
Building trie binaries takes a little long on a single core. Is there some
-threads options for build_binary?
Regards,
Liling
Dear Moses devs/users,
@Ken, MERT with 100 iterations might be overkill, but MERT with 20 also
wouldn't finish in a day on the 71GB binarized LM. It was at the 5th
iteration when I killed it.
@Marcin, I'll try to retune with the trie LM and see how far it goes.
*Does the build_binary come
phrase-table/lexical-table, non-threaded processing/training/decoding.
Is there a guide to dealing with big models? How big can a model grow, and
what server clock speed/RAM is proportionally necessary?
Regards,
Liling
On Tue, Apr 21, 2015 at 6:39 PM, liling tan alvati...@gmail.com wrote:
Dear
Dear Moses devs/users,
*How should one work with big models?*
Originally, I have 4.5 million parallel sentences and ~13 million sentences
of monolingual data for the source and target languages.
After cleaning with
https://github.com/alvations/mosesdecoder/blob/master/scripts/other/gacha_filter.py
and
Dear Moses dev/users,
Has anyone tried to build a language model from 16GB of text?
What does "Last input should have been poison." mean?
Does anyone know how to estimate the output size of the language model file
given 16GB of text with 8-grams? How about 5-grams, how big will it get?
We've
Dear Moses devs/users,
Is there any working script for training a moses model with NPLM?
Is there any working train-model script for training an MT model with a
bilingual LM?
http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc35
Is there a tutorial on Vector/Neural MT for
Dear Alexander,
We're glad to sell you phrase tables at a reasonable price. Do you have a
budget of how much the project can/is willing to pay for the phrase tables?
Please provide us with details on the configurations you require for the
phrase tables.
As a bonus we will throw in a free kneser
- it requires the phrase table
(and often the reordering table), as specified in a moses.ini
configuration file.
So you will need those files.
-phi
On Mon, Aug 11, 2014 at 11:42 AM, liling tan alvati...@gmail.com wrote:
Dear Moses dev/users,
I have a lex.e2f and lex.f2e
Dear Moses dev/users,
I have a lex.e2f and lex.f2e and the language model files generated by
other software.
May I know how I can use only the Moses decoder to produce machine
translation output?
How do I specify the lex files and language model file when using only the
Moses decoder?
Which
-machine-translation
Regards,
Liling
P/S: Cam on anh nhieu. ("Thank you very much.")
On Mon, Aug 4, 2014 at 11:38 AM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:
hi liling
On 3 August 2014 22:02, liling tan alvati...@gmail.com wrote:
Dear Moses community,
I have reimplemented the phrasal extraction algorithm
Dear Moses community,
I have reimplemented the phrasal extraction algorithm as presented on
page 133 of Philipp Koehn's SMT book for NLTK in
https://github.com/alvations/nltk/blob/develop/nltk/align/phrase_based.py
However, there is some bug, and I can't figure out why I am not achieving
the
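For anyone comparing against that port: here is a compact sketch of the consistency-based extraction core from p.133. It deliberately omits the extension over unaligned boundary words that the full algorithm performs, so it under-generates relative to Moses' extract tool; the example sentence pair is invented:

```python
def extract_phrases(src, trg, alignment, max_len=4):
    """Consistency-based phrase-pair extraction (Koehn 2010, p.133 core loop).
    `alignment` is a set of (src_idx, trg_idx) word-alignment points.
    NOTE: omits extending over unaligned boundary words."""
    phrases = set()
    for t_start in range(len(trg)):
        for t_end in range(t_start, min(t_start + max_len, len(trg))):
            # source positions aligned into the target span
            s_points = {s for s, t in alignment if t_start <= t <= t_end}
            if not s_points:
                continue
            s_start, s_end = min(s_points), max(s_points)
            if s_end - s_start + 1 > max_len:
                continue
            # consistency: no alignment point may cross the box boundary
            if any(s_start <= s <= s_end and not (t_start <= t <= t_end)
                   for s, t in alignment):
                continue
            phrases.add((" ".join(src[s_start:s_end + 1]),
                         " ".join(trg[t_start:t_end + 1])))
    return phrases

pairs = extract_phrases("michael geht".split(), "michael goes".split(),
                        {(0, 0), (1, 1)})
print(sorted(pairs))
```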