Re: [Moses-support] phrase table

2015-01-15 Thread John D Burger
I've observed this as well. It seems to me there are several competing pressures affecting the number of n-gram types in a corpus. On the one hand, as the size of the corpus increases, so does the vocabulary. This obviously increases the number of unigram types (which is the same as the vocabulary size).
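For what it's worth, this growth is easy to check empirically. A minimal sketch in Python, assuming a whitespace-tokenized corpus in a hypothetical file corpus.txt (the file name and tokenization are my assumptions, not anything from the thread):

# Sketch: n-gram *type* counts grow as the corpus grows.
def ngram_types(lines, n):
    types = set()
    for line in lines:
        toks = line.split()
        types.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(types)

with open("corpus.txt", encoding="utf-8") as f:
    lines = f.readlines()

for frac in (0.25, 0.5, 1.0):
    part = lines[: int(len(lines) * frac)]
    print(f"{int(frac * 100)}% of corpus: "
          f"{ngram_types(part, 1):,} unigram types, "
          f"{ngram_types(part, 2):,} bigram types")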

[Moses-support] Last Call for Papers [Last CFP]: TALN 2015 | RÉCITAL 2015 Conference

2015-01-15 Thread Paul Martin
[Apologies for multiple postings] [English version below] TALN 2015 | RÉCITAL 2015 Conference. Call for Papers, TALN 2015 | RÉCITAL 2015: 22nd Conference on Natural Language Processing (Traitement Automatique des Langues Naturelles), 17th Meeting of

[Moses-support] phrase table

2015-01-15 Thread Read, James C
Hi, I just ran a count of the different-sized n-grams in the source side of my phrase table, and this is what I got: unigrams 85,233; bigrams 991,701; trigrams 2,697,341; 4-grams 3,876,180; 5-grams 4,209,094; 6-grams 3,702,813; 7-grams 2,560,251; 8-grams 0
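A count like this can be reproduced in a few lines of Python. A minimal sketch, assuming the standard Moses " ||| " field separator and a hypothetical file name phrase-table (decompress first if yours is gzipped):

from collections import Counter

# Tally source-phrase lengths, one phrase table entry per line.
lengths = Counter()
with open("phrase-table", encoding="utf-8") as f:
    for line in f:
        source = line.split(" ||| ", 1)[0]   # first field = source phrase
        lengths[len(source.split())] += 1

for n in sorted(lengths):
    print(f"{n}-grams: {lengths[n]:,}")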

Re: [Moses-support] phrase table

2015-01-15 Thread Matthias Huck
Hi, The data is sentence-segmented. Assume you train your model with a training corpus which contains a single parallel sentence pair. Your training sentence has length L on both source and target side, and it's aligned along the diagonal. If n > L, you cannot extract any phrase of length n.
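To make the constraint concrete, a toy sketch (the max-phrase-length cap of 7, Moses' usual default, is my assumption for why the 8-gram count in the earlier message is zero):

# Phrase lengths extractable from one sentence pair of length L that is
# aligned one-to-one along the diagonal: 1 .. min(L, max_len).
def extractable_lengths(L, max_len=7):
    return list(range(1, min(L, max_len) + 1))

print(extractable_lengths(5))   # [1, 2, 3, 4, 5]: n > L is impossible
print(extractable_lengths(40))  # capped at 7, so no 8-grams ever appear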

Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
We typically try to increase the tuning set in order to obtain more reliable sparse feature weights. But in your case it's rather the test set that seems a bit too small to trust the BLEU scores. Do the sparse features give you any large improvement on the tuning set?

Re: [Moses-support] Sparse features and overfitting

2015-01-15 Thread Matthias Huck
On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote: - tune ∩ test: size of overlap set = 624 (based on source), size of overlap set = 386 (based on target). (Tune and test have highly overlapping parts based on source sentences, but half of them have different target sentences.) Does
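Overlap figures like these come down to plain set intersections. A minimal sketch, with hypothetical file names and one sentence per line assumed:

# Count tune/test overlap, once by source sentence and once by target.
def read_set(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f}

tune_src, test_src = read_set("tune.src"), read_set("test.src")
tune_tgt, test_tgt = read_set("tune.tgt"), read_set("test.tgt")

print("overlap (source):", len(tune_src & test_src))  # 624 in the thread
print("overlap (target):", len(tune_tgt & test_tgt))  # 386 in the thread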

Re: [Moses-support] Tokenization problem

2015-01-15 Thread Ihab Ramadan
to align the words in the sentence, what should I do? Best regards

Re: [Moses-support] how to align some new parallel sentences using a trained model

2015-01-15 Thread Christophe Servan
Hello, as far as I know, you can use the forced alignment process or the incremental GIZA. More info here: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc58 Cheers, Christophe 2015-01-15 1:54 GMT+01:00 iamzcy_hit iamzcy...@gmail.com: Hi all, if I've trained a