I've observed this as well. It seems to me there are several competing
pressures affecting the number of ngram types in a corpus. On the one hand, as
the size of the corpus increases, so does the vocabulary. This obviously
increases the number of unigram types (which is the same as the vocabulary size).
[ Apologies for multiple copies ]
[ English version below ]
TALN 2015 | RÉCITAL 2015 Conference
Call for papers: TALN 2015 | RÉCITAL 2015
22nd Conference on Natural Language Processing (Traitement Automatique des Langues Naturelles)
17th Meeting of
Hi,
I just ran a count of differently sized n-grams in the source side of my
phrase table, and this is what I got.
unigrams  85,233
bigrams   991,701
trigrams  2,697,341
4-grams   3,876,180
5-grams   4,209,094
6-grams   3,702,813
7-grams   2,560,251
8-grams   0
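A count like the one above can be reproduced directly from a tokenized corpus. This is a minimal sketch (the helper name and the toy corpus are my own, not from the thread), counting distinct n-gram types per order:

```python
from collections import defaultdict

def ngram_type_counts(sentences, max_n=8):
    """Count distinct n-gram types for each order 1..max_n."""
    types = defaultdict(set)
    for sent in sentences:
        tokens = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                types[n].add(tuple(tokens[i:i + n]))
    return {n: len(types[n]) for n in range(1, max_n + 1)}

# Toy corpus: type counts rise with n at first, then fall as long
# n-grams become unrepeatable (and impossible beyond sentence length).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = ngram_type_counts(corpus, max_n=4)
```

On a real phrase table's source side you would see the same rise-then-fall shape as in the numbers above.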
Hi,
The data is sentence-segmented.
Assume you train your model with a training corpus which contains a
single parallel sentence pair. Your training sentence has length L on
both source and target side, and it's aligned along the diagonal.
If n > L, you cannot extract any phrase of length n
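The same reasoning also explains the zero count for 8-grams above: if I remember right, Moses's training script caps extracted phrases at a default maximum length of 7, on top of the sentence-length limit. A minimal sketch of the two caps (the function name and the default are my own framing):

```python
def extractable_phrase_lengths(sentence_length, max_phrase_length=7):
    """With a 1:1 diagonal alignment, a phrase pair of length n can be
    extracted only if n <= sentence_length (you can't take more tokens
    than the sentence has) and n <= max_phrase_length (the extraction
    cutoff, which defaults to 7 in Moses, if I remember right)."""
    return [n for n in range(1, max_phrase_length + 1)
            if n <= sentence_length]

# A 5-token sentence yields phrases of length 1..5; length 8 can never
# appear, since it exceeds both caps.
lengths = extractable_phrase_lengths(5)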
We typically try to increase the tuning set in order to obtain more
reliable sparse feature weights. But in your case it's rather the test
set that seems a bit small for trusting the BLEU scores.
Do the sparse features give you any large improvement on the tuning set?
On Thu, 2015-01-15 at 13:54 +0800, HOANG Cong Duy Vu wrote:
- tune vs. test:
(based on source) size of overlap set = 624
(based on target) size of overlap set = 386
(tune and test have highly overlapping parts based on the source sentences,
but half of them have different target sentences)
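An overlap count like the one above can be computed with a couple of set lookups. This is a minimal sketch (the function name and toy data are my own, not from the thread), counting how many test pairs share a source or a target sentence with the tune set:

```python
def overlap_counts(tune_pairs, test_pairs):
    """Given (source, target) sentence pairs, count test pairs whose
    source side, and whose target side, also occur in the tune set."""
    tune_src = {s for s, t in tune_pairs}
    tune_tgt = {t for s, t in tune_pairs}
    by_source = sum(1 for s, t in test_pairs if s in tune_src)
    by_target = sum(1 for s, t in test_pairs if t in tune_tgt)
    return by_source, by_target

# Toy example: both test sources appear in tune, but one of them has a
# different target, mirroring the source/target gap reported above.
tune = [("a b", "x y"), ("c d", "z w")]
test = [("a b", "x y"), ("a b", "q q"), ("e f", "z w")]
result = overlap_counts(tune, test)
```

A large source-side overlap with a smaller target-side overlap, as reported here, suggests many test sources were seen in tuning but with alternative translations.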
Does
to align the words in the sentence, how should I do it?
Best regards
Hello,
As far as I know, you can use the forced alignment process or
incremental GIZA.
more info there:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc58
Cheers,
Christophe
2015-01-15 1:54 GMT+01:00 iamzcy_hit iamzcy_hit iamzcy...@gmail.com:
Hi, all
If I've trained a