For reference to the list.

-------- Original Message --------
Subject: Re: [Moses-support] Is there multithread option for KenLM's build_binary?
Date:   Fri, 07 Aug 2015 22:02:36 +0200
From:   Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
To:     liling tan <alvati...@gmail.com>



Hi Liling,
There is a switch, e.g. "-S 30G", with which you can control memory usage. I actually recommend setting it to something smaller than 50G, otherwise memory mapping sometimes goes crazy and the process stalls (OS bug?). I cannot tell you much about the time needed. I think I binarized a 250G gzipped ARPA on a similar machine in less than 24 hours. Sometimes the process will slow down a lot; that will be the on-disk sorting of n-grams before crunching them into a trie. I doubt multi-threading would help much.

Quantization is lossy. In practice, however, you may not notice any difference in BLEU. I am using "-a 22 -q 8 -b 8" and I haven't seen any quality loss. I think there were increases in perplexity, but translation quality was not affected.
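
For reference, the settings mentioned above could be combined into one build_binary invocation roughly as follows. This is only a sketch under stated assumptions: the file names are hypothetical, the "trie" model type is assumed (the quantization and pointer-compression flags apply to the trie structure), and the helper function is illustrative, not part of KenLM.

```python
import shlex


def build_binary_command(arpa_path, out_path, sort_memory="30G",
                         pointer_bits=22, prob_bits=8, backoff_bits=8):
    """Assemble an argv list for KenLM's build_binary (hypothetical helper)."""
    return [
        "build_binary",
        "-S", sort_memory,        # memory budget for building, e.g. "30G"
        "-a", str(pointer_bits),  # pointer compression bits
        "-q", str(prob_bits),     # quantization bits per probability
        "-b", str(backoff_bits),  # quantization bits per backoff
        "trie",                   # assumed model type
        arpa_path,                # hypothetical input file name
        out_path,                 # hypothetical output file name
    ]


# Print the command line instead of running it, so it can be inspected first.
print(shlex.join(build_binary_command("lm.arpa.gz", "lm.trie.bin")))
```

The printed line can be pasted into a shell once the paths have been adjusted.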

Best,
Marcin

On 07.08.2015 21:31, liling tan wrote:
Dear Moses dev/users,

On a related note, without multi-threading, can anyone give a gauge of how much RAM is required to binarize an 80GB (compressed .gz) 6-gram ARPA file? The n-gram counts are:

    \data\
    ngram 1=7503209
    ngram 2=131003943
    ngram 3=671005861
    ngram 4=1510529519
    ngram 5=2165163610
    ngram 6=2477533666


Also, how long would it take (single-threaded) on a 2.4GHz core with 128GB RAM? Is there a way to mathematically estimate the time taken and RAM required to binarize a language model?

Also, is a binarized and quantized LM from KenLM lossy? If so, how lossy? The KenLM paper states "To conserve memory at the expense of accuracy, values may be quantized using q bits per probability and r bits per backoff". Can someone help point us to papers that quantify how lossy it gets in terms of MT experiments or word perplexity tasks?
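
As a rough illustration of the arithmetic behind "q bits per probability and r bits per backoff", the sketch below computes a floor on the storage needed for the quantized values alone, using the n-gram counts listed above. It assumes every order is quantized and that the highest order carries no backoff weight; it ignores word indices, pointers, and anything KenLM stores unquantized, so the real model is larger. The function name is hypothetical.

```python
# n-gram counts copied from the \data\ section quoted in this thread.
NGRAM_COUNTS = {
    1: 7503209,
    2: 131003943,
    3: 671005861,
    4: 1510529519,
    5: 2165163610,
    6: 2477533666,
}


def quantized_value_bytes(counts, prob_bits, backoff_bits):
    """Bytes needed for quantized probabilities and backoffs only (a floor)."""
    top_order = max(counts)
    total_bits = 0
    for order, n in counts.items():
        # The highest order is assumed to store a probability but no backoff.
        bits = prob_bits if order == top_order else prob_bits + backoff_bits
        total_bits += n * bits
    return (total_bits + 7) // 8  # round up to whole bytes


total_ngrams = sum(NGRAM_COUNTS.values())
value_bytes = quantized_value_bytes(NGRAM_COUNTS, prob_bits=8, backoff_bits=8)
print(f"total n-grams: {total_ngrams:,}")
print(f"quantized values alone: {value_bytes / 1e9:.1f} GB")
```

With 8 bits each for probabilities and backoffs this comes to roughly 11.4 GB for about 6.96 billion n-grams, which is a floor and not an estimate of the finished binary.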

Thanks in advance for the pointers!

Regards,
Liling

On Fri, Aug 7, 2015 at 8:56 PM, liling tan <alvati...@gmail.com> wrote:

    Dear Moses dev/users,

    Is there a multithread option for KenLM's build_binary?

    Regards,
    Liling




_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


