Re: [Moses-support] What is snt2cooc used to do in training?

2012-05-28 Thread Hieu Hoang
Hi Fong, if anyone is still having memory problems with snt2cooc, I've created an alternative version that uses very little memory and instead relies on Unix sort/uniq to create the sparse matrix needed by GIZA++/MGIZA. It's committed to the MGIZA repository.
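
For readers unfamiliar with the idea, here is a minimal Python sketch of building the co-occurrence list by emitting word pairs and then sorting and de-duplicating them. It is not the code committed to MGIZA. The function names and the toy corpus of integer word IDs are assumptions made for illustration, and the in-memory set stands in for an external sort/uniq pass.

```python
from itertools import product


def cooc_pairs(sentence_pairs):
    """Yield (source_id, target_id) for every pair of words that share a sentence pair.

    Hypothetical helper: sentence_pairs is an iterable of
    (source_ids, target_ids) tuples of integer word IDs.
    """
    for source_ids, target_ids in sentence_pairs:
        # set() drops repeated words within one sentence pair.
        yield from product(set(source_ids), set(target_ids))


def sparse_cooc(sentence_pairs):
    """Return sorted, de-duplicated pairs (an in-memory stand-in for sort | uniq)."""
    return sorted(set(cooc_pairs(sentence_pairs)))


# Toy corpus (assumed data): two sentence pairs given as word IDs.
corpus = [([1, 2], [10, 11]), ([2, 3], [11])]

for source_id, target_id in sparse_cooc(corpus):
    print(source_id, target_id)
```

Because cooc_pairs is a generator, its output could instead be written out line by line and passed through an external sort, so that the full set of pairs never has to be held in memory at once.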

[Moses-support] What is snt2cooc used to do in training?

2012-04-07 Thread Fong Po Po
Dear all: What is snt2cooc.out used to do in the training of Moses? Thanks! Best Regards, Fong Pui Chi

Re: [Moses-support] What is snt2cooc used to do in training?

2012-04-07 Thread Tom Hoar
snt2cooc.out is a component of GIZA++. Moses' train-model.perl script uses snt2cooc.out in step 2 to create co-occurrence files. snt2cooc is a component of MGIZA++. It does the same thing as snt2cooc.out, but it has different command line arguments. You can use snt2cooc with train-model.perl

Re: [Moses-support] What is snt2cooc used to do in training?

2012-04-07 Thread Philipp Koehn
Hi, GIZA++ can run more memory-efficiently when it knows in advance which words can possibly align to which other words (because they occur in the same sentence pair). Hence, snt2cooc collects a list of words that may co-occur prior to running GIZA++. Since snt2cooc can also run into memory

Re: [Moses-support] What is snt2cooc used to do in training?

2012-04-07 Thread Hieu Hoang
Can snt2cooc be made more memory-efficient? For a start, it seems to be designed to compute a count for each (source, target) word pair, but the count is not used. It seems a bit silly that the simple initializer for GIZA++ takes up way more memory than GIZA++ itself. On 08/04/2012 04:21, Philipp

Re: [Moses-support] What is snt2cooc used to do in training?

2012-04-07 Thread Philipp Koehn
Hi, I do not know the details of the code, but snt2cooc taking up more memory than GIZA++ is the point of it. Once the sparse matrix of word translations is known, GIZA++ can use a more memory-efficient data structure for it. snt2cooc can't do that (its purpose is to find out the identity of