hi Fong,
if anyone's still having memory problems with snt2cooc, I've created an
alternative version which uses very little memory; instead it uses
unix sort/uniq to create the sparse matrix needed by giza/mgiza.
It's committed to the mgiza repository.
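For anyone curious, the sort/uniq trick amounts to something like the pipeline below. This is only a minimal sketch with made-up file names, and it works on surface words where the real tool works on vocabulary IDs; the point is that deduplication happens via an external, disk-backed `sort -u` rather than an in-memory hash table:

```shell
# Toy parallel corpus: one sentence per line, line i of each file aligned.
printf 'das haus\nein haus\n' > corpus.de
printf 'the house\na house\n' > corpus.en

# For each sentence pair, emit every (source, target) word pair,
# then deduplicate externally with sort -u (low memory, disk-backed).
paste -d'	' corpus.de corpus.en \
  | awk -F'	' '{ ns = split($1, s, " "); nt = split($2, t, " ");
                  for (i = 1; i <= ns; i++)
                    for (j = 1; j <= nt; j++)
                      print s[i], t[j] }' \
  | sort -u > cooc.txt

cat cooc.txt
```

Because `sort` spills to temporary files when its buffer fills, peak memory stays bounded regardless of how many distinct word pairs the corpus contains.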
Dear all,
What is snt2cooc.out used for in the training of Moses?
Thanks!
Best Regards,
Fong Pui Chi
Moses-support mailing list
snt2cooc.out is a component of GIZA++. Moses' train-model.perl script
uses snt2cooc.out in step 2 to create co-occurrence files.
snt2cooc is a component of MGIZA++. It does the same thing as
snt2cooc.out, but it has different command-line arguments.
You can use snt2cooc with train-model.perl.
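If it helps, the two invocations differ roughly as follows. These command lines are from memory and the file names are placeholders, so double-check against what your train-model.perl actually runs:

```shell
# GIZA++'s snt2cooc.out writes the co-occurrence file to stdout:
snt2cooc.out corpus.vcb1 corpus.vcb2 corpus.snt > corpus.cooc

# MGIZA's snt2cooc takes the output file as its first argument:
snt2cooc corpus.cooc corpus.vcb1 corpus.vcb2 corpus.snt
```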
Hi,
GIZA++ can run more memory-efficiently when it knows in advance
which words can possibly align to which other words (because they
occur in the same sentence pair). Hence, snt2cooc collects a list
of word pairs that may co-occur prior to running GIZA++.
That said, snt2cooc can itself also run into memory problems.
Can snt2cooc be made more memory-efficient?
For a start, it seems to be designed to compute a count for each
(source, target) word pair, but the count is never used.
It seems a bit silly that the simple initializer for GIZA++ takes up way
more memory than GIZA++ itself.
On 08/04/2012 04:21, Philipp wrote:
Hi,
I do not know the details of the code, but snt2cooc taking up
more memory than GIZA++ is the whole point of it. Once the sparse
matrix of word translations is known, GIZA++ can use a more
memory-efficient data structure for it. snt2cooc can't do that
(its very purpose is to find out the identity of the co-occurring
word pairs in the first place).