We have recently been running some experiments that are larger than usual
for us, and ran into a problem where SVD would essentially fail. We were
running experiments using a word matrix of about 25,000 x 25,000. It was a
bigram matrix, so it wasn't symmetric. We would get errors from SVDPACKC;
in particular, there was a message in the file lao2 saying

                        SORRY, YOUR MATRIX IS TOO BIG

Anyway, here's what we learned. First, the key factor is knowing the
number of columns in your matrix. SVD is essentially trying to reduce the
number of columns down to the most significant k, where k is typically 300
for very large matrices. So the number of columns dictates how much work
SVD has to do, and of course tells you how much data you have.

So, in las2.h there are three constants that must be set. If you have a
25,000 x 25,000 matrix you have to set these values significantly higher
than even our recommended values, and certainly higher than the SVDPACKC
defaults.

We expect our maximum number of columns to be 30,000, so we set NMAX to
that value. Next, we need to set LMTNW, which allocates work space for
SVD. This is a little cryptic, but basically if you set this value to

(6*NMAX) + (4*NMAX) + 1 + NMAX*NMAX

you are OK. In our case this works out to 900,300,001. So, if this is in
bytes, which it just might be, we are asking for approximately 900 MB,
or call it 1 GB. The details of why the formula above works are a bit
obscure to me, but we are now able to run our 25,000 x 25,000 matrix
through SVD, so that's good news.

Anyway, there is one more constant to set, and that's NZMAX, which
represents the maximum number of nonzero values in your matrix. A good
rule of thumb might be to set this to about 1% of LMTNW, so we have NZMAX
set to 9,000,000. It turns out our data has only 360,000 nonzero entries
in the 25,000 x 25,000 matrix (fewer than 0.06% of its 625 million
cells), so you can see it's pretty sparse.

In any case, if you find yourself working with 25,000 columns, consider
setting your parameters like this:

LMTNW  900300001            (900 MB of work area)
NMAX   30000                (30,000 columns at a maximum)
NZMAX  9000000              (9,000,000 nonzero cells at a maximum)

If you don't know how many rows, columns, or nonzero values your data
has, look at the first line of the *presvd file. It shows the number of
rows, columns, and nonzero cells.

We hope this makes some sense! Working with SVDPACKC can at times be a
little confusing, but once you have set these parameters a few times it
gets much simpler. Also, consider working out the largest values your
machine could reasonably support, and simply set las2.h to those. These
represent the maximum amounts of memory that SVDPACKC can use; if it
needs less, it will only use as much as it needs. We have one machine
with 1 GB of RAM and one with 2 GB, so the above settings seem about
right for the 1 GB machine, but still leave room for the 2 GB machine to
grow if we need it to.

Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users