Hi Sundaze,

Thanks very much for your answers. It appears to me that you are doing
everything correctly, or at least there are no obvious problems. I am,
however, in Romania right now and don't have access to my normal systems,
so it will be next week before I can fully investigate your situation.
Sorry about that. A few more notes below...

> I'm going to briefly answer the questions of your previous mail.
>
> > 1) What settings have you used in las2.h for NMAX, NZMAX, and LMTNW?
> LMTNW   900300001
> NMAX    30000
> NZMAX   9000000
>

Looks good.
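As a quick cross-check on those values: las2 needs its workspace constant to cover the iteration arrays plus the dense workspace, which (assuming the formula from the SVDPACK las2 documentation) works out to NMAX*NMAX + 10*NMAX + 1. A minimal sketch of the check, using the settings you reported:

```python
# Sanity check on the las2.h settings above.
# Assumption: las2's workspace requirement is
#   LMTNW >= 6*NMAX + 4*NMAX + 1 + NMAX*NMAX  =  NMAX^2 + 10*NMAX + 1
NMAX = 30000
LMTNW = 900300001

required = NMAX * NMAX + 10 * NMAX + 1
print(LMTNW >= required)  # the values you listed satisfy this exactly
```

Your LMTNW of 900300001 matches the requirement for NMAX = 30000 exactly, so the header settings are internally consistent.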

>
>
> > 2) Have you set the precision option in SenseClusters? If so, to what
> value?
> No, i have used the default
>

That's fine, 4 should be more than enough for precision in most cases.

> > 3) How are you creating the input to las2? If you used mat2harbo.pl, what
> > command line options did you use? If you didn't use mat2harbo, that's very
> > likely the problem. (If you used the wrapper (discriminate.pl) then
> > you used mat2harbo automatically - perhaps you could then send me the
> > options you used with discriminate.pl)
> I'm not using the wrapper so i used
> mat2harbo.pl -param -k=300 asso.co_occur > asso.HB

OK, that looks good.

>
> What i did notice, is that when the default precision is used, a matrix
> of about 6000x6000 does amount to a whopping 550Mb. I didn't realise
> that. Perhaps there's a good trade of between precision and storage
> requirements?

This is where my lack of access to a system puts me at a disadvantage. I'm
not sure how much memory I would expect to be used in this case.
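That said, a back-of-envelope estimate makes the 550Mb figure plausible for a dense 6000x6000 matrix written out as formatted text. Here's a rough sketch (the 13 characters per printed value is an assumption, e.g. something like a "%12.4e " field):

```python
# Rough estimate of the on-disk size of a dense 6000 x 6000 matrix
# written as formatted text (Harwell-Boeing style output).
# Assumption: each value occupies ~13 characters including whitespace.
rows = cols = 6000
chars_per_value = 13           # assumed width of one formatted entry
nonzeros = rows * cols         # worst case: fully dense matrix

size_mb = nonzeros * chars_per_value / (1024 * 1024)
print(f"~{size_mb:.0f} MB")    # in the same ballpark as your 550Mb
```

So if your matrix is mostly dense, the file size alone isn't a sign that anything is wrong; it's the running time that's suspicious.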

>
> Nevertheless, At the current rate (I'm using an AMD Athlon 2Ghz with 1Gb
> of RAM on an Ubuntu distribution of linux), it's taking days for what I
> consider to be a small matrix.
> Is this normal?

No, it's not. I routinely run matrices of this size in much less time, so
I'm not quite sure I understand what is happening here.


> If you'd like me to, i can send you my text file (1.7Mb or 473Kb
> compressed) and my primitive wrapper I wrote. The preprocessing etc.
> takes a few minutes, but it's svdpackout.pl where it all goes wrong.

Yes, that would be nice. If you could include asso.HB and lao2, that
would be helpful.

Also, just curious, what are you doing that can't be done with the wrapper
we provide? We are always looking for ways to make it work for as many
users as possible, so please let us know what we might have missed.

> Btw, i also wondered about something else: when i use count.pl to remove
> bigrams that occur say less than 2, how should one go about when the
> word order of the bigrams doesn't matter?

You can use co-occurrence features instead! Those are unordered bigrams,
and should be exactly what you need. Check out how we do that in discriminate.pl
to see how you can do that yourself. Also, if possible, you might want
to run your data through the standard wrapper and see if you still have
problems. That would be good to know, even if the wrapper doesn't do
exactly what you need.
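In case it helps to see the idea in isolation, here is a small sketch of what "unordered bigrams" means: normalize the word order of each pair before counting, then apply the frequency cutoff. (This is just an illustration in Python of the concept; in SenseClusters the counting is done by the Perl tools.)

```python
from collections import Counter

def count_unordered_bigrams(tokens, min_count=2):
    """Count adjacent word pairs with word order ignored.

    Sorting each pair makes ('dog', 'the') and ('the', 'dog')
    count as the same feature, which is the idea behind
    co-occurrence features as opposed to ordered bigrams.
    """
    pairs = Counter(tuple(sorted(p)) for p in zip(tokens, tokens[1:]))
    # Drop pairs that occur fewer than min_count times, analogous
    # to the frequency cutoff you were applying with count.pl.
    return {p: n for p, n in pairs.items() if n >= min_count}

tokens = "the dog saw the dog the dog".split()
print(count_unordered_bigrams(tokens))  # {('dog', 'the'): 4}
```

Note that ('the', 'dog') and ('dog', 'the') together contribute 4 to the single unordered pair, whereas ordered bigram counting would have kept them separate.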

Thanks!
Ted




_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
