Sorry to be back so soon, but today has been my "fighting with 
SenseClusters day"...

OK, after reading and re-reading the documentation, I finally concluded 
that working with SC but avoiding NSP and SVAL is virtually impossible, so 
I gave up trying that...

So, assuming I let NSP re-count my data and represent it in SVAL format, 
I have two questions.

First: Does anybody know how well this solution scales up? I need to 
extract counts from corpora of up to 2 billion tokens: is it realistic to 
let NSP count them?

Second: I would like to use some sort of structured syntactic information 
when counting bigrams.

E.g., suppose I want to cluster nouns. Rather than considering their 
co-occurrence with everything within a fixed size window, I would like to 
count their co-occurrences with, say, any A in their noun phrase, any V 
they are the object of, and any V they are the subject of.

For example, from the sentence:

The fast cat with the long black tail ate the poor mouse

I would like to extract the following bigrams, as far as "cat" is concerned:

fast cat
cat ate

and, for "mouse",

ate mouse
poor mouse

but not, for example, cat black, cat tail, tail mouse, cat mouse, etc.
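
To make the target concrete, here is a minimal Python sketch of the 
extraction I have in mind. It assumes my partial parser emits (relation, 
left word, right word) triples; the triple format, the relation labels AN, 
SV and VO, and the function name are all hypothetical, not anything NSP or 
SC defines.

```python
from collections import defaultdict

# Hypothetical parser output for the example sentence:
# (relation, left word, right word). Labels are illustrative only.
TRIPLES = [
    ("AN", "fast", "cat"),
    ("AN", "long", "tail"),
    ("AN", "black", "tail"),
    ("SV", "cat", "ate"),
    ("VO", "ate", "mouse"),
    ("AN", "poor", "mouse"),
]

# Which slot of the triple holds the noun (0 = left word, 1 = right word).
NOUN_POSITION = {"AN": 1, "SV": 0, "VO": 1}


def bigrams_by_noun(triples):
    """Group the surface bigrams by the noun they describe."""
    grouped = defaultdict(list)
    for relation, left, right in triples:
        noun = (left, right)[NOUN_POSITION[relation]]
        grouped[noun].append(f"{left} {right}")
    return dict(grouped)


if __name__ == "__main__":
    for noun, bigrams in bigrams_by_noun(TRIPLES).items():
        print(noun, "->", bigrams)
```

Because only parser-licensed pairs enter the list, window-based pairs such 
as "cat tail" or "cat mouse" never appear.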

I have a rudimentary partial parser that allows me to extract the contexts 
I want. My question is: how can I feed them to SC?

I thought of generating, from the above, a representation like:

fast <head>cat</head> ate
ate poor <head>mouse</head>
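
For what it's worth, lines like these could be generated along the 
following lines. This is only a sketch in Python: it assumes the same 
hypothetical (relation, left word, right word) triples from my parser, and 
it produces just the head-marked context line, not a complete SVAL file.

```python
# Hypothetical parser output: (relation, left word, right word).
TRIPLES = [
    ("AN", "fast", "cat"),
    ("SV", "cat", "ate"),
    ("VO", "ate", "mouse"),
    ("AN", "poor", "mouse"),
]

# Which slot of the triple holds the noun (0 = left word, 1 = right word).
NOUN_POSITION = {"AN": 1, "SV": 0, "VO": 1}


def head_marked_line(noun, triples, noun_position=NOUN_POSITION):
    """Build one context line with the target noun wrapped in <head> tags.

    Words that precede the noun in their relation go before the head,
    words that follow it go after, in the order the triples are listed.
    """
    before, after = [], []
    for relation, left, right in triples:
        if noun_position[relation] == 1 and right == noun:
            before.append(left)
        elif noun_position[relation] == 0 and left == noun:
            after.append(right)
    return " ".join(before + [f"<head>{noun}</head>"] + after)


if __name__ == "__main__":
    for noun in ("cat", "mouse"):
        print(head_marked_line(noun, TRIPLES))
```

Note that the relation label is lost once the words are flattened into a 
line, which is exactly where my next problem comes from.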

However, if I use statistical association measures instead of raw 
frequencies, I don't know how to "tell" the system that the marginals to be 
considered should be different for, say, "poor mouse" (counts of A, N and 
AN in all AN sequences) and "ate mouse" (counts of V, N and VN in all VN 
sequences).
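
To spell out what I mean by relation-specific marginals, here is a small 
Python sketch that computes pointwise mutual information separately for 
each relation type. It does not use NSP; the triple format, the relation 
labels and the toy counts are all made up for illustration, and PMI simply 
stands in for whatever association measure one would actually pick.

```python
import math
from collections import Counter

# Toy data, invented for illustration: (relation, left word, right word).
TRIPLES = [
    ("AN", "poor", "mouse"),
    ("AN", "poor", "mouse"),
    ("AN", "poor", "cat"),
    ("AN", "fast", "cat"),
    ("VO", "ate", "mouse"),
    ("VO", "chased", "mouse"),
    ("VO", "ate", "cheese"),
]


def pmi_by_relation(triples):
    """PMI for each (relation, left, right), with marginals and sample
    size taken only from pairs of the same relation type."""
    pair_counts = Counter(triples)
    left_counts = Counter((rel, left) for rel, left, _ in triples)
    right_counts = Counter((rel, right) for rel, _, right in triples)
    totals = Counter(rel for rel, _, _ in triples)

    scores = {}
    for (rel, left, right), joint in pair_counts.items():
        # Expected joint count under independence, within this relation.
        expected = (
            left_counts[(rel, left)] * right_counts[(rel, right)] / totals[rel]
        )
        scores[(rel, left, right)] = math.log2(joint / expected)
    return scores


if __name__ == "__main__":
    for key, score in sorted(pmi_by_relation(TRIPLES).items()):
        print(key, round(score, 3))
```

Here the count for "mouse" in "poor mouse" comes only from AN pairs, while 
the count for "mouse" in "ate mouse" comes only from VO pairs. This is the 
behaviour I don't know how to obtain from the flattened representation.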

Am I on the right track with my representation above? Is there a solution 
to the "different marginals" problem?

Any hint appreciated -- thanks in advance.

Regards,

Marco



