[Senseclusters-users] sense cluster questions

Javier Sanchez Monzon (Tino) Thu, 07 May 2009 04:55:39 -0700

Hi everybody, 

I have some questions about the SenseClusters tools.  I hope there are not 
too many.


0-Maybe the main question is the following:

Is it possible to get output like the following?  I don't care which 
documents the words come from. I am only interested in how the words are 
related before and after the clustering.
Is this solution possible?
cluster0
----------
word--(0.82)--word2
word--(0.81)--word3
word3--(0.72)--word2
.....

cluster1
----------
word4--(0.82)--word9
word6--(0.81)--word5
word37--(0.72)--word6
......
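
For illustration only, the report format above could be produced outside of SenseClusters along the following lines. This is a minimal Python sketch, assuming the word vectors and the cluster labels are already available as plain dictionaries. The names `vectors`, `labels`, `cosine`, `pairs_by_cluster` and `format_report`, the toy data, and the choice of cosine similarity as the score are all hypothetical and are not part of any SenseClusters program.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairs_by_cluster(vectors, labels):
    """Return {cluster_id: [(word_a, score, word_b), ...]}, best score first."""
    clusters = {}
    for word, cid in labels.items():
        clusters.setdefault(cid, []).append(word)
    result = {}
    for cid, words in sorted(clusters.items()):
        scored = [(a, cosine(vectors[a], vectors[b]), b)
                  for a, b in combinations(sorted(words), 2)]
        result[cid] = sorted(scored, key=lambda t: t[1], reverse=True)
    return result


def format_report(result):
    """Render the scored pairs in the cluster-by-cluster layout shown above."""
    lines = []
    for cid, scored in result.items():
        lines.append(f"cluster{cid}")
        lines.append("----------")
        for a, score, b in scored:
            lines.append(f"{a}--({score:.2f})--{b}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical toy data: 2-dimensional word vectors and hard cluster labels.
vectors = {"tv": [1.0, 0.0], "show": [1.0, 1.0],
           "stage": [0.0, 1.0], "crowd": [0.0, 2.0]}
labels = {"tv": 0, "show": 0, "stage": 1, "crowd": 1}

print(format_report(pairs_by_cluster(vectors, labels)))
```

The same function applied with every word given the same label would list the relations before clustering, which is the other half of the question.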

1-I would like to determine scored relations between nouns and proper names 
with the SenseClusters tools.  I achieved this by using count.pl, combig.pl and 
then statistics.pl.  For the last step I tested only the default association 
measure, the log-likelihood ratio.  Is the Fisher measure better if I am 
interested in finding the best co-occurrences in the text corpus? This solution 
does not involve any clustering. 
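
To make the comparison concrete, both measures can be computed from the same 2x2 contingency table of a word pair. The following is a minimal Python sketch of the textbook forms of the log-likelihood ratio (G^2) and of a right-sided Fisher exact test. It is an assumption that these match what statistics.pl computes; the function names and the counts are hypothetical.

```python
import math


def log_likelihood(n11, n1p, np1, npp):
    """G^2 = 2 * sum(observed * ln(observed / expected)) over the 2x2 table.

    n11: joint count of the pair, n1p: count of the first word,
    np1: count of the second word, npp: total number of pairs.
    """
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11
    observed = [n11, n12, n21, n22]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def fisher_right(n11, n1p, np1, npp):
    """Right-sided Fisher exact p-value: P(joint count >= n11) with fixed marginals."""
    n2p = npp - n1p
    numerator = sum(math.comb(n1p, k) * math.comb(n2p, np1 - k)
                    for k in range(n11, min(n1p, np1) + 1))
    return numerator / math.comb(npp, np1)


# Hypothetical counts: the pair occurs 10 times, the words 20 and 30 times,
# out of 1000 pairs in total.
print(log_likelihood(10, 20, 30, 1000))
print(fisher_right(10, 20, 30, 1000))
```

A higher G^2 and a lower right-sided p-value both indicate a stronger association, so the two measures can be compared by the rankings they give to the same pairs.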


2-I did some experiments with the count.pl, combig.pl, statistics.pl 
(log-likelihood ratio), wordvec.pl and vcluster (given a number of clusters) 
programs.  In the clustering report I asked for the frequent itemsets of each 
cluster to be added.  How is this calculation of frequent itemsets done?  Are 
these the words of the cluster that most often appear together in the 
documents before clustering?


3-About describing and discriminating features.  Let's say I ask for the best 5 
features for each cluster. 
cluster 1
-----------
Describing features (features that can also appear in other clusters?): tv 40% 
magazin 30% show 29% stage 27% crowd 25%
Discriminating features (features that appear only in this cluster?):  
...............

Is it then possible to infer how tv and magazin are related and to get something like:
word--(0.82)--word2
word--(0.81)--word3
word3--(0.72)--word2

4-I understand that using count.pl, combig.pl, statistics.pl, wordvec.pl and 
vcluster gives a hard-clustering solution. Which other combinations or setups 
should I try in order to obtain a soft-clustering solution, for example one 
with some words repeated in more than one cluster?  Consider for example the 
following solution:

cluster 0
-----------
word1 word2 word3
cluster 1
-----------
word1 word4 word2

How can I achieve this?  Would scluster do this?

5-When I use the clusterstopping.pl program, it suggests in most cases (using 
the default stopping measure pk3) what is in my opinion too small a number of 
clusters.  When I cluster with a number that is 2 times greater than the 
suggested one I get, as expected, a more precise partition into clusters.  My 
question here is: which other cluster stopping measure should I try?


Regards, 
Tino
PS: Congratulations to Dr. Ted Pedersen on his promotion to associate 
professor.



      

