Hi everybody,
I have some questions about the SenseClusters tools. I hope there are not
too many.
0- Maybe the main question is the following:
Is it possible to get output like the following? I don't care which
documents the words come from. I am only interested in how the words are
related before and after the clustering.
Is this solution possible?
cluster0
----------
word--(0.82)--word2
word--(0.81)--word3
word3--(0.72)--word2
.....
cluster1
----------
word4--(0.82)--word9
word6--(0.81)--word5
word37--(0.72)--word6
......
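To make the desired output concrete, here is a minimal Python sketch of what
I mean. It is not SenseClusters code: the word vectors, the cluster
assignment, and the function names are all made up for illustration, and I am
assuming cosine similarity as the relation score.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairs_by_cluster(vectors, assignment):
    """Score every word pair inside each cluster, highest score first."""
    clusters = {}
    for word, cluster_id in assignment.items():
        clusters.setdefault(cluster_id, []).append(word)
    report = {}
    for cluster_id in sorted(clusters):
        words = sorted(clusters[cluster_id])
        scored = [(cosine(vectors[a], vectors[b]), a, b)
                  for a, b in combinations(words, 2)]
        scored.sort(key=lambda item: (-item[0], item[1], item[2]))
        report[cluster_id] = scored
    return report


def format_report(report):
    """Render the report in the word--(score)--word layout shown above."""
    lines = []
    for cluster_id, scored in report.items():
        lines.append(f"cluster{cluster_id}")
        lines.append("----------")
        for score, a, b in scored:
            lines.append(f"{a}--({score:.2f})--{b}")
    return "\n".join(lines)


# Hypothetical toy data: two-dimensional word vectors and a hard assignment.
toy_vectors = {"word": [1, 0], "word2": [1, 1], "word3": [0, 1],
               "word4": [2, 1], "word5": [1, 2]}
toy_assignment = {"word": 0, "word2": 0, "word3": 0, "word4": 1, "word5": 1}
print(format_report(pairs_by_cluster(toy_vectors, toy_assignment)))
```

The scores before clustering would be the same pairwise values computed over
all words instead of per cluster.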
1- I would like to determine scored relations between nouns and proper names
with the SenseClusters tools. I achieved this by using count.pl, combig.pl and
then statistics.pl. For the last one I tested only the default association
measure, the log-likelihood ratio. Is the Fisher measure better when I am
interested in finding the best co-occurrences in the text corpus? This solution
does not involve any clustering.
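For reference, this is the score I have in mind, as far as I understand it:
the log-likelihood ratio G^2 = 2 * sum(observed * ln(observed / expected))
over the four cells of the 2x2 table for a word pair. The Python sketch below
is my own illustration, not the code that statistics.pl runs; the parameter
names (n11 for the joint count, n1p and np1 for the marginal counts, npp for
the total) are just the convention I chose here.

```python
from math import log


def log_likelihood(n11, n1p, np1, npp):
    """G^2 for a 2x2 contingency table given the joint count and the marginals.

    n11: times word1 and word2 occur together
    n1p: times word1 occurs in the first position
    np1: times word2 occurs in the second position
    npp: total number of pairs
    """
    # Remaining observed cells, derived from the marginals.
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11
    n2p = npp - n1p
    np2 = npp - np1
    observed = [n11, n12, n21, n22]
    # Expected counts under independence of the two words.
    expected = [n1p * np1 / npp, n1p * np2 / npp,
                n2p * np1 / npp, n2p * np2 / npp]
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # a zero cell contributes nothing to the sum
            total += o * log(o / e)
    return 2 * total
```

A pair whose counts match independence exactly scores 0, and the score grows
as the observed counts move away from the expected ones.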
2- I did some experiments with the count.pl, combig.pl, statistics.pl
(log-likelihood ratio), wordvec.pl and vcluster (given a number of clusters)
programs. In the clustering report I asked for the frequent itemsets of each
cluster to be included.
How is this calculation of frequent itemsets done? Are these the words of the
cluster that most often appear together in the documents before
clustering?
3- About descriptive and discriminating features. Let's say I ask for the best
5 features for each cluster.
cluster 1
-----------
Descriptive features (features that can also appear in other clusters?): tv 40%
magazin 30% show 29% stage 27% crowd 25%
Discriminating features (features that appear only in this cluster?):
...............
Is it then possible to infer relations between words such as tv and magazin,
and get something like:
word--(0.82)--word2
word--(0.81)--word3
word3--(0.72)--word2
4- I understand that using count.pl, combig.pl, statistics.pl, wordvec.pl and
vcluster gives a hard clustering solution. Which other combinations or setups
should I try in order to obtain a soft clustering solution, for example one
in which some words are repeated in more than one cluster? Consider for
example the following solution:
cluster 0
-----------
word1 word2 word3
cluster 1
-----------
word1 word4 word2
How can I achieve this? Would scluster do this?
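To illustrate what I mean by soft clustering, here is a small Python sketch.
It is only my own idea of how overlapping clusters could be produced, not
something I know vcluster or scluster to do: every word goes into each cluster
whose centroid it is similar enough to. The centroids, the threshold value and
the function names are assumptions made up for this example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_assign(vectors, centroids, threshold=0.6):
    """Put each word in every cluster whose centroid similarity reaches the threshold."""
    membership = {cluster_id: [] for cluster_id in centroids}
    for word in sorted(vectors):
        for cluster_id, centroid in centroids.items():
            if cosine(vectors[word], centroid) >= threshold:
                membership[cluster_id].append(word)
    return membership


# Hypothetical toy data: word1 lies between both centroids.
toy_vectors = {"word1": [1, 1], "word3": [1, 0], "word4": [0, 1]}
toy_centroids = {0: [1, 0], 1: [0, 1]}
for cluster_id, words in soft_assign(toy_vectors, toy_centroids).items():
    print(f"cluster {cluster_id}: {' '.join(words)}")
```

With a threshold like this, a word can also end up in no cluster at all, so
the value would need tuning on real data.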
5- When I use the clusterstopping.pl program (with the default stopping
measure pk3), in most cases it suggests what is, in my opinion, too small a
number of clusters. When I cluster with twice the suggested number I get, as
expected, a more precise partition into clusters. My question here is: which
other cluster stopping measure should I try?
regards,
Tino
PS: Congratulations to Dr. Ted Pedersen on his promotion to associate
professor.