We are pleased to announce the release of SenseClusters version 0.95.
SenseClusters is a freely available package that allows you to cluster
similar contexts, or to cluster words that occur in similar contexts.
It is fully unsupervised, and can automatically discover the optimal
number of clusters in your text.
As of version 0.95, we fully support Latent Semantic Analysis (LSA) for
context and word clustering, and we continue to improve the native
SenseClusters methods, which include the ability to cluster first and
second order representations of context.
SenseClusters can be downloaded from:
http://senseclusters.sourceforge.net/
You can also try out SenseClusters via our web interface:
http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi
In both native and LSA modes, SenseClusters relies on lexical features
(such as unigrams, bigrams, and co-occurrences) that can be identified in
raw text. Tokenization is very flexible: a user defines it via Perl
regular expressions, so it is possible to work with many languages
besides English, and with tokenization schemes other than white-space
separated words, such as character-based tokens (for example, 2-letter
sequences).
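
To make the tokenization idea concrete, here is a minimal sketch in
Python. It is only an illustration: SenseClusters itself takes Perl
regular expressions, and the names WORD, CHAR_BIGRAM, and tokenize are
made up for this example and are not part of the package.

```python
import re

# Hypothetical token definitions for illustration only.
# SenseClusters itself is configured with Perl regular expressions.
WORD = re.compile(r"\w+")          # runs of word characters
CHAR_BIGRAM = re.compile(r"\w\w")  # non-overlapping 2-letter sequences


def tokenize(text, pattern):
    """Return every non-overlapping match of pattern, scanning left to right."""
    return pattern.findall(text)


words = tokenize("the cat sat", WORD)
pairs = tokenize("the cat sat", CHAR_BIGRAM)
```

Changing the pattern changes what counts as a token, without touching
the rest of the processing.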
The native SenseClusters methods support traditional first order context
clustering, where you identify a feature set, and then determine which of
those features occur in the contexts you are clustering. The native
methods also support second order context clustering, where each word
is represented by a vector of the words with which it co-occurs.
All the words in a context to be clustered are replaced by their
associated vectors, and these vectors are averaged together to represent
that context. Note that you can also cluster the word vectors to identify
sets of related words.
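
The two native representations described above can be sketched in
Python as follows. This is a toy illustration under simplifying
assumptions (binary first order features, raw co-occurrence counts, a
whole context as the co-occurrence window), not the SenseClusters
implementation; all function names are hypothetical.

```python
def first_order(context, features):
    """Binary vector: 1 if the feature occurs in the context, else 0."""
    tokens = set(context.split())
    return [1 if f in tokens else 0 for f in features]


def cooccurrence_vectors(corpus, vocab):
    """Map each word to counts of the vocab words that share a context with it."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for context in corpus:
        tokens = [t for t in context.split() if t in index]
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][index[other]] += 1
    return vectors


def second_order(context, vectors, dim):
    """Average the word vectors of the known words in the context."""
    rows = [vectors[t] for t in context.split() if t in vectors]
    if not rows:
        return [0.0] * dim
    return [sum(col) / len(rows) for col in zip(*rows)]


corpus = ["bank loan money", "river bank water"]
vocab = ["bank", "loan", "money", "river", "water"]

word_vectors = cooccurrence_vectors(corpus, vocab)
fo = first_order("bank loan money", vocab)
so = second_order("loan money", word_vectors, len(vocab))
```

The first order vector records which features a context contains, while
the second order vector describes a context through the company its
words keep elsewhere in the corpus.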
Latent Semantic Analysis differs from the native SenseClusters methods in
that each feature is represented by a vector that shows the contexts in
which that feature occurs. Then, all the features in a context to be
clustered are replaced by their associated vectors, and these are
averaged together to represent the context. Note that you can also
cluster the feature vectors directly to identify sets of related features.
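
For comparison, the LSA-style representation can be sketched the same
way. Again this is a simplified illustration with hypothetical function
names, not the SenseClusters code. LSA as usually described also reduces
the matrix with a singular value decomposition; that step is left out
here to keep the example short.

```python
def feature_by_context(corpus, features):
    """Map each feature to a vector holding one count per context in the corpus."""
    return {
        f: [context.split().count(f) for context in corpus]
        for f in features
    }


def lsa_style_context(context, feature_vectors, n_contexts):
    """Average the feature-by-context vectors of the features in the context."""
    rows = [feature_vectors[t] for t in context.split() if t in feature_vectors]
    if not rows:
        return [0.0] * n_contexts
    return [sum(col) / len(rows) for col in zip(*rows)]


corpus = ["bank loan money", "river bank water"]
features = ["bank", "loan", "river"]

feature_vectors = feature_by_context(corpus, features)
rep = lsa_style_context("loan bank", feature_vectors, len(corpus))
```

Here each feature is described by the contexts it appears in, rather
than by the words it co-occurs with.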
This release represents a major step forward in the functionality of
SenseClusters. Much of the work in providing LSA support was carried out
by Mahesh Joshi this past spring and summer. As has always been the case
over the last two years, Anagha Kulkarni played a large role in this
release, and she has included many improvements to automated cluster
stopping and other areas in 0.95.
Please give this a try, and let us know if you have any comments or
questions! If you aren't certain whether your problem can be approached
using SenseClusters, please let us know what you would like to do and
maybe we can help you get started.
Cordially,
Ted, Anagha, and Mahesh
====================================================================
ChangeLog:
http://www.d.umn.edu/~tpederse/Code/Changelog.SenseClusters-v0.95.txt
Installation Instructions:
http://www.d.umn.edu/~tpederse/Code/SenseClusters-v0.95-INSTALL.txt
Related Publications (includes links to data you can use):
http://www.d.umn.edu/~tpederse/senseclusters-pubs.html
--
Ted Pedersen
http://www.d.umn.edu/~tpederse