[Senseclusters-users] SenseClusters version 0.95 released

ted pedersen Sat, 26 Aug 2006 10:49:02 -0700

We are pleased to announce the release of SenseClusters version 0.95.

SenseClusters is a freely available package that allows you to cluster   
similar contexts, or to cluster words that occur in similar contexts. 
It is fully unsupervised, and can automatically discover the optimal 
number of clusters in your text.


As of version 0.95, we fully support Latent Semantic Analysis for
context and word clustering, and we continue to improve the native
SenseClusters methods, which include the ability to cluster first and
second order representations of context.

SenseClusters can be downloaded from:

        http://senseclusters.sourceforge.net/

You can also try out SenseClusters via our web interface:

        http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi

In both native and LSA modes, SenseClusters relies on lexical features
(such as unigrams, bigrams, and co-occurrences) that can be identified in
raw text. Tokenization is very flexible: a user defines it via Perl
regular expressions, so it is possible to work with many languages
besides English, and with tokenization schemes other than white-space
separated words, such as character based tokens (2 letter sequences,
for example).
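
To make the tokenization idea concrete, here is a minimal sketch in
Python rather than Perl. It is not the SenseClusters token file format
or any part of the package; the names WORD_TOKEN, TWO_CHAR_TOKEN,
tokenize and ngrams are hypothetical and exist only for illustration.
It shows how swapping one regular expression changes what counts as a
token, and how bigrams can then be read off the token sequence.

```python
import re

# Hypothetical token definitions (illustration only, not SenseClusters' own).
WORD_TOKEN = re.compile(r"\w+")       # a run of word characters
TWO_CHAR_TOKEN = re.compile(r"\w\w")  # non-overlapping 2-letter sequences


def tokenize(text, pattern):
    """Return every non-overlapping match of pattern, scanning left to right."""
    return pattern.findall(text)


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "the bank of the river"

words = tokenize(text, WORD_TOKEN)
# ['the', 'bank', 'of', 'the', 'river']

pairs = tokenize(text, TWO_CHAR_TOKEN)
# ['th', 'ba', 'nk', 'of', 'th', 'ri', 've']

bigrams = ngrams(words, 2)
# [('the', 'bank'), ('bank', 'of'), ('of', 'the'), ('the', 'river')]
```

Note that characters which do not fit the chosen pattern (the trailing
"e" of "the" under the 2-letter scheme, for instance) are simply
skipped in this sketch.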

The native SenseClusters methods support traditional first order context    
clustering, where you identify a feature set, and then determine which of  
those features occur in the contexts you are clustering. The native   
methods also support second order context clustering, where each word 
is represented by a vector of the words with which it co-occurs. 
All the words in a context to be clustered are replaced by their 
associated vectors, and these vectors are averaged together to represent 
that context. Note that you can also cluster the word vectors to identify 
sets of related words. 
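
The two native representations can be sketched roughly as follows.
This is a toy Python illustration, not SenseClusters code: the function
names are hypothetical, first order features are treated as binary,
and two words are assumed to co-occur whenever they appear in the same
training context (the package's actual windowing and weighting options
are not described here).

```python
from collections import Counter, defaultdict


def first_order(context, features):
    """Binary vector: 1 if the feature occurs in the context, else 0."""
    present = set(context)
    return [1 if f in present else 0 for f in features]


def cooccurrence_vectors(training_contexts):
    """Map each word to counts of the other words seen in the same context.

    Assumption for this sketch: "co-occurs" means "appears in the same
    training context".
    """
    vectors = defaultdict(Counter)
    for context in training_contexts:
        for i, word in enumerate(context):
            for j, other in enumerate(context):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def second_order(context, vectors, dimensions):
    """Average the co-occurrence vectors of the known words in the context."""
    known = [w for w in context if w in vectors]
    if not known:
        return [0.0] * len(dimensions)
    return [sum(vectors[w][d] for w in known) / len(known) for d in dimensions]


training = [
    ["river", "bank", "water"],
    ["bank", "money", "loan"],
    ["river", "water", "fish"],
]

fo = first_order(["the", "river", "bank"], ["bank", "money", "river"])
# [1, 0, 1]

vectors = cooccurrence_vectors(training)
dimensions = sorted(vectors)
# ['bank', 'fish', 'loan', 'money', 'river', 'water']

so = second_order(["river", "bank"], vectors, dimensions)
# [0.5, 0.5, 0.5, 0.5, 0.5, 1.5]
```

The entries of `vectors` are the word vectors themselves, so they could
be passed to a clustering step directly to group related words.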

Latent Semantic Analysis differs from the native SenseClusters methods in  
that each feature is represented by a vector that shows the contexts in  
which that feature occurs. Then, all the features in a context to be   
clustered are replaced by their associated vectors, and these are  
averaged together to represent the context. Note that you can also  
cluster the feature vectors directly to identify sets of related features. 
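
The feature-by-context representation described above can be sketched
in the same toy style. Again this is a hypothetical Python
illustration, not the SenseClusters implementation: the function names
are invented, raw occurrence counts are assumed as the cell values, and
the dimensionality reduction step normally associated with Latent
Semantic Analysis is left out to keep the example short.

```python
def feature_by_context(contexts, features):
    """Map each feature to a vector with one count per context."""
    return {f: [ctx.count(f) for ctx in contexts] for f in features}


def context_vector(context, feature_vectors, n_contexts):
    """Average the vectors of the features that occur in the context."""
    present = [f for f in context if f in feature_vectors]
    if not present:
        return [0.0] * n_contexts
    return [
        sum(feature_vectors[f][k] for f in present) / len(present)
        for k in range(n_contexts)
    ]


contexts = [
    ["river", "bank", "water"],
    ["bank", "money", "loan"],
    ["river", "water", "fish"],
]
features = ["bank", "river", "money"]

fv = feature_by_context(contexts, features)
# {'bank': [1, 1, 0], 'river': [1, 0, 1], 'money': [0, 1, 0]}

cv = context_vector(["river", "bank"], fv, len(contexts))
# [1.0, 0.5, 0.5]
```

The contrast with the second order sketch is in the dimensions: there
each word vector is indexed by other words, while here each feature
vector is indexed by contexts.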

This release represents a major step forward in the functionality of
SenseClusters. Much of the work in providing LSA support was carried out
by Mahesh Joshi this past spring and summer. As has always been the case
over the last two years, Anagha Kulkarni played a large role in this
release, and she has included many improvements to automated cluster
stopping and other areas in 0.95.

Please give this a try, and let us know if you have any comments or
questions! If you aren't certain whether your problem can be approached
using SenseClusters, please let us know what you would like to do and
maybe we can help you get started.

Cordially,
Ted, Anagha, and Mahesh

====================================================================

ChangeLog:
http://www.d.umn.edu/~tpederse/Code/Changelog.SenseClusters-v0.95.txt

Installation Instructions: 
http://www.d.umn.edu/~tpederse/Code/SenseClusters-v0.95-INSTALL.txt

Related Publications (includes links to data you can use):
http://www.d.umn.edu/~tpederse/senseclusters-pubs.html

--
Ted Pedersen
http://www.d.umn.edu/~tpederse


