Re: Term Weights and Clustering

2005-02-24 Thread Dawid Weiss
Hi Owen, I'm from the Carrot2 project, so I feel called to the blackboard: One source for how to do this is the thesis of Stanislaw Osinski and others like it: http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm And the Carrot2 project which uses similar techniques. http://www.cs

Re: Term Weights and Clustering

2005-02-23 Thread David Spencer
I'm a little confused about exactly what you want, but if your goal is to cluster your papers with Carrot2, then I found these links helpful: http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip Only caveat is I

Term Weights and Clustering

2005-02-23 Thread Owen Densmore
I'm building a TDM (Term Document Matrix) from my Lucene index. As part of this, it would be useful to have the document term weights (the TF*IDF weights) if they are already available. Naturally I can compute them, but I suspect they are lurking behind an API I've not discovered yet. Is ther
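
For readers unfamiliar with the term, here is a minimal sketch of what computing a TF*IDF-weighted term-document matrix by hand might look like. It is not Lucene code and does not use any Lucene API: it assumes the documents have already been tokenized into lists of terms, and it uses the plain textbook weight tf * log(N / df), which is not necessarily the formula Lucene applies internally. The function name tf_idf_matrix and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_matrix(docs):
    """Build a term-document matrix of TF*IDF weights.

    docs is a list of token lists (one list per document; tokenization is
    assumed to have happened elsewhere). Returns (terms, matrix) where
    matrix[i][j] is the weight of terms[i] in document j.
    Weighting is the textbook tf * log(N / df), chosen for illustration only.
    """
    n_docs = len(docs)
    # Raw term frequencies per document.
    counts = [Counter(doc) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    terms = sorted(df)
    # A Counter returns 0 for terms a document does not contain.
    matrix = [
        [c[t] * math.log(n_docs / df[t]) for c in counts]
        for t in terms
    ]
    return terms, matrix


# Hypothetical sample data: two tiny "documents".
sample_docs = [
    ["lucene", "index"],
    ["lucene", "cluster", "cluster"],
]
terms, matrix = tf_idf_matrix(sample_docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

With this weighting, a term that occurs in every document (here "lucene") gets weight zero everywhere, which is the usual reason such terms contribute little to clustering.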