Hi Owen,
I'm from the Carrot2 project, so I feel called to the blackboard:
One source for how to do this is the thesis of Stanislaw Osinski and
others like it:
http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm
Another is the Carrot2 project, which uses similar techniques:
http://www.cs
I'm a little confused about exactly what you want, but if your goal
is to cluster your papers with Carrot2, then I found these links helpful:
http://www.newsarch.com/archive/mailinglist/jakarta/lucene/user/msg03928.html
http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip
Only caveat is I
I'm building a TDM (term-document matrix) from my Lucene index. As
part of this, it would be useful to have the per-document term weights
(the TF*IDF weights) if they are already available. Naturally I can compute
them, but I suspect they are lurking behind an API I've not discovered
yet. Is ther
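
For reference, computing the weights by hand is short. Below is a minimal sketch in plain Java, assuming the documents have already been tokenized into lists of terms (no Lucene calls are used). The class and method names are hypothetical, and the weighting is the textbook tf * ln(N/df) form, which is an assumption here and not necessarily the weight Lucene uses internally for scoring.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Carrot2.
final class TfIdfSketch {

    // Builds a term-document matrix: one row per term, one column per document.
    // Each cell holds tf * ln(N / df), the textbook TF*IDF weight (assumed form).
    static Map<String, double[]> termDocumentMatrix(List<List<String>> docs) {
        final int n = docs.size();

        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        // TreeMap keeps the rows in sorted term order.
        Map<String, double[]> matrix = new TreeMap<>();
        for (int d = 0; d < n; d++) {
            for (String term : docs.get(d)) {
                // Adding idf once per occurrence accumulates tf * idf.
                double idf = Math.log((double) n / df.get(term));
                matrix.computeIfAbsent(term, t -> new double[n])[d] += idf;
            }
        }
        return matrix;
    }
}
```

With this form, a term that occurs in every document gets weight 0 everywhere, so it contributes nothing to the matrix. The dense double[] rows are only for illustration; for a real index a sparse representation would likely be needed.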