Re: Document Clustering

2005-02-08 Thread David Spencer
Owen Densmore wrote: I would like to be able to analyze my document collection (~1200 documents) and discover good buckets of categories for them. I'm pretty sure this is termed Document Clustering .. finding the emergent clumps the documents fall naturally into judging from their term vectors

Re: Document Clustering

2005-02-08 Thread Dawid Weiss
Hi Owen, Last year it was suggested Carrot2 could help, and it would even produce good labels for the clusters. Has this proven to be true? Yes, Carrot2 should help you with this. The labels it creates depend heavily on the quality of the input snippets, but the so-called KWIK snippets

Re: Document Clustering

2005-02-07 Thread Owen Densmore
I would like to be able to analyze my document collection (~1200 documents) and discover good buckets of categories for them. I'm pretty sure this is termed Document Clustering .. finding the emergent clumps the documents fall naturally into judging from their term vectors. Looking

Re: Document Clustering

2003-11-12 Thread Eric Jain
I was basically thinking of using Lucene to generate document vectors, and writing my custom similarity algorithms for measuring distance. I could then run this data through k-means or SOM algorithms for calculating clusters. First of all, I think it would already be great if there was some
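
The approach described in this message (document vectors, a similarity measure, then k-means) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it uses plain Python instead of Lucene, raw term counts instead of vectors read from an index, cosine similarity as the measure, and fixed seed documents as initial centroids. The names term_vector, cosine, mean_vector and kmeans are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Sparse bag-of-words vector: term -> raw count (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """Cluster vectors; `seeds` are indices of documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    assignment = []
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        assignment = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment


docs = [
    "lucene index search query",
    "search query index lucene analyzer",
    "protein gene sequence dna",
    "dna sequence protein structure",
]
labels = kmeans([term_vector(d) for d in docs], seeds=[0, 2])
print(labels)
```

In a real setup the vectors would presumably come from the index and be weighted (for example with TF-IDF), and the number of clusters and the seeds would need to be chosen with more care than in this toy example.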

Document Clustering

2003-11-11 Thread marc
Hi, does anyone have any sample code/documentation available for doing document based clustering using lucene? Thanks, Marc

Re: Document Clustering

2003-11-11 Thread Stefan Groschupf
Hi Marc, I'm working on it. Classification and Clustering as well. I was planning to do it for nutch.org, but actually some guys there broke some important basic work I had already done, so maybe I will not contribute it there. However it will be open source and I can notify you if something

Re: Document Clustering

2003-11-11 Thread Eric Jain
I'm working on it. Classification and Clustering as well. Very interesting... if you get something working, please don't forget to notify this list :-) -- Eric Jain - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional

Re: Document Clustering

2003-11-11 Thread Marcel Stör
Hi As everybody seems to be so excited about it, would someone please be so kind to explain what document based clustering is? Regards, Marcel

Re: Document Clustering

2003-11-11 Thread Leo Galambos
Marcel Stör wrote: Hi As everybody seems to be so excited about it, would someone please be so kind to explain what document based clustering is? Hi, they are trying to implement what you can see in the right panel here: http://www.egothor.dundee.ac.uk/egothor/q2c.jsp?q=protein They may also

Re: Document Clustering

2003-11-11 Thread Otis Gospodnetic
--- Leo Galambos [EMAIL PROTECTED] wrote: Marcel Stör wrote: Hi As everybody seems to be so excited about it, would someone please be so kind to explain what document based clustering is? AFAIK, document clustering consists of detection of documents with similar content (similar

Re: Document Clustering

2003-11-11 Thread petite_abeille
of document clustering, but here you go anyway :) Here is an illustration: Patterns in Unstructured Data: Discovery, Aggregation, and Visualization http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm Cheers, PA

RE: Document Clustering

2003-11-11 Thread Tate Avery
Message- From: petite_abeille [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 10:50 AM To: Lucene Users List Subject: Re: Document Clustering Hi Otis, On Nov 11, 2003, at 16:41, Otis Gospodnetic wrote: How is document clustering different/related to text categorization

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 16:58, Tate Avery wrote: Categorization typically assigns documents to a node in a pre-defined taxonomy. For clustering, however, the categorization 'structure' is emergent... i.e. the clusters (which are analogous to taxonomy nodes) are created dynamically based on the

Re: Document Clustering

2003-11-11 Thread Stefan Groschupf
Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all documents with minimal distance together. Classification: you have already categories and samples for it, that help you to match
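
To make the classification half of this distinction concrete, here is a minimal sketch, assuming categories and labelled sample texts already exist. It is not taken from the thread: the function names (tokens, jaccard, classify), the sample data, and the choice of Jaccard overlap as the similarity measure are all illustrative assumptions. A new document simply receives the category of its most similar labelled sample, whereas a clustering algorithm would have to discover the groups without any labels.

```python
def tokens(text):
    """Set of lower-cased whitespace-separated terms."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text, labeled_samples):
    """Return the category of the labelled sample most similar to `text`."""
    doc = tokens(text)
    best = max(labeled_samples, key=lambda pair: jaccard(doc, tokens(pair[1])))
    return best[0]


# Hypothetical pre-defined categories, each with one sample document.
samples = [
    ("search", "lucene index search query"),
    ("biology", "protein gene sequence dna"),
]
category = classify("dna sequence of a protein", samples)
print(category)
```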

Re: Document Clustering

2003-11-11 Thread Otis Gospodnetic
Thanks for the clarification, Stefan. I should have known that... :) Otis --- Stefan Groschupf [EMAIL PROTECTED] wrote: Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all

RE: Document Clustering

2003-11-11 Thread Marcel Stör
Stefan Groschupf wrote: Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all documents with minimal distance together. Would I be correct to say that you have to define a distance

Re: Document Clustering

2003-11-11 Thread Joshua O'Madadhain
On Tuesday, Nov 11, 2003, at 11:05 US/Pacific, Marcel Stor wrote: Stefan Groschupf wrote: Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all documents with minimal distance together

Re: Document Clustering

2003-11-11 Thread Stefan Groschupf
Marcel Stor wrote: Stefan Groschupf wrote: Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all documents with minimal distance together. Would I be correct to say

Re: Document Clustering

2003-11-11 Thread maurits van wijland
, what are you building?? Maybe we can help! Kind regards, Maurits - Original Message - From: marc [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 5:15 PM Subject: Document Clustering Hi, does anyone have any sample code/documentation available

Re: Document Clustering

2003-11-11 Thread Stefan Groschupf
hope this helps... Marc, what are you building?? Maybe we can help! Kind regards, Maurits - Original Message - From: marc [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 5:15 PM Subject: Document Clustering Hi, does anyone have any sample code

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 21:32, maurits van wijland wrote: There is the carrot project : http://www.cs.put.poznan.pl/dweiss/carrot/ Leo Galambos, author of the Egothor project, constantly supports us with fresh ideas and includes Carrot components in his own project!

Re: Document Clustering

2003-11-11 Thread Alex Aw Seat Kiong
Hi! I'm also interested in it. Kindly CC me on the latest progress of your clustering project. Regards, AlexAw - Original Message - From: Eric Jain [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 10:07 PM Subject: Re: Document Clustering I'm

Re: Document Clustering

2003-11-11 Thread marc
this sounds like I'm on the right track... I'm still just in the *thinking* stage. Marc - Original Message - From: Alex Aw Seat Kiong [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 5:47 PM Subject: Re: Document Clustering Hi! I'm also interest