Owen Densmore wrote:
I would like to be able to analyze my document collection (~1200
documents) and discover good buckets of categories for them. I'm
pretty sure this is termed Document Clustering: finding the emergent
clumps the documents naturally fall into, judging from their term vectors.
Hi Owen,
Last year it was suggested that Carrot2 could help, and that it would even
produce good labels for the clusters. Has this proven to be true?
Yes, Carrot2 should help you with this. The labels it creates depend
heavily on the quality of the input snippets, but the so-called KWIK
snippets
I was basically thinking of using Lucene to generate document
vectors, and writing my own custom similarity algorithms for measuring
distance.
I could then run this data through k-means or SOM algorithms to
calculate clusters.
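The plan above (term vectors, a custom similarity, then k-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lucene code: it assumes each document's term vector has already been extracted into a plain dict of term -> weight (the extraction from a Lucene index is not shown), it uses cosine similarity as the "custom similarity", and it seeds the clusters with the first k documents purely to keep the output deterministic. The helper names `cosine`, `centroid` and `kmeans` and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Assumption: each document's term vector is a plain dict of term -> weight,
# e.g. raw term frequencies pulled from a Lucene index (extraction not shown).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (1.0 = same direction)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(docs: List[Vector], k: int, max_iter: int = 10) -> List[int]:
    """k-means variant that assigns documents by cosine similarity.

    Seeds are the first k documents so the sketch is deterministic; real use
    would pick seeds randomly or with k-means++.
    """
    centroids = [dict(d) for d in docs[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs
        ]
        if new_labels == labels:
            break  # no document changed cluster: converged
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [d for d, label in zip(docs, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = centroid(members)
    return labels


# Hypothetical term-frequency vectors for four tiny documents.
docs: List[Vector] = [
    {"protein": 3, "gene": 2},
    {"lucene": 2, "index": 3},
    {"protein": 2, "cell": 1, "gene": 1},
    {"index": 1, "query": 2, "lucene": 1},
]
print(kmeans(docs, k=2))  # → [0, 1, 0, 1]
```

With a real collection one would likely weight terms with something like TF-IDF instead of raw counts, and try several random seedings, since plain k-means is sensitive to its starting centroids.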
First of all, I think it would already be great if there was some
Hi,
does anyone have any sample code/documentation available for doing
document-based clustering using Lucene?
Thanks,
Marc
Hi Marc,
I'm working on it, on classification and clustering as well.
I was planning to do it for nutch.org, but some people there broke
some important basic work I had already done, so maybe I will not
contribute it there.
However, it will be open source, and I can notify you if something
I'm working on it. Classification and Clustering as well.
Very interesting... if you get something working, please don't forget to
notify this list :-)
--
Eric Jain
Hi
As everybody seems to be so excited about it, would someone please be so kind as to
explain what document-based clustering is?
Regards,
Marcel
Marcel Stör wrote:
Hi
As everybody seems to be so excited about it, would someone please be so kind as to explain
what document-based clustering is?
Hi
They are trying to implement what you can see in the right panel here:
http://www.egothor.dundee.ac.uk/egothor/q2c.jsp?q=protein
They may also
--- Leo Galambos [EMAIL PROTECTED] wrote:
Marcel Stör wrote:
Hi
As everybody seems to be so excited about it, would someone please be
so kind as to explain
what document-based clustering is?
AFAIK, document clustering consists of detecting documents with
similar content (similar
of document clustering, but here you go anyway :)
Here is an illustration:
Patterns in Unstructured Data
Discovery, Aggregation, and Visualization
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm
Cheers,
PA
- Original Message -
From: petite_abeille [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 10:50 AM
To: Lucene Users List
Subject: Re: Document Clustering
Hi Otis,
On Nov 11, 2003, at 16:41, Otis Gospodnetic wrote:
How is document clustering different/related to text categorization?
On Nov 11, 2003, at 16:58, Tate Avery wrote:
Categorization typically assigns documents to a node in a predefined
taxonomy.
For clustering, however, the categorization 'structure' is emergent,
i.e., the clusters (which are analogous to taxonomy nodes) are created
dynamically based on the
Hi,
How is document clustering different/related to text categorization?
Clustering: try to find your own categories and put the documents that match into them.
You group all documents with minimal distance to each other into the same cluster.
Classification: you already have categories, and sample documents for them, that help you to match
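To make the contrast concrete: in classification the categories and their sample documents are given up front, and a new document is matched against them. One simple way to do that is a nearest-centroid (Rocchio-style) classifier, sketched below under the assumption that term vectors are plain dicts of term -> weight: average each category's samples, then pick the category whose average is most similar by cosine similarity. The category names, sample data and helper names are hypothetical, and this is only one of many classification techniques, not necessarily what was meant here. Clustering, by contrast, has no `samples` input at all; the groups themselves have to be discovered, e.g. with k-means.

```python
import math
from collections import defaultdict
from typing import Dict, List

# Assumption: a term vector is a plain dict of term -> weight.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(samples: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the sample documents of each predefined category."""
    centroids: Dict[str, Vector] = {}
    for category, docs in samples.items():
        total: Dict[str, float] = defaultdict(float)
        for d in docs:
            for t, w in d.items():
                total[t] += w
        centroids[category] = {t: w / len(docs) for t, w in total.items()}
    return centroids


def classify(doc: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the predefined category whose centroid is most similar to doc."""
    return max(centroids, key=lambda c: cosine(doc, centroids[c]))


# The categories are fixed in advance; only their sample documents are learned from.
samples = {
    "biology": [{"protein": 3, "gene": 2}, {"cell": 2, "protein": 1}],
    "search": [{"lucene": 2, "index": 3}, {"query": 2, "index": 1}],
}
centroids = train_centroids(samples)
print(classify({"gene": 1, "cell": 1}, centroids))  # → biology
```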
Thanks for the clarification, Stefan. I should have known that... :)
Otis
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: try to find your own categories and put the documents that match
into them.
You group all
Stefan Groschupf wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: try to find your own categories and put the documents that match
into them. You group all documents with minimal distance to each other into the same cluster.
Would I be correct to say that you have to define a distance
On Tuesday, Nov 11, 2003, at 11:05 US/Pacific, Marcel Stör wrote:
Stefan Groschupf wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: try to find your own categories and put the documents that match
into them. You group all documents with minimal distance to each other into the same cluster.
Marcel Stör wrote:
Stefan Groschupf wrote:
Hi,
How is document clustering different/related to text categorization?
Clustering: try to find your own categories and put the documents that match
into them. You group all documents with minimal distance to each other into the same cluster.
Would I be correct to say
Hope this helps...
Marc, what are you building?? Maybe we can help!
Kind regards,
Maurits
- Original Message -
From: marc [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 5:15 PM
Subject: Document Clustering
Hi,
does anyone have any sample code
On Nov 11, 2003, at 21:32, maurits van wijland wrote:
There is the Carrot project:
http://www.cs.put.poznan.pl/dweiss/carrot/
Leo Galambos, author of the Egothor project, constantly supports us
with fresh ideas and includes Carrot components in his own project!
Hi!
I'm also interested in it. Kindly CC me on the latest progress of your
clustering project.
Regards,
AlexAw
- Original Message -
From: Eric Jain [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 10:07 PM
Subject: Re: Document Clustering
I'm
this sounds like I'm on the right track... I'm still just in the
*thinking* stage.
Marc
- Original Message -
From: Alex Aw Seat Kiong [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 5:47 PM
Subject: Re: Document Clustering
Hi!
I'm also interest