Hi Owen,
I'm from the Carrot2 project, so I feel called to the blackboard:
One source for how to do this is the thesis of Stanislaw Osinski and
others like it:
http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm
There is also the Carrot2 project, which uses similar techniques.
Hi Owen,
Last year it was suggested Carrot2 could help, and it would even produce
good labels for the clusters. Has this proven to be true?
Yes, Carrot2 should help you with this. The labels it creates depend highly
on the quality of the input snippets, but the so-called KWIC
snippets
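KWIC (keyword-in-context) snippets show a few words on either side of each occurrence of a query term. Below is a minimal sketch of extracting them, assuming whitespace tokenization, case-insensitive matching of a single keyword, and a fixed word window. The class and method names are hypothetical, and this is not necessarily how Carrot2 or any particular search engine builds its snippets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds keyword-in-context snippets from plain text.
public final class KwicSnippets {

    // Returns one snippet per occurrence of keyword, with up to
    // `window` words of context on each side.
    public static List<String> snippets(String text, String keyword, int window) {
        String[] words = text.trim().split("\\s+");
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Strip leading/trailing punctuation before comparing the word.
            String w = words[i].replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")
                               .toLowerCase(Locale.ROOT);
            if (!w.equals(needle)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(words.length, i + window + 1);
            StringBuilder sb = new StringBuilder();
            if (from > 0) sb.append("... ");          // text was cut on the left
            for (int j = from; j < to; j++) {
                if (j > from) sb.append(' ');
                sb.append(words[j]);
            }
            if (to < words.length) sb.append(" ..."); // text was cut on the right
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Carrot2 clusters search results. Good snippets make good cluster labels.";
        for (String s : snippets(text, "snippets", 2)) {
            System.out.println(s); // prints "... results. Good snippets make good ..."
        }
    }
}
```

Snippets built this way center on the query terms, so the clusterer sees phrases that are relevant to the query rather than, say, the first sentence of each document.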
Karl,
Two things; try experimenting with both:
1) I would try to write a lexical scanner that strips HTML tags, much
like the regular expression does. Java lexical scanner generators produce
nice, pure Java classes that seldom use any advanced API, so they should
work on Java 1.1. They are simple
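For comparison, a tag stripper can also be written by hand as a tiny state machine. Below is a minimal sketch, assuming well-formed markup: it drops everything between `<` and `>`, ignores `>` inside quoted attribute values, and uses only `StringBuffer` and `String` so it avoids newer APIs. The class name is hypothetical. Known limitations: a literal `<` in text (as in `a < b`) is treated as the start of a tag, a `>` inside an HTML comment ends the "tag" early, and entities such as `&amp;` are not decoded.

```java
// Hypothetical hand-written HTML tag stripper (state machine, no regex).
public final class HtmlTagStripper {

    private HtmlTagStripper() {}

    public static String strip(String html) {
        StringBuffer out = new StringBuffer(html.length());
        boolean inTag = false; // true between '<' and the matching '>'
        char quote = 0;        // quote char of an open attribute value, 0 if none
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;      // start of a tag: stop copying
                } else {
                    out.append(c);     // ordinary text: keep it
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0;         // closing quote of an attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;             // opening quote: '>' is literal until it closes
            } else if (c == '>') {
                inTag = false;         // end of the tag: resume copying
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The '>' inside the quoted href value does not end the tag.
        System.out.println(strip("<a href=\"x>y\">Carrot<b>2</b></a> rocks")); // prints "Carrot2 rocks"
    }
}
```

A generated scanner gives you the same behavior from a declarative specification, which is easier to extend (comments, entities, script blocks) than a hand-written loop.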
Hi.
Coming up with answers... a little belated, but hope you're still on:
We have been experimenting with Carrot2 and are very pleased so far.
Only one issue: there is no release, not even an alpha one, and the
dependencies seemed to be patched (JAMA).
Yes, there is no official release. We just
Hi David,
I apologize for the delay in answering this one; Lucene is a busy
mailing list and I had a hectic week last week... Again, sorry for the
belated answer; I hope you still find it useful.
That is awesome and very inspirational!
Yes, I admit what you've done with Wikipedia is quite interesting
It is quite interesting, Erik, thanks for the link. I'm sure you're
aware of the post-search clustering addon to Nutch that is based on the
project I'm heading -- Carrot2. If you have any ideas of how this could
be made better, I'm always open to suggestions.
Regards,
Dawid
nothing to do with each other. Furthermore, Arabic uses phonetic
indicators on each letter, called diacritics, that change the way you
pronounce the word, which in turn changes the word's meaning; so two words
spelled exactly the same way with different diacritics will mean two
separate things,
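The point about diacritics can be made concrete. The short vowel marks (the harakat, Unicode U+064B through U+0652) are separate code points, so if an analyzer removes them, words that differ only in their diacritics collapse into one token. Below is a minimal sketch, assuming a normalizer that drops exactly that range: "kataba" (he wrote) and "kutub" (books) both reduce to the bare letters kaf, ta, ba. The class name is hypothetical; whether a particular Arabic analyzer strips these marks, and what else it normalizes, should be checked against its own documentation.

```java
// Hypothetical normalizer: removes Arabic harakat (U+064B fathatan .. U+0652 sukun).
public final class ArabicDiacritics {

    private ArabicDiacritics() {}

    public static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Keep every character outside the harakat range.
            if (c < '\u064B' || c > '\u0652') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"; // kaf-fatha ta-fatha ba-fatha: "he wrote"
        String kutub  = "\u0643\u064F\u062A\u064F\u0628";       // kaf-damma ta-damma ba: "books"
        System.out.println(kataba.equals(kutub));               // prints false
        System.out.println(strip(kataba).equals(strip(kutub))); // prints true: same bare spelling
    }
}
```

This is exactly the ambiguity described above: after stripping, the index can no longer tell the two words apart, while keeping the marks means queries typed without diacritics will not match.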
Hi William,
Ok, here is some demo code I've put together that shows how you can
achieve clustering of Lucene's results. I hope this will get you started
on your projects. If you have questions, please don't hesitate to ask --
cross-posting to carrot2-developers would be a good idea too.
The code
No problem. Let people know if it worked for you -- I look forward to
hearing your experiences (good or bad).
Dawid
William W wrote:
Thanks Dawid ! :)
From: Dawid Weiss [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Subject: Re
Nutch has a crawler. So does Egothor (its crawler is called Capek). If
you type "web crawler" into Google, you'll get tons of projects.
Dawid
Zhang, Lisheng wrote:
Hi,
Does anyone know if there is free software to crawl Internet sites
(a web crawler)? I know Lucene currently does not have this feature
Dear all,
I saw a post about an attempt to integrate Carrot2 with Lucene. It was a
while ago, so I'm curious if any outcome has been achieved.
Anyway, as the project coordinator I can offer my help with such
integration; if you're looking for some ready-to-use code then there is
a clustering
to write the integration code with
Lucene. It is only a matter of writing a simple InputComponent instance
and this is really trivial (see Nutch's plugin code).
Dawid
William W wrote:
Hi Dawid,
I would like to use Carrot2 with Lucene. Do you have examples?
Thanks a lot,
William.
From: Dawid Weiss
Andrzej Bialecki wrote:
Dawid Weiss wrote:
Hi William,
No, I don't have examples because I've never used Lucene directly. If you
provide me with a sample index and an API that executes a query on
this index (I need document titles, summaries, or snippets and an
anchor (identifier), which can be a URL
just very shy by nature and doesn't talk much, hehe.
D.
William W wrote:
Hi Dawid,
The demos (under /src/demo) are very good. They cover the basic usage
scenario.
Thanks Andrzej.
William.