Re: Term Weights and Clustering

2005-02-24 Thread Dawid Weiss
Hi Owen, I'm from the Carrot2 project, so I feel called to the blackboard: One source for how to do this is the thesis of Stanislaw Osinski and others like it: http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm And the Carrot2 project which uses similar techniques.

Re: Document Clustering

2005-02-08 Thread Dawid Weiss
Hi Owen, Last year it was suggested Carrot2 could help, and it would even produce good labels for the clusters. Has this proven to be true? Yes, Carrot2 should help you with this. The labels it creates depend heavily on the quality of the input snippets, but the so-called KWIK snippets

Re: which HTML parser is better?

2005-02-03 Thread Dawid Weiss
Karl, Two things, try to experiment with both: 1) I would try to write a lexical scanner that strips HTML tags, much like the regular expression does. Java lexical scanner packages produce nice pure Java classes that seldom use any advanced API, so they should work on Java 1.1. They are simple
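To make the first suggestion concrete, here is a minimal hand-written sketch of a tag-stripping scanner in Java. It is not the code from this thread and not the output of any scanner generator; the class name TagStripper and the method strip are made up for illustration. It assumes reasonably well-formed markup: it drops everything between '<' and '>', tracks quoted attribute values so that a '>' inside a quote does not end the tag, and does not handle comments, script blocks, character entities, or a stray '<' in plain text.

```java
// Hypothetical illustration: a tiny two-state scanner that strips HTML tags.
final class TagStripper {
    private TagStripper() {}

    /** Returns the input with every "<...>" span removed. */
    static String strip(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false; // true while scanning between '<' and '>'
        char quote = 0;        // active quote character inside a tag, or 0 if none

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;      // start of a tag: stop copying
                } else {
                    out.append(c);     // plain text: copy through
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0;         // closing quote of an attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;             // opening quote of an attribute value
            } else if (c == '>') {
                inTag = false;         // end of the tag: resume copying
            }
        }
        return out.toString();
    }
}
```

A call such as TagStripper.strip(page) returns only the text outside the tags. A generated scanner would cover more cases, but the state handling follows the same idea.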

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-31 Thread Dawid Weiss
Hi. Coming up with answers... a little belated, but hope you're still on: we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release, not even an alpha one, and the dependencies seemed to be patched (jama) Yes, there is no official release. We just

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-17 Thread Dawid Weiss
Hi David, I apologize for the delay in answering this one, Lucene is a busy mailing list and I had a hectic last week... Again, sorry for the belated answer, hope you still find it useful. That is awesome and very inspirational! Yes, I admit what you've done with Wikipedia is quite interesting

Re: GUUUI - The optimal layout of search result pages

2004-10-11 Thread Dawid Weiss
It is quite interesting, Erik, thanks for the link. I'm sure you're aware of the post-search clustering addon to Nutch that is based on the project I'm heading -- Carrot2. If you have any ideas of how this could be made better, I'm always open to suggestions. Regards, Dawid

Re: Arabic analyzer

2004-10-07 Thread Dawid Weiss
nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter, called diacritics, that change the way you pronounce the word, which in turn changes the word's meaning, so two words spelled exactly the same way with different diacritics will mean two separate things,

Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
Hi William, Ok, here is some demo code I've put together that shows how you can achieve clustering of Lucene's results. I hope this will get you started on your projects. If you have questions, please don't hesitate to ask -- cross posts to carrot2-developers would be a good idea too. The code

Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
No problem. Let people know if it worked for you -- I look forward to hearing your experiences (good or bad). Dawid William W wrote: Thanks Dawid ! :)

Re: Free software to crawl internet site?

2004-09-29 Thread Dawid Weiss
Nutch has a crawler. So does Egothor (the crawler is called Capek). If you type "web crawler" into Google you'll get tons of projects. Dawid Zhang, Lisheng wrote: Hi, Does anyone know if there is free software to crawl an internet site (webcrawler)? I know currently lucene does not have this feature

Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Dear all, I saw a post about an attempt to integrate Carrot2 with Lucene. It was a while ago, so I'm curious if any outcome has been achieved. Anyway, as the project coordinator I can offer my help with such integration; if you're looking for some ready-to-use code then there is a clustering

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
to write the integration code with Lucene. It is only a matter of writing a simple InputComponent instance and this is really trivial (see Nutch's plugin code). Dawid William W wrote: Hi Dawid, I would like to use Carrot2 with lucene. Do you have examples ? Thanks a lot, William. From: Dawid Weiss

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
. Andrzej Bialecki wrote: Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene directly. If you provide me with a sample index and an API that executes a query on this index (I need document titles, summaries, or snippets and an anchor (identifier), can be a URL

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
just very shy by nature and doesn't talk much, hehe. D. William W wrote: Hi Dawid, The demos (under /src/demo) are very good. They have the basic usage scenario. Thanks Andrzej. William. Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene directly. If you provide