Re: Term Weights and Clustering

2005-02-24 Thread Dawid Weiss
Hi Owen, I'm from the Carrot2 project, so I feel called to the blackboard: One source for how to do this is the thesis of Stanislaw Osinski and others like it: http://www.dcs.shef.ac.uk/teaching/eproj/msc2004/abs/m3so.htm And the Carrot2 project which uses similar techniques. http://www.cs

Re: Document Clustering

2005-02-08 Thread Dawid Weiss
Hi Owen, Last year it was suggested Carrot2 could help, and it would even produce good labels for the clusters. Has this proven to be true? Yes, Carrot2 should help you with this. The labels it creates highly depend on the quality of the input snippets, but the so-called KWIC snippets (keywor

Re: which HTML parser is better?

2005-02-03 Thread Dawid Weiss
Karl, Two things, try to experiment with both: 1) I would try to write a lexical scanner that strips HTML tags, much like the regular expression does. Java lexical scanner packages produce nice pure Java classes that seldom use any advanced API, so they should work on Java 1.1. They are simple s
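
A minimal sketch of the first suggestion, assuming a hand-written scanner rather than one produced by a scanner generator. The class name TagStripper and its strip method are hypothetical and do not come from the original message. The sketch tracks only two states (inside a tag, inside a quoted attribute value) and does not handle comments, script blocks, character entities, or a bare '<' in text. It uses StringBuilder for brevity, so it would need StringBuffer to run on Java 1.1.

```java
// Hypothetical helper, not from the original post: a tiny hand-written
// scanner that drops everything between '<' and '>'.
final class TagStripper {
    private TagStripper() {}

    /** Returns the input with markup tags removed; text outside tags is kept as-is. */
    static String strip(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false; // scanner state: inside <...> or in plain text
        char quote = 0;        // active quote character inside a tag, 0 if none
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;      // start of a tag: stop copying
                } else {
                    out.append(c);     // ordinary text: copy through
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0;         // end of a quoted attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;             // a '>' inside quotes must not end the tag
            } else if (c == '>') {
                inTag = false;         // end of the tag: resume copying
            }
        }
        return out.toString();
    }
}
```

For example, TagStripper.strip("<p>Hello, <b>world</b>!</p>") returns "Hello, world!". Tracking quotes is the main thing a single-pass scanner like this adds over a naive regular expression such as <[^>]*>, which would end a tag at a '>' inside an attribute value.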

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-31 Thread Dawid Weiss
Hi Adam. Otis and David have already provided you with pointers to my previous post regarding Carrot2-Lucene integration, so just a tiny note here: Also, when I looked at Carrot2 the pipe line is implemented as over http. I wonder how efficient that is, or can it be changed, for instance for an

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-31 Thread Dawid Weiss
Hi. Coming up with answers... a little belated, but hope you're still on: we have been experimenting with carrot2 and are very pleased so far, only one issue: there is no release not even an alpha one and the dependencies seemed to be patched (jama) Yes, there is no "official" release. We just don

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-17 Thread Dawid Weiss
Hi David, I apologize for the delay in answering this one, Lucene is a busy mailing list and I had a hectic last week... Again, sorry for the belated answer, hope you still find it useful. That is awesome and very inspirational! Yes, I admit what you've done with Wikipedia is quite interesting and

Re: GUUUI - The optimal layout of search result pages

2004-10-11 Thread Dawid Weiss
It is quite interesting, Erik, thanks for the link. I'm sure you're aware of the post-search clustering addon to Nutch that is based on the project I'm heading -- Carrot2. If you have any ideas of how this could be made better, I'm always open to suggestions. Regards, Dawid http://www.cs.put.po

Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
No problem. Let people know if it worked for you -- I look forward to hearing your experiences (good or bad). Dawid William W wrote: Thanks Dawid ! :) From: Dawid Weiss <[EMAIL PROTECTED]> Reply-To: "Lucene Users List" <[EMAIL PROTECTED]> To: Lucene Users List <[

Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
ilter on top of what it returns. Shouldn't be too hard. Dawid Albert Vila wrote: That's great, thanks dawid. Just a question, how can I modify your code in order to use the carrot2-output-xsltrenderer to output the clustering results in a html page? Can you provide an example? Tha

Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
Hi William, Ok, here is some demo code I've put together that shows how you can achieve clustering of Lucene's results. I hope this will get you started on your projects. If you have questions, please don't hesitate to ask -- cross posts to carrot2-developers would be a good idea too. The code

Re: Arabic analyzer

2004-10-06 Thread Dawid Weiss
nothing to do with each other. Furthermore, Arabic uses phonetic indicators on each letter called diacritics that change the way you pronounce the word, which in turn changes the word's meaning, so two words spelled exactly the same way with different diacritics will mean two separate things, Just

Re: Free software to crawl internet site?

2004-09-29 Thread Dawid Weiss
Nutch has a crawler. So does Egothor (the crawler is called Capek). If you type "web crawler" in Google you'll get tons of projects. Dawid Zhang, Lisheng wrote: Hi, Does anyone know if there is free-software to crawl internet site (webcrawler)? I know currently lucene does not have this feature a

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
ts -- he's just very shy by nature and doesn't talk much, hehe. D. William W wrote: Hi Dawid, The demos (under /src/demo) are very good. They have the basic usage scenario. Thanks Andrzej. William. Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
I get back. D. Andrzej Bialecki wrote: Dawid Weiss wrote: Hi William, No, I don't have examples because I never used Lucene directly. If you provide me with a sample index and an API that executes a query on this index (I need document titles, summaries, or snippets and an anchor (identi

Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
'll try to write the integration code with Lucene. It is only a matter of writing a simple InputComponent instance and this is really trivial (see Nutch's plugin code). Dawid William W wrote: Hi Dawid, I would like to use Carrot2 with lucene. Do you have examples ? Thanks a lot, William. Fr

Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Dear all, I saw a post about an attempt to integrate Carrot2 with Lucene. It was a while ago, so I'm curious if any outcome has been achieved. Anyway, as the project coordinator I can offer my help with such integration; if you're looking for some ready-to-use code then there is a clustering pl