[Nutch-dev] Exploding number links due to bad sites

2005-01-08 Thread Xin-Yi Liu
I have come across a few sites that contain dead links but answer them with a customized "page not found" page instead of a 404 HTTP status code. If that customized page contains relative links, Nutch constructs outlinks using the dead link as the base, which results in another dead link.
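The effect described above can be reproduced with a rough sketch. It is not Nutch code: it uses Python's `urljoin` as a stand-in for standard relative-URL resolution, and the `example.com` URLs, the relative link `help/contact.html`, and the function name `simulate_soft_404_crawl` are all made up for illustration. It assumes the site answers every unknown path with the same error page containing the same relative link.

```python
from urllib.parse import urljoin


def simulate_soft_404_crawl(dead_url, relative_href, rounds):
    """Follow one relative link from a "soft 404" page for a number of rounds.

    Assumption: every URL in the chain returns the same error page with
    HTTP 200, and that page always contains `relative_href`.
    """
    urls = [dead_url]
    for _ in range(rounds):
        # The crawler resolves the relative link against the URL it just
        # fetched, which is itself a dead link.
        urls.append(urljoin(urls[-1], relative_href))
    return urls


if __name__ == "__main__":
    # Hypothetical dead link and hypothetical relative link on the error page.
    for url in simulate_soft_404_crawl(
        "http://example.com/missing/page.html", "help/contact.html", 3
    ):
        print(url)
```

Each round adds another `help/` segment to the path, so every generated URL is new to the crawler even though none of them exists. A relative link without a directory part (for example `contact.html`) would resolve to the same URL each time and would not grow.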

Re: [Nutch-dev] Adding Stemmers & Stop words

2005-01-08 Thread Xin-Yi Liu
Lucene has a Porter stemmer: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PorterStemFilter.html

--- Matthias Jaekle <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > - Has anyone added stemmers to Nutch, I know egothor has a good one -- is it
> > compatible?
> we once developed a

Re: [Nutch-dev] ArrayIndexOutOfBoundsException during fetch

2005-01-08 Thread Piotr Kosiorowski
Hello, I had a look at this issue this week and I think I have identified the problem. I have planned to prepare a patch for it but first I have to find out the correct way of submitting a patch for nutch. I will describe the problem briefly here so you can fix it quickly yourself and later

Re: [Nutch-dev] Hello!

2005-01-08 Thread Fenng
Read, think please!

>Dear Sir/Madam,
>
>I am a student from China, and I am very interested in the search engine
>project such as nutch.
>Will you please tell me something about the project's progress?
>And what you need the fresh joinors to do?
>
>WANG Fan
>Shanghai China
>Jan 8th, 2005

Re: [Nutch-dev] Adding Stemmers & Stop words

2005-01-08 Thread Matthias Jaekle
Hi,

> - Has anyone added stemmers to Nutch, I know egothor has a good one -- is it compatible?

We once developed German stemmer classes for Nutch: http://www.nutch.org/cgi-bin/twiki/view/Main/German

Matthias
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für

Re: [Nutch-dev] ArrayIndexOutOfBoundsException during fetch

2005-01-08 Thread John X
On Fri, Jan 07, 2005 at 05:27:36PM -0800, Tim England wrote:
>
> I'm trying to track down a bunch of exceptions thrown during the fetch
> phase. For example:
>
> fetch of http://www.zdnet.com/ failed with:
> java.lang.ArrayIndexOutOfBoundsException: 38

It may well be some codes are not th

[Nutch-dev] Adding Stemmers & Stop words

2005-01-08 Thread Chirag Chaman
I'd like to add stemmers to Nutch and add stop words based on language.
- Has anyone added stemmers to Nutch? I know egothor has a good one -- is it compatible?
- Is anyone using a stop word list (apart from the default ones used by Nutch) that they'd like to share? And can stop words be specified based