I have come across a few sites that contain dead
links, show customized "page not found" pages, and do
not return a 404 HTTP status code. If the customized
"page not found" page contains relative links, Nutch
constructs outlinks using the dead link as a base,
which results in yet another dead link.
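To illustrate the effect described above, here is a minimal sketch using only java.net.URI. It is not Nutch's own outlink code; the class name, the example.com URLs, and the relative href are all made up for illustration. It only shows how standard relative-reference resolution turns a dead base URL into a new URL under the same dead path.

```java
import java.net.URI;

public class DeadLinkOutlinks {
    // Resolve a relative href against the URL of the page it was found on,
    // following the standard relative-reference rules.
    public static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical dead link that the server answers with a
        // custom "page not found" page and a non-404 status.
        String deadLink = "http://www.example.com/old/section/missing.html";
        // Hypothetical relative link found on that error page.
        String href = "help/contact.html";
        // Prints http://www.example.com/old/section/help/contact.html
        System.out.println(resolve(deadLink, href));
    }
}
```

If the error page is served for every unknown path, the resolved URL will most likely return the same error page again, so each fetch can keep producing new dead outlinks.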
Lucene has a Porter stemmer:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PorterStemFilter.html
--- Matthias Jaekle <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > - Has anyone added stemmers to Nutch, I know
> egothor has a good one -- is it
> > compatible?
> we once developed a
Hello,
I had a look at this issue this week and I think I have identified the
problem. I plan to prepare a patch for it, but first I have to
find out the correct way of submitting a patch for Nutch. I will
describe the problem briefly here so you can fix it quickly yourself
and later
read, think
please!
>Dear Sir/Madam,
>
>I am a student from China, and I am very interested in search engine
>projects such as Nutch.
>Will you please tell me something about the project's progress?
>And what do you need new joiners to do?
>
>WANG Fan
>Shanghai China
>Jan 8th, 2005
>
>
>
Hi,
> - Has anyone added stemmers to Nutch, I know egothor has a good one -- is it
> compatible?
We once developed German stemmer classes for Nutch:
http://www.nutch.org/cgi-bin/twiki/view/Main/German
Matthias
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für
On Fri, Jan 07, 2005 at 05:27:36PM -0800, Tim England wrote:
>
> I'm trying to track down a bunch of exceptions thrown during the fetch
> phase. For example:
>
> fetch of http://www.zdnet.com/ failed with:
> java.lang.ArrayIndexOutOfBoundsException: 38
It may well be some codes are not th
I'd like to add stemmers to Nutch and add stop words based on language.
- Has anyone added stemmers to Nutch, I know egothor has a good one -- is it
compatible?
- Anyone using a stop word list (apart from the default ones used by Nutch)
they'd like to share? And can stop words be specified based