e.apache.org ; [EMAIL PROTECTED]
> *Sent:* Monday, July 28, 2008 2:43 PM
> *Subject:* Re: nutch fetched but no indexed
>
> Hi,
>
> Thank you for wuqi's help.
>
> I checked it in Luke and could not find it.
>
> Now I import the
the
> segment file..
>
>
>
> - Original Message -
> From: "宫照" <[EMAIL PROTECTED]>
> To: ; <[EMAIL PROTECTED]>
> Sent: Friday, July 25, 2008 9:53 AM
> Subject: Re: nutch fetched but no indexed
>
>
> > Hi Patrick,
> >
> >
Check the status of this page in the
crawldb. If it is db_fetched, then check whether it exists in the segment
file.
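
The two checks suggested above can be sketched as command lines. This is only a minimal sketch: it assumes the standard `bin/nutch readdb <crawldb> -url <url>` and `bin/nutch readseg -dump <segment_dir> <output>` tools are available, and the crawldb path, segment path, and URL below are hypothetical placeholders, not values from this thread. The helper names are illustrative too.

```python
from typing import List


def readdb_url_cmd(crawldb: str, url: str, nutch: str = "bin/nutch") -> List[str]:
    """Build the command that prints the crawldb record (including status) for one URL."""
    return [nutch, "readdb", crawldb, "-url", url]


def readseg_dump_cmd(segment: str, out_dir: str, nutch: str = "bin/nutch") -> List[str]:
    """Build the command that dumps one segment to text so it can be searched for the URL."""
    return [nutch, "readseg", "-dump", segment, out_dir]


# Hypothetical paths and URL, for illustration only.
for cmd in (
    readdb_url_cmd("crawl/crawldb", "http://example.com/doc.pdf"),
    readseg_dump_cmd("crawl/segments/SEGMENT_DIR", "segdump"),
):
    print(" ".join(cmd))
```

If the first command reports the page as db_fetched, the dump produced by the second can be searched for the same URL to see whether parsed content was actually stored for it.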
- Original Message -
From: "宫照" <[EMAIL PROTECTED]>
To: ; <[EMAIL PROTECTED]>
Sent: Friday, July 25, 2008 9:53 AM
Subject: Re: nutch fetch
Hi Patrick,
Thank you for your advice.
My nutch-site.xml file is already set up as you described, and I can search PDF
files under other URLs.
Only the file under the URL I mentioned before cannot be indexed.
I guess it may be related to the type of URL, because the log shows it
was fetched but not ind
Hi Gong Zhao,
Make sure you have the parse-pdf plugin enabled in your nutch-site.xml
file.
I.e.
<property>
  <name>plugin.includes</name>
  <value>...|parse-(xml|text|html|js|pdf)|...</value>
</property>
That's the only thing I can think of at first glance.
Patrick
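
One way to double-check the setting Patrick describes is to read nutch-site.xml and test the plugin id against the plugin.includes value. The sketch below assumes that Nutch treats plugin.includes as a regular expression that must match the whole plugin id, and that Python's regex syntax is close enough to Java's for a simple pattern like this one. The helper name `plugin_enabled` and the sample configuration are hypothetical; the sample value is made up for illustration and is not the poster's actual configuration.

```python
import re
import xml.etree.ElementTree as ET


def plugin_enabled(site_xml: str, plugin_id: str) -> bool:
    """Return True if plugin_id fully matches the plugin.includes regex in site_xml."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            pattern = prop.findtext("value", "").strip()
            return re.fullmatch(pattern, plugin_id) is not None
    # The property is not set in this file (Nutch would then use its default value).
    return False


# Hypothetical sample configuration, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(xml|text|html|js|pdf)|index-basic</value>
  </property>
</configuration>"""

print(plugin_enabled(SAMPLE, "parse-pdf"))
```

A True result only shows that the plugin is included by this file; it does not explain why one particular URL was fetched but not indexed.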
-Original Message-
From: 宫照 [mailto:[EMAIL PROTECTED]
Se