Re: nutch fetched but no indexed

2008-07-29 Thread 宫照
e.apache.org ; [EMAIL PROTECTED] > *Sent:* Monday, July 28, 2008 2:43 PM > *Subject:* Re: nutch fetched but no indexed > Hi, > Thank you for wuqi's help. > I checked it in Luke and could not find it. > Now I import the

Re: nutch fetched but no indexed

2008-07-27 Thread 宫照
the > segment file. > - Original Message - > From: "宫照" <[EMAIL PROTECTED]> > To: ; <[EMAIL PROTECTED]> > Sent: Friday, July 25, 2008 9:53 AM > Subject: Re: nutch fetched but no indexed > Hi Patrick,

Re: nutch fetched but no indexed

2008-07-24 Thread wuqi
Check the status of this page in the crawldb. If it is db_fetched, then check whether it exists in the segment file. - Original Message - From: "宫照" <[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Friday, July 25, 2008 9:53 AM Subject: Re: nutch fetch
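wuqi's first step, checking the page's status in the crawldb, can be done with Nutch's readdb tool (for example bin/nutch readdb crawl/crawldb -url <url>, where crawl/crawldb is an assumed path). The minimal Python sketch below assumes readdb prints a line of the form "Status: 2 (db_fetched)" and just extracts the status name to decide what to check next. The function names and the sample output are hypothetical, and the exact output format may differ between Nutch versions.

```python
import re
from typing import Optional

# Matches a status line such as "Status: 2 (db_fetched)".
# ASSUMPTION: this is how `bin/nutch readdb <crawldb> -url <url>` reports the
# status; the format may differ between Nutch versions.
STATUS_RE = re.compile(r"^Status:\s*\d+\s*\(([a-z_]+)\)", re.MULTILINE)


def crawldb_status(readdb_output: str) -> Optional[str]:
    """Return the status name (e.g. 'db_fetched') from readdb output, or None."""
    match = STATUS_RE.search(readdb_output)
    return match.group(1) if match else None


def next_step(readdb_output: str) -> str:
    """Suggest the next diagnostic step, following the advice in this thread."""
    status = crawldb_status(readdb_output)
    if status is None:
        return "URL not found in crawldb (or the output format differs)"
    if status == "db_fetched":
        return "fetched: check whether it exists in the segment"
    return f"not fetched yet (status {status})"


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = """URL: http://example.com/doc.pdf
Version: 5
Status: 2 (db_fetched)
"""
    print(crawldb_status(sample))
    print(next_step(sample))
```

If the status is db_fetched, the segment can then be inspected with bin/nutch readseg (for example its -dump or -get options, if your Nutch version provides them).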

Re: nutch fetched but no indexed

2008-07-24 Thread 宫照
Hi Patrick, Thank you for your advice. My nutch-site.xml file is already set as you said, and I can search PDF files under other URLs. Only the file under the URL I mentioned before cannot be indexed. I guess it may be related to the type of URL, because from the log we can see it was fetched but not ind

RE: nutch fetched but no indexed

2008-07-24 Thread Patrick Markiewicz
Hi Gong Zhao, Make sure you have the parse-pdf plugin enabled in your nutch-site.xml file, i.e. plugin.includes ...|parse-(xml|text|html|js|pdf)|... That's the only thing I can think of at first glance. Patrick -Original Message- From: 宫照 [mailto:[EMAIL PROTECTED] Se
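Patrick's check can be scripted. A minimal Python sketch, assuming you pass in the text of nutch-site.xml (a <configuration> element holding <property> name/value pairs) and that Nutch enables a plugin only when its whole plugin id matches the plugin.includes regular expression. The function names and the sample value are hypothetical, not Nutch's own API or defaults.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def plugin_includes(nutch_site_xml: str) -> Optional[str]:
    """Return the plugin.includes value from nutch-site.xml text, or None if unset."""
    root = ET.fromstring(nutch_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "plugin.includes":
            return (prop.findtext("value") or "").strip()
    return None


def plugin_enabled(includes_regex: str, plugin_id: str) -> bool:
    """True if the whole plugin id matches the plugin.includes regex.

    ASSUMPTION: Nutch requires a full match (like Java's Matcher.matches());
    Python's re.fullmatch behaves the same for simple alternations like these.
    """
    return re.fullmatch(includes_regex, plugin_id) is not None


if __name__ == "__main__":
    # Hypothetical nutch-site.xml fragment, for illustration only.
    config = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(xml|text|html|js|pdf)|index-basic</value>
  </property>
</configuration>"""
    includes = plugin_includes(config)
    if includes is None:
        # Not overridden here, so the value from nutch-default.xml applies.
        print("plugin.includes is not set in nutch-site.xml")
    else:
        print(plugin_enabled(includes, "parse-pdf"))     # True
        print(plugin_enabled(includes, "parse-msword"))  # False
```

Matching the full id matters: a value like parse-(text|html) does not enable parse-pdf even though both start with "parse-".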