Hi Fred,

How many individual directories do you have under /runtime/local/crawl/segments/ ?
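One way to answer that question is to list every entry under the segments directory and note which expected subdirectories each one lacks. The following is only a minimal sketch: the function name `audit_segments` is hypothetical, the relative path `crawl/segments` is assumed from the command in the quoted output, and the four subdirectory names are simply the ones that appear in the quoted SolrIndexer errors (a complete segment may contain others). It also flags any entry that is not a directory at all, since the shell glob `crawl/segments/*` would pass such an entry to the indexer as if it were a segment.

```python
from pathlib import Path

# Subdirectory names taken from the quoted SolrIndexer errors; a complete
# segment may contain other subdirectories as well.
EXPECTED = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def audit_segments(segments_dir):
    """Map each entry under segments_dir to the expected subdirectories it lacks.

    Entries that are not directories map to None.
    """
    report = {}
    for entry in sorted(Path(segments_dir).iterdir()):
        if not entry.is_dir():
            # Plain files still match the glob crawl/segments/* and would be
            # handed to the indexer as if they were segments.
            report[entry.name] = None
            continue
        report[entry.name] = [s for s in EXPECTED if not (entry / s).is_dir()]
    return report


if __name__ == "__main__":
    root = Path("crawl/segments")  # assumed location, relative to runtime/local
    if root.is_dir():
        for name, missing in audit_segments(root).items():
            if missing is None:
                print(f"{name}: not a directory")
            elif missing:
                print(f"{name}: missing {', '.join(missing)}")
            else:
                print(f"{name}: ok")
    else:
        print(f"{root} not found; run this from runtime/local")
```

Run from runtime/local, this prints one line per entry, which makes it easy to see which segments are incomplete before passing them to solrindex.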
Another thing that raises alarms is the nohup.out directories! Are these intentional? Interestingly, the missing segment data is not the same across these directories. Does your log output indicate any discrepancies between the various command transitions?

>> bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
>> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/ crawl/crawldb
>> crawl/linkdb crawl/segments/*
>> SolrIndexer: starting at 2011-10-09 00:13:24
>> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
>> Input path does not exist:
>> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
>
> -----------------------------------------------------
> Subscribe to the Nimble Books Mailing List http://eepurl.com/czS- for
> monthly updates
>
> On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
>
>> Hi guys,
>>
>> I have been watching this thread intently and I am very happy to see that
>> there is some progress :0)
>>
>> Radim,
>>
>> Can I ask that you open a JIRA issue and submit a patch, this way we can
>> not only track it, but it will also give the community a chance to test
>> and validate the patch prior to integration into the source.
>>
>> Thanks
>>
>> Lewis
>>
>> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
>>
>> > Hi Radim,
>> >
>> > Thank you so much for this. I am not familiar with commit process to
>> > the core.
>> > Is there someone who can help us get this committed and help resolve
>> > this issue?
>> >
>> > Thanks for all your help.
>> >
>> > Rajesh Ramana
>> >
>> > -----Original Message-----
>> > From: Radim Kolar [mailto:[email protected]]
>> > Sent: Thursday, October 06, 2011 2:18 PM
>> > To: [email protected]
>> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
>> >
>> > - The REGEX normalizer transforms the special characters, but fails to
>> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
>> > - The fetcher is having trouble interpreting the links with special
>> > character ‘ñ’.
>> >
>> > I can add this transformation to basic-url normalizer if somebody is
>> > willing to commit it.
>> >
>>
>> --
>> *Lewis*
>

--
*Lewis*

