I'm having a similar issue. I'm using Nutch 1.4, and solrindex reports the errors below for the linkdb path. The segments seem fine.

2011-10-25 10:10:20,060 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
2011-10-25 10:10:20,136 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
2011-10-25 10:10:20,138 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text

Did something change with 1.4?

On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:

> Hi Fred,
>
> How many individual directories do you have under
> /runtime/local/crawl/segments/?
>
> Another thing that raises alarms is the nohup.out dir's! Are these
> intentional? Interestingly, missing segment data is not the same with these
> dir's.
>
> Does your log output indicate any discrepancies between various command
> transitions?
> >
> > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> >> solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb
> >> crawl/linkdb crawl/segments/*
> >> SolrIndexer: starting at 2011-10-09 00:13:24
> >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> >> Input path does not exist:
> >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> >
> > -----------------------------------------------------
> > Subscribe to the Nimble Books Mailing List http://eepurl.com/czS- for
> > monthly updates
> >
> > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> >
> >> Hi guys,
> >>
> >> I have been watching this thread intently and I am very happy to see that
> >> there is some progress :0)
> >>
> >> Radim,
> >>
> >> Can I ask that you open a JIRA issue and submit a patch, this way we can not
> >> only track it, but it will also give the community a chance to test and
> >> validate the patch prior to integration into the source.
> >>
> >> Thanks
> >>
> >> Lewis
> >>
> >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> >>
> >> > Hi Radim,
> >> >
> >> > Thank you so much for this. I am not familiar with commit process to the
> >> > core.
> >> > Is there someone who can help us get this committed and help resolve this
> >> > issue?
> >> >
> >> > Thanks for all your help.
> >> >
> >> > Rajesh Ramana
> >> >
> >> > -----Original Message-----
> >> > From: Radim Kolar [mailto:[email protected]]
> >> > Sent: Thursday, October 06, 2011 2:18 PM
> >> > To: [email protected]
> >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> >> >
> >> > - The REGEX normalizer transforms the special characters, but fails to
> >> > substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> >> > - The fetcher is having trouble interpreting the links with special
> >> > character ‘ñ’.
> >> >
> >> > i can add this transformation to basic-url normalizer if somebody is
> >> > willing to commit it.
> >> >
> >>
> >> --
> >> *Lewis*
> >
>
> --
> *Lewis*
>
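
For anyone hitting the same "Input path does not exist" errors: in both logs above, every failing path ends in one of the same four subdirectories (crawl_fetch, crawl_parse, parse_data, parse_text). The 1.4 log also shows crawl/linkdb listed under "adding segment", and in the quoted 1.3 run the crawl/segments/* glob picked up nohup.out. Below is a minimal sketch that checks which arguments actually look like segments before they are handed to solrindex. It assumes those four names are what the indexer expects, which is inferred only from the error messages and not from the Nutch source. The script and function names are made up for illustration, and it only inspects a local filesystem, not HDFS.

```python
from pathlib import Path
import sys

# Subdirectory names taken from the "Input path does not exist" messages above.
# Assumption: these are what the indexer looks for in every path it treats as
# a segment; a real segment may contain other directories as well.
REQUIRED_SUBDIRS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def missing_segment_parts(path):
    """Return the required subdirectories that are absent under `path`."""
    root = Path(path)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


def check_paths(paths):
    """Map each candidate path to the list of subdirectories it is missing."""
    return {str(p): missing_segment_parts(p) for p in paths}


if __name__ == "__main__":
    # Print one line per argument so stray files or a linkdb stand out.
    for path, missing in check_paths(sys.argv[1:]).items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{path}: {status}")
```

Saved under a hypothetical name such as check_segments.py, it could be run from runtime/local as "python check_segments.py crawl/linkdb crawl/segments/*". Any path reported with all four parts missing (a linkdb directory, a stray nohup.out) is probably not a segment. Whether the 1.4 command line still accepts the linkdb as a plain positional argument is best confirmed from the usage message that bin/nutch solrindex prints when run without arguments.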

