I'm having a similar issue. I'm using Nutch 1.4 and getting these errors with
the linkdb. The segments seem fine.

2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text


Did something change with 1.4?
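
In the log above, crawl/linkdb is reported as "adding segment", and the indexer then fails because crawl_fetch, crawl_parse, parse_data and parse_text do not exist under it. One way to catch this kind of problem before running solrindex is to check every path you intend to pass as a segment for those four subdirectories. Below is a minimal Python sketch of such a check, not part of Nutch. The script name check_segments.py and all function names are hypothetical, and the 14-digit name test is only a heuristic based on the segment names visible in these logs (e.g. 20111025095216); it would also flag stray entries such as a nohup.out directory under crawl/segments.

```python
import re
import sys
from pathlib import Path

# The per-segment subdirectories the indexer reported as missing
# ("Input path does not exist") in the logs in this thread.
REQUIRED_SUBDIRS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")

# Heuristic (assumption): segment directories in these logs are named with
# a 14-digit timestamp such as 20111025095216.
SEGMENT_NAME = re.compile(r"^\d{14}$")


def missing_parts(segment):
    """Return the required subdirectories that are absent under `segment`."""
    segment = Path(segment)
    return [name for name in REQUIRED_SUBDIRS if not (segment / name).is_dir()]


def looks_like_segment_name(path):
    """True if the last path component looks like a timestamped segment name."""
    return bool(SEGMENT_NAME.match(Path(path).name))


def check_segments(paths):
    """Map each suspicious path to a list of human-readable problems."""
    report = {}
    for p in paths:
        problems = []
        if not looks_like_segment_name(p):
            problems.append("name is not a 14-digit timestamp")
        missing = missing_parts(p)
        if missing:
            problems.append("missing " + ", ".join(missing))
        if problems:
            report[str(p)] = problems
    return report


if __name__ == "__main__":
    # Hypothetical usage: python3 check_segments.py crawl/segments/*
    for path, problems in check_segments(sys.argv[1:]).items():
        print(f"{path}: {'; '.join(problems)}")
```

Running it with the same arguments you give solrindex, for example python3 check_segments.py crawl/linkdb crawl/segments/*, should list crawl/linkdb as missing all four subdirectories, matching the error above, and would also list partially processed segments such as 20111008015309 in the older log, which has crawl_fetch but none of the parse outputs.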

On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:

> Hi Fred,
>
> How many individual directories do you have under
> /runtime/local/crawl/segments/?
>
> Another thing that raises alarms is the nohup.out dirs! Are these
> intentional? Interestingly, the missing segment data is not the same
> across these dirs.
>
> Does your log output indicate any discrepancies between the various
> command transitions?
>
>
>
> >> bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb crawl/linkdb crawl/segments/*
> >> SolrIndexer: starting at 2011-10-09 00:13:24
> >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> >
> >
> >
> >
> >
> >
> > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> >
> >> Hi guys,
> >>
> >> I have been watching this thread intently and I am very happy to see
> >> that there is some progress :0)
> >>
> >> Radim,
> >>
> >> Could you open a JIRA issue and submit a patch? That way we can not
> >> only track it, but the community also gets a chance to test and
> >> validate the patch before it is integrated into the source.
> >>
> >> Thanks
> >>
> >> Lewis
> >>
> >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> >>
> >> > Hi Radim,
> >> >
> >> > Thank you so much for this. I am not familiar with the commit
> >> > process for the core. Is there someone who can help us get this
> >> > committed and resolve this issue?
> >> >
> >> > Thanks for all your help.
> >> >
> >> > Rajesh Ramana
> >> >
> >> > -----Original Message-----
> >> > From: Radim Kolar [mailto:[email protected]]
> >> > Sent: Thursday, October 06, 2011 2:18 PM
> >> > To: [email protected]
> >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> >> >
> >> > - The REGEX normalizer transforms the special characters, but fails
> >> >   to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’.
> >> > - The fetcher is having trouble interpreting links with the special
> >> >   character ‘ñ’.
> >> >
> >> > I can add this transformation to the basic URL normalizer
> >> > (urlnormalizer-basic) if somebody is willing to commit it.
> >> >
> >>
> >>
> >>
> >> --
> >> *Lewis*
> >>
> >
> >
>
>
> --
> *Lewis*
>
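
Radim's two candidate replacements for ‘ñ’ correspond to two byte encodings of the same character: ‘%F1’ is its single ISO-8859-1 (Latin-1) byte and ‘%C3%B1’ is its two-byte UTF-8 form. The transformation he describes can be sketched as: split the URL, percent-encode each non-ASCII character in the path, query and fragment using the chosen charset, and leave ASCII characters (including existing %XX escapes) alone. The Python below is only an illustration of that idea, not Nutch's BasicURLNormalizer (Nutch normalizers are Java plugins); the function names and the example.com URL are hypothetical, and which charset is correct depends on what the target server expects.

```python
from urllib.parse import urlsplit, urlunsplit


def percent_encode_non_ascii(text, charset="utf-8"):
    """Replace every non-ASCII character with %XX escapes of its bytes in
    `charset`. ASCII characters, including existing %XX escapes, are kept."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("".join("%{:02X}".format(b) for b in ch.encode(charset)))
    return "".join(out)


def normalize_url(url, charset="utf-8"):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    The host is left untouched: non-ASCII host names need IDNA (punycode),
    not percent-encoding, which this sketch does not attempt.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        percent_encode_non_ascii(parts.path, charset),
        percent_encode_non_ascii(parts.query, charset),
        percent_encode_non_ascii(parts.fragment, charset),
    ))


if __name__ == "__main__":
    url = "http://example.com/espa\u00f1a?q=ni\u00f1o"
    print(normalize_url(url))             # http://example.com/espa%C3%B1a?q=ni%C3%B1o
    print(normalize_url(url, "latin-1"))  # http://example.com/espa%F1a?q=ni%F1o
```

Because already-encoded URLs consist only of ASCII characters, running the function twice gives the same result as running it once, which matters for a normalizer that may see the same URL repeatedly.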
