I'm still having trouble with this in 1.3. It looks as if there's something
dumb with the syntax or file structure, but I can't figure it out.

$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

SolrIndexer: starting at 2011-10-25 23:26:02
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
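Editorial note (not from the thread): comparing the two error lists, 1.3 reads the third positional argument as the linkdb, so "-linkdb" itself was taken as a path (hence "-linkdb/current") and crawl/linkdb was then treated as a segment. In 1.4 (NUTCH-1054, quoted below) the linkdb is optional and must follow a "-linkdb" flag. The hypothetical helper below only prints the command line each version appears to expect; the 1.3 argument order is inferred from the error output above, not checked against the 1.3 sources.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints, but does not run, a
# solrindex command line for the given Nutch version.
#   1.3       : <solr url> <crawldb> <linkdb> <segment> ...
#               (linkdb positional; inferred from the 1.3 errors above)
#   otherwise : <solr url> <crawldb> -linkdb <linkdb> <segment> ...
#               (linkdb optional and flagged, per NUTCH-1054; assumed for 1.4+)
solrindex_cmd() {
  version=$1 solr=$2 crawldb=$3 linkdb=$4
  shift 4   # remaining arguments are the segments
  case $version in
    1.3) printf 'bin/nutch solrindex %s %s %s %s\n' \
           "$solr" "$crawldb" "$linkdb" "$*" ;;
    *)   printf 'bin/nutch solrindex %s %s -linkdb %s %s\n' \
           "$solr" "$crawldb" "$linkdb" "$*" ;;
  esac
}

# Example (segment names borrowed from the 1.4 log quoted below):
solrindex_cmd 1.3 http://search.zimzaz.com:8983/solr crawl/crawldb \
  crawl/linkdb crawl/segments/20111025095216 crawl/segments/20111025100004
```

For 1.3 this should amount to dropping "-linkdb" and passing crawl/linkdb as the third argument, i.e. bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*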

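Editorial note (not from the thread): the crawl/segments/* glob in the command above expands to every entry in that directory, which is how a stray nohup.out ends up being treated as a segment in the 1.3 output quoted further down. The hypothetical helper below (a sketch, not part of Nutch) keeps only entries that are directories and contain the four parts SolrIndexer complained about: crawl_fetch, crawl_parse, parse_data and parse_text.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints each entry under a segments
# directory that is a directory AND contains the four parts SolrIndexer
# reported as missing. Anything else (a stray file such as nohup.out, or a
# segment whose fetch/parse did not finish) is reported on stderr instead.
complete_segments() {
  for seg in "$1"/*; do
    if [ ! -d "$seg" ]; then
      echo "skipping (not a directory): $seg" >&2
      continue
    fi
    missing=""
    for part in crawl_fetch crawl_parse parse_data parse_text; do
      [ -d "$seg/$part" ] || missing="$missing $part"
    done
    if [ -z "$missing" ]; then
      echo "$seg"
    else
      echo "skipping (missing:$missing): $seg" >&2
    fi
  done
}
```

With the 1.3 argument order it could replace the glob, e.g. bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb crawl/linkdb $(complete_segments crawl/segments), assuming no spaces in the segment paths.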

On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma wrote:

> From the changelog:
> http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
>
> * NUTCH-1054 LinkDB optional during indexing (jnioche)
>
> With your command, the given linkdb is interpreted as a segment.
>
> https://issues.apache.org/jira/browse/NUTCH-1054
>
> This is the new command:
>
> Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]
>
> On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > I'm having a similar issue.  I'm using 1.4 and getting these errors with
> > linkdb.  The segments seem fine.
> >
> > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> >
> >
> > Did something change with 1.4?
> >
> > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney wrote:
> > > Hi Fred,
> > >
> > > How many individual directories do you have under /runtime/local/crawl/segments/?
> > >
> > > Another thing that raises alarms is the nohup.out directories! Are these
> > > intentional? Interestingly, the missing segment data is not the same for
> > > these directories.
> > >
> > > Does your log output indicate any discrepancies between the various
> > > command transitions?
> > >
> > >
> > >
> > > >> bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb crawl/linkdb crawl/segments/*
> > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > >
> > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney wrote:
> > > >> Hi guys,
> > > >>
> > > >> I have been watching this thread intently and I am very happy to see
> > > >> that there is some progress :0)
> > > >>
> > > >> Radim,
> > > >>
> > > >> Can I ask that you open a JIRA issue and submit a patch? That way we
> > > >> can not only track it, but the community will also get a chance to
> > > >> test and validate the patch prior to integration into the source.
> > > >>
> > > >> Thanks
> > > >>
> > > >> Lewis
> > > >>
> > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh wrote:
> > > >> > Hi Radim,
> > > >> >
> > > >> > Thank you so much for this. I am not familiar with the commit process
> > > >> > for the core.
> > > >> >
> > > >> > Is there someone who can help us get this committed and help resolve
> > > >> > this issue?
> > > >> >
> > > >> > Thanks for all your help.
> > > >> >
> > > >> > Rajesh Ramana
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Radim Kolar
> > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> > > >>
> > > >> > - The regex normalizer transforms the special characters, but fails
> > > >> >   to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’.
> > > >> >
> > > >> > - The fetcher is having trouble interpreting links with the special
> > > >> >   character ‘ñ’.
> > > >> >
> > > >> > I can add this transformation to the basic URL normalizer if somebody
> > > >> > is willing to commit it.
> > > >>
> > > >> --
> > > >> *Lewis*
> > >
> > > --
> > > *Lewis*
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
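Editorial note (not from the thread) on Radim's two candidate substitutions: ‘%C3%B1’ is ‘ñ’ percent-encoded from its two UTF-8 bytes, while ‘%F1’ is its single ISO-8859-1 byte. The hypothetical helper below percent-encodes every byte of its argument, which is enough to check those values. It is not Nutch's normalizer code and, unlike a real URL normalizer, it also encodes plain ASCII characters.

```shell
#!/bin/sh
# Hypothetical helper (not Nutch code): percent-encodes EVERY byte of its
# argument using upper-case hex. Unlike a real URL normalizer it also encodes
# plain ASCII; it is only meant to show the byte values behind 'ñ'.
pct_encode_bytes() {
  printf '%s' "$1" | od -A n -t x1 -v |
    awk '{ for (i = 1; i <= NF; i++) printf "%%%s", toupper($i) } END { print "" }'
}

# Assuming this script is saved as UTF-8, 'ñ' is the two bytes C3 B1:
pct_encode_bytes 'ñ'
```

For the ISO-8859-1 form, printf 'ñ' | iconv -f UTF-8 -t ISO-8859-1 | od -A n -t x1 shows the single byte f1, i.e. %F1.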
