Hi Fred,

Please check that the linkdb command executed successfully. The output
logs do not indicate this.
It also looks like you've got a '-' (minus) character in front of the relative
linkdb directory, which is why the last error refers to -linkdb/current.
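
A rough sketch of what I mean, not a definitive fix. It assumes the paths from
your error output, that invertlinks writes the linkdb under <linkdb>/current,
and that 1.3 takes the linkdb as a plain positional argument (I believe the
-linkdb flag only arrived with NUTCH-1054). The linkdb_ready helper is a name
made up for illustration, and the script only prints the command to run next:

```shell
#!/bin/sh
# Sketch only: paths come from the error output quoted below; adjust as needed.

# Hypothetical helper: succeeds if the given linkdb directory looks populated.
# Assumption: invertlinks writes its data under <linkdb>/current.
linkdb_ready() {
  [ -d "$1/current" ]
}

LINKDB=crawl/linkdb

if linkdb_ready "$LINKDB"; then
  # Assumed 1.3 form: linkdb is a positional argument, with no '-linkdb' flag.
  echo "bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb $LINKDB crawl/segments/*"
else
  echo "linkdb missing; build it first with:"
  echo "bin/nutch invertlinks $LINKDB -dir crawl/segments"
fi
```

Run it from runtime/local so the relative crawl/ paths resolve.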

HTH

On Wed, Oct 26, 2011 at 1:27 AM, Fred Zimmerman <[email protected]> wrote:

> I'm still having trouble with this in 1.3. looks as if there's something
> dumb with syntax or file structure but can't get it.
>
> $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
>
> SolrIndexer: starting at 2011-10-25 23:26:02
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
> Input path does not exist:
> file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
>
>
> On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma <[email protected]> wrote:
>
> > From the changelog:
> > http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
> >
> > 111     * NUTCH-1054 LinkDB optional during indexing (jnioche)
> >
> > With your command, the given linkdb is interpreted as a segment.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1054
> >
> > This is the new command:
> >
> > Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]
> >
> > On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > > I'm having a similar issue.  I'm using 1.4 and getting these errors with
> > > linkdb.  The segments seem fine.
> > >
> > > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> > > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> > > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> > > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > > Input path does not exist:
> > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> > >
> > >
> > > Did something change with 1.4?
> > >
> > > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:
> > > > Hi Fred,
> > > >
> > > > How many individual directories do you have under
> > > > /runtime/local/crawl/segments/
> > > > ?
> > > >
> > > > Another thing that raises alarms is the nohup.out dir's! Are these
> > > > intentional? Interestingly, missing segment data is not the same with
> > > > these dir's.
> > > >
> > > > Does your log output indicate any discrepancies between various command
> > > > transitions?
> > > >
> > > >
> > > >
> > > > >> bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb crawl/linkdb crawl/segments/*
> > > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > > >> Input path does not exist:
> > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > > >
> > > > > -----------------------------------------------------
> > > > > Subscribe to the Nimble Books Mailing List
> http://eepurl.com/czS-for
> > > > > monthly updates
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> > > > >> Hi guys,
> > > > >>
> > > > >> I have been watching this thread intently and I am very happy to see that
> > > > >> there is some progress :0)
> > > > >>
> > > > >> Radim,
> > > > >>
> > > > >> Can I ask that you open a JIRA issue and submit a patch, this way we
> > > > >> can not only track it, but it will also give the community a chance to
> > > > >> test and validate the patch prior to integration into the source.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> Lewis
> > > > >>
> > > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> > > > >> > Hi Radim,
> > > > >> >
> > > > >> >  Thank you so much for this. I am not familiar with commit process
> > > > >> >  to the core.
> > > > >> >
> > > > >> >  Is there someone who can help us get this committed and help
> > > > >> >  resolve this issue?
> > > > >> >
> > > > >> > Thanks for all your help.
> > > > >> >
> > > > >> > Rajesh Ramana
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Radim Kolar [mailto:[email protected]]
> > > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > > >> > To: [email protected]
> > > > >> > Subject: Re: Nutch not crawling URLs with spanish accented
> > > > >> > characters (ñ)
> > > > >> >
> > > > >> > - The REGEX normalizer transforms the special characters, but fails
> > > > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > > > >> >
> > > > >> >  - The fetcher is having trouble interpreting the links with special
> > > > >> > character ‘ñ’.
> > > > >> >
> > > > >> > i can add this transformation to basic-url normalizer if somebody is
> > > > >> > willing to commit it.
> > > > >>
> > > > >> --
> > > > >> *Lewis*
> > > >
> > > > --
> > > > *Lewis*
> >
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
> >
>



-- 
*Lewis*
