Which segments are you trying to generate from? Do you maybe need to
include them individually, or use a wildcard?

bin/nutch generate crawldb crawldb/segments/*
bin/nutch generate crawldb crawldb/segments/segmentNo

?
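
Separately from the segment paths: the stack trace quoted below complains
that crawldb/current does not exist, which usually means the crawldb has not
been created yet (the tutorial's inject step) or that the command is run from
a different directory. A minimal sketch of a pre-flight check, assuming the
relative crawldb path from your command and a seed directory called urls (the
urls name and the crawldb_ready helper are my own placeholders, not taken
from your setup):

```shell
# Hypothetical helper: succeeds if the crawldb directory contains the
# "current" subdirectory that the Generator reads its input from.
crawldb_ready() {
    [ -d "$1/current" ]
}

CRAWLDB="crawldb"   # path as used in the quoted generate command
SEEDS="urls"        # assumed seed directory; adjust to your layout

if crawldb_ready "$CRAWLDB"; then
    echo "found $CRAWLDB/current; generate should be able to read it"
else
    echo "missing $CRAWLDB/current; try: bin/nutch inject $CRAWLDB $SEEDS"
fi
```

Run it from runtime/local, the same directory you call bin/nutch from, since
both paths are relative.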

On Wed, May 9, 2012 at 3:33 PM, Stephan Kristyn <[email protected]> wrote:

>  Ok now at the heading "Step-by-Step: Fetching" I get
>
> -bash-4.1$ bin/nutch generate crawldb crawldb/segments
> Generator: starting at 2012-05-09 14:32:44
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/kristyns/apache-nutch-1.4-bin/runtime/local/crawldb/current
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>         at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:538)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:704)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:660)
>
> Strange...
>
> On 09.05.2012 16:04, Stephan Kristyn wrote:
>
> Hi, it seems like I forgot to fetch the crawled URLs, as mentioned in the
> tutorial:
>
> http://wiki.apache.org/nutch/NutchTutorial
>
>
> I'll let you know if and how that worked out for me.
>
> On 09.05.2012 14:28, Stephan Kristyn wrote:
>
> This is the query that the Solr interface generates when I enter "test" and
> hit the search button:
> http://myDomain:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on
>
> Maybe this is a question better suited for the Solr ML?
>
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Wednesday, May 9, 2012 13:34
> To: [email protected]
> Subject: Re: HTTP ERROR 400
>
> Are you attempting to index to Solr, or is this simply when you start your
> Solr server?
>
> On Wed, May 9, 2012 at 12:21 PM, Stephan Kristyn <[email protected]> wrote:
> I copied over the schema and everything else in conf from nutch.
>
> $ cp apache-nutch-1.4-bin/runtime/local/conf/* apache-solr-3.6.0/example/solr/conf/
>
>
>
>
> On 09.05.2012 12:32, Lewis John Mcgibbney wrote:
>
> Which schema are you using with your Solr server?
>
>
>
> On Wed, May 9, 2012 at 11:17 AM, Stephan Kristyn <[email protected]> wrote:
>
> Also, entering
>
> java -jar post.jar *.xml
>
> on RHEL6, I get:
>
> INFO: [] webapp=/solr path=/update params={} status=400 QTime=42
> SimplePostTool: FATAL: Solr returned an error #400 ERROR:
> [doc=GB18030TEST] unknown field 'name'
>
>
>
> Thanks,
>
> Stephan
>
>
>
>
>
> On 09.05.2012 12:11, Stephan Kristyn wrote:
>
> Hi,
>
>
>
> after installing Nutch and Solr I get a
>
>
>
>
>
>     HTTP ERROR 400
>
>
>
> Problem accessing /solr/select/. Reason:
>
>
>
>     undefined field text
>
>
>
> ------------------------------------------------------------------------
>
> Powered by Jetty://
>
> Any ideas how to fix this?
>
>
>
> Thanks,
>
> Stephan
>
> --
>
> stephan
> kristyn
> partner operations manager
>
> "The Internet? Is that thing still around?" - Homer Simpson
> [email protected]
> direct +49 (0)89 231 97 207    mobile +49 (0) 162 28899 02
>
> yahoo! deutschland gmbh theresienhoehe 12, munich, 80339, germany
> phone (408) 349 3300    fax (408) 349 3301
>
>
>
>
>
> --
> Lewis
>
>



-- 
*Lewis*
