Which segments are you trying to generate from? Do you maybe need to include them individually, or use a wildcard?
bin/nutch generate crawldb crawldb/segments/*
bin/nutch generate crawldb crawldb/segments/segmentNo ?

On Wed, May 9, 2012 at 3:33 PM, Stephan Kristyn <[email protected]> wrote:

> Ok now at the heading "Step-by-Step: Fetching" I get
>
> -bash-4.1$ bin/nutch generate crawldb crawldb/segments
> Generator: starting at 2012-05-09 14:32:44
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/kristyns/apache-nutch-1.4-bin/runtime/local/crawldb/current
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>     at org.apache.nutch.crawl.Generator.generate(Generator.java:538)
>     at org.apache.nutch.crawl.Generator.run(Generator.java:704)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Generator.main(Generator.java:660)
>
> Strange...
>
> On 09.05.2012 16:04, Stephan Kristyn wrote:
>
> Hi, it seems like I forgot to fetch the crawled URLs, as mentioned in the
> tutorial:
>
> http://wiki.apache.org/nutch/NutchTutorial
>
> I'll let you know if and how that worked out for me.
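The trace above names crawldb/current as the missing input path, not anything under segments, so one thing worth ruling out is that the crawldb was never populated at that location (the tutorial runs the inject step before generate). A minimal sketch of such a check follows. It assumes the crawldb is the directory passed as the first argument to generate, as in the quoted command; crawldb_status is a hypothetical helper for illustration, not a Nutch command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report whether a crawldb directory
# contains the "current" subdirectory that the trace lists as the input path.
crawldb_status() {
  if [ -d "$1/current" ]; then
    echo "ready"
  else
    echo "missing"
  fi
}

# "crawldb" is the path used in the quoted command; adjust it to your layout.
crawldb_status "crawldb"
```

If this prints "missing", the path given to generate probably does not match the place where the crawldb was created.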
>
> On 09.05.2012 14:28, Stephan Kristyn wrote:
>
> This is the query that the Solr interface generates when I enter "test" and
> hit the search button:
> http://myDomain:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on
>
> Maybe this is a question better suited for the Solr ML?
>
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Wednesday, 9 May 2012 13:34
> To: [email protected]
> Subject: Re: HTTP ERROR 400
>
> Are you attempting to index to Solr, or is this simply when you start your
> Solr server?
>
> On Wed, May 9, 2012 at 12:21 PM, Stephan Kristyn <[email protected]> wrote:
>
> I copied over the schema and everything else in conf from Nutch.
>
> $ cp apache-nutch-1.4-bin/runtime/local/conf/* apache-solr-3.6.0/example/solr/conf/
>
> On 09.05.2012 12:32, Lewis John Mcgibbney wrote:
> > Which schema are you using with your Solr server?
> >
> > On Wed, May 9, 2012 at 11:17 AM, Stephan Kristyn <[email protected]> wrote:
> > Also, entering
> >
> > java -jar post.jar *.xml on RHEL6 I get a
> >
> > INFO: [] webapp=/solr path=/update params={} status=400 QTime=42
> > SimplePostTool: FATAL: Solr returned an error #400 ERROR:
> > [doc=GB18030TEST] unknown field 'name'
> >
> > Thanks,
> > Stephan
> >
> > On 09.05.2012 12:11, Stephan Kristyn wrote:
> > Hi,
> >
> > after installing Nutch and Solr I get a
> >
> > HTTP ERROR 400
> >
> > Problem accessing /solr/select/. Reason:
> >
> > undefined field text
> >
> > ------------------------------------------------------------------------
> > Powered by Jetty://
> >
> > Any ideas how to fix this?
> >
> > Thanks,
> > Stephan
> >
> > --
> > stephan kristyn
> > partner operations manager
> >
> > "The Internet? Is that thing still around?" - Homer Simpson
> >
> > [email protected]
> > direct +49 (0)89 231 97 207 mobile +49 (0) 162 28899 02
> >
> > yahoo! deutschland gmbh theresienhoehe 12, munich, 80339, germany
> > phone (408) 349 3300 fax (408) 349 3301
>
> --
> Lewis

--
*Lewis*

