Hi guys, I'm working through the tutorial here <http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch>. I've run a crawl on a list of webpages, and now I'm trying to index them into Solr. Solr is installed and runs fine: it indexes .json, .xml, whatever, and returns queries. I've edited the Nutch schema as per the instructions. Now I've hit a wall:
The tutorial says:

- Save the file and restart Solr under ${APACHE_SOLR_HOME}/example:

    java -jar start.jar

On my install (the latest Solr) there is no such file, but there is a solr.sh script under bin/ which I can start. So I copied it into solr/example/ and ran it from there, and Solr comes up. The next step is:

- run the Solr Index command from ${NUTCH_RUNTIME_HOME}:

    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/

When I run it, I get this:

    ubuntu@ubuntu-VirtualBox:~/crawler/nutch$ bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/
    Indexer: starting at 2015-03-01 19:51:09
    Indexer: deleting gone documents: false
    Indexer: URL filtering: false
    Indexer: URL normalizing: false
    Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication
    Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/crawl_fetch
    Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/crawl_parse
    Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/parse_data
    Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/parse_text
    Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/crawldb/current
    Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/linkdb/current
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
        at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

What am I doing wrong?

Sincerely,

Baruch Kogan
Marketing Manager, Seller Panda <http://sellerpanda.com>
+972(58)441-3829
baruch.kogan at Skype
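P.S. Since the missing paths are directly under crawl/segments/, my working theory is that the indexer wants the individual timestamped segment directories rather than crawl/segments/ itself, so maybe I should loop over them. I'm not sure this is right; here's a sketch of what I mean (the /tmp/demo layout and timestamps are made up to illustrate, and the command is only echoed, not run):

```shell
# Simulate the layout I believe a Nutch crawl creates: one
# timestamped subdirectory per fetch cycle under crawl/segments/.
# These timestamps are invented for illustration.
mkdir -p /tmp/demo/crawl/segments/20150301185901
mkdir -p /tmp/demo/crawl/segments/20150301190215

# Build one solrindex invocation per segment directory instead of
# passing crawl/segments/ itself. Only echoed here, not executed.
for seg in /tmp/demo/crawl/segments/*; do
  echo "bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb $seg"
done
```

Is that the intended usage, or is the tutorial's crawl/segments/ form supposed to work as-is?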