Hi Markus,

When I run this command:
nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

I got an error. Here is the log:

2014-11-03 17:55:04,602 INFO indexer.IndexingJob - Indexer: starting at 2014-11-03 17:55:04
2014-11-03 17:55:04,652 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2014-11-03 17:55:04,652 INFO indexer.IndexingJob - Indexer: URL filtering: false
2014-11-03 17:55:04,652 INFO indexer.IndexingJob - Indexer: URL normalizing: false
2014-11-03 17:55:04,860 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-11-03 17:55:04,861 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
	solr.server.url : URL of the SOLR instance (mandatory)
	solr.commit.size : buffer size when sending to SOLR (default 1000)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.auth : use authentication (default false)
	solr.auth.username : use authentication (default false)
	solr.auth : username for authentication
	solr.auth.password : password for authentication
2014-11-03 17:55:04,865 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/indexes
2014-11-03 17:55:04,865 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/crawldb
2014-11-03 17:55:04,978 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
2014-11-03 17:55:04,979 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141103163424
2014-11-03 17:55:04,980 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141103175027
2014-11-03 17:55:04,981 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141103175109
2014-11-03 17:55:05,033 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
2014-11-03 17:55:05,110 ERROR security.UserGroupInformation - PriviledgedActionException as:me cause:org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/crawl_fetch
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/parse_data
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/crawl_fetch
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/parse_data
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/parse_data
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/indexes/current
2014-11-03 17:55:05,112 ERROR indexer.IndexingJob - Indexer: org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/crawl_fetch
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/parse_data
Input path does not exist:
file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/crawldb/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/crawl_fetch
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/parse_data
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/linkdb/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/crawl_parse
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/parse_data
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/segments/20141103163424/parse_text
Input path does not exist: file:/home/me/SoftwareDevelopment/Crawling/dataCrawl/crawl/indexes/current
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
	at
org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Please advise.

On Mon, Nov 3, 2014 at 5:47 PM, Muhamad Muchlis <tru3....@gmail.com> wrote:

> Like this?
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
>  <name>http.agent.name</name>
>  <value>My Nutch Spider</value>
> </property>
>
> <property>
>  <name>solr.server.url</name>
>  <value>http://localhost:8983/solr/</value>
> </property>
>
> </configuration>
>
>
> On Mon, Nov 3, 2014 at 5:41 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
>> You can set solr.server.url in your nutch-site.xml or pass it via command
>> line as -Dsolr.server.url=<URL>
>>
>>
>> -----Original message-----
>> > From:Muhamad Muchlis <tru3....@gmail.com>
>> > Sent: Monday 3rd November 2014 11:37
>> > To: user@nutch.apache.org
>> > Subject: Re: [Error Crawling Job Failed] NUTCH 1.9
>> >
>> > Hi Markus,
>> >
>> > Where can I find the setting for the Solr URL (-D)?
>> >
>> > On Mon, Nov 3, 2014 at 5:31 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> >
>> > > Well, here it is:
>> > > java.lang.RuntimeException: Missing SOLR URL.
Should be set via -Dsolr.server.url
>> > >
>> > >
>> > > -----Original message-----
>> > > > From:Muhamad Muchlis <tru3....@gmail.com>
>> > > > Sent: Monday 3rd November 2014 10:58
>> > > > To: user@nutch.apache.org
>> > > > Subject: Re: [Error Crawling Job Failed] NUTCH 1.9
>> > > >
>> > > > 2014-11-03 16:56:06,530 INFO indexer.IndexingJob - Indexer: starting at 2014-11-03 16:56:06
>> > > > 2014-11-03 16:56:06,582 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
>> > > > 2014-11-03 16:56:06,582 INFO indexer.IndexingJob - Indexer: URL filtering: false
>> > > > 2014-11-03 16:56:06,582 INFO indexer.IndexingJob - Indexer: URL normalizing: false
>> > > > 2014-11-03 16:56:06,800 ERROR solr.SolrIndexWriter - Missing SOLR URL. Should be set via -D solr.server.url
>> > > > SOLRIndexWriter
>> > > > solr.server.url : URL of the SOLR instance (mandatory)
>> > > > solr.commit.size : buffer size when sending to SOLR (default 1000)
>> > > > solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>> > > > solr.auth : use authentication (default false)
>> > > > solr.auth.username : use authentication (default false)
>> > > > solr.auth : username for authentication
>> > > > solr.auth.password : password for authentication
>> > > >
>> > > > 2014-11-03 16:56:06,802 ERROR indexer.IndexingJob - Indexer: java.lang.RuntimeException: Missing SOLR URL.
Should be set via -D solr.server.url
>> > > > SOLRIndexWriter
>> > > > solr.server.url : URL of the SOLR instance (mandatory)
>> > > > solr.commit.size : buffer size when sending to SOLR (default 1000)
>> > > > solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>> > > > solr.auth : use authentication (default false)
>> > > > solr.auth.username : use authentication (default false)
>> > > > solr.auth : username for authentication
>> > > > solr.auth.password : password for authentication
>> > > >
>> > > > at org.apache.nutch.indexwriter.solr.SolrIndexWriter.setConf(SolrIndexWriter.java:192)
>> > > > at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
>> > > > at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
>> > > > at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
>> > > > at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>> > > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > > > at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>> > > >
>> > > > On Mon, Nov 3, 2014 at 3:41 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> > > >
>> > > > > Hi - see the logs for more details.
>> > > > > Markus
>> > > > >
>> > > > > -----Original message-----
>> > > > > > From:Muhamad Muchlis <tru3....@gmail.com>
>> > > > > > Sent: Monday 3rd November 2014 9:15
>> > > > > > To: user@nutch.apache.org
>> > > > > > Subject: [Error Crawling Job Failed] NUTCH 1.9
>> > > > > >
>> > > > > > Hello.
>> > > > > >
>> > > > > > I get an error message when I run the command:
>> > > > > >
>> > > > > > crawl seed/seed.txt crawl -depth 3 -topN 5
>> > > > > >
>> > > > > > Error message:
>> > > > > >
>> > > > > > SOLRIndexWriter
>> > > > > > solr.server.url : URL of the SOLR instance (mandatory)
>> > > > > > solr.commit.size : buffer size when sending to SOLR (default 1000)
>> > > > > > solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>> > > > > > solr.auth : use authentication (default false)
>> > > > > > solr.auth.username : use authentication (default false)
>> > > > > > solr.auth : username for authentication
>> > > > > > solr.auth.password : password for authentication
>> > > > > >
>> > > > > > Indexer: java.io.IOException: Job failed!
>> > > > > > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>> > > > > > at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>> > > > > > at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>> > > > > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > > > > > at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>> > > > > >
>> > > > > > Can anyone explain why this happened?
>> > > > > >
>> > > > > > Best regards,
>> > > > > >
>> > > > > > M.Muchlis
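For reference, the argument mix-up the top log complains about can be sketched as follows. This is only a sketch, assuming the Nutch 1.9 `nutch index` usage (`index <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>)`), the `crawl/` layout shown in the thread, and the Solr URL from the quoted nutch-site.xml:

```shell
# Sketch, not verified against this installation. The log line
# "IndexerMapReduce: crawldb: crawl/indexes" shows the first positional
# argument is taken as the crawldb, so crawl/indexes was read as the
# crawldb and crawl/crawldb and crawl/linkdb were read as segments --
# hence the "Input path does not exist" errors for crawl_fetch,
# parse_data, etc. under those directories. Assuming the usage
#   index <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>)
# the invocation would instead look like:
bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/ \
  crawl/crawldb \
  -linkdb crawl/linkdb \
  -dir crawl/segments
```

With solr.server.url already set in nutch-site.xml, as in the quoted config, the `-D` override should not be needed; note that `-D` options are parsed before the positional arguments, so they go first.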