Hi Luis,

Two things stick out for me here. I'm not sure if you are aware that Solr is not shipped with Nutch, so to index you would also need to include some kind of Solr image in your parent project. I mention this because, although the log at 2011-09-15 16:57:09 indicates that the Crawl class has successfully obtained your solrUrl property, it does not check that this property is usable or that the server actually exists until the Solr tasks execute, i.e. incrementally after each crawl round or after you have done all of your crawling. Apologies if you already know this, but I thought I would mention it in case it otherwise wouldn't be picked up until a bit later.
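If you want to fail fast instead of discovering a bad solrUrl only when the indexing step runs, something like the following could be run before starting the crawl. This is a hypothetical helper, not part of Nutch, and it only validates that the value parses as an http(s) URL; it does not verify that a Solr instance is actually listening there.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of Nutch): sanity-check the solrUrl
// property up front instead of failing late in the Solr indexing step.
public class SolrUrlCheck {
    public static boolean isWellFormed(String solrUrl) {
        try {
            URL u = new URL(solrUrl);
            String protocol = u.getProtocol();
            // A Solr endpoint should be reachable over http or https.
            return protocol.equals("http") || protocol.equals("https");
        } catch (MalformedURLException e) {
            // No protocol, unknown protocol, or otherwise unparseable.
            return false;
        }
    }
}
```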
Traditionally we use the Nutch .job file for running crawl jobs across a Hadoop configuration, so I am somewhat stuck on this one. We do not push any Nutch artifacts to the Sonatype Nexus Maven repository, so you cannot pull them from there and depend upon them in any way.

On Thu, Sep 15, 2011 at 4:06 PM, Luis Cappa Banda <[email protected]> wrote:

> Hello.
>
> I've downloaded the Nutch-1.3 version via Subversion and modified some classes a little. My intention is to integrate with Maven the new artifacts created from the new "hacked" Nutch version and integrate them with another Maven project which has a dependency on the hacked version mentioned. Both projects (the personalized Nutch version and the other project) are inside a parent project that orchestrates compilation by modules. All configuration apparently looks good and compiles correctly.
>
> When launching a crawling process using the Solr index option, the following error appears:
>
> 2011-09-15 16:57:07,137 0 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Loading property file...
> 2011-09-15 16:57:07,144 7 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Property file loaded!
> 2011-09-15 16:57:07,145 8 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Retrieving property 'URLS_DIR'
> 2011-09-15 16:57:07,145 8 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Retrieving property 'SOLR_SERVER'
> 2011-09-15 16:57:07,145 8 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Retrieving property 'DEPTH'
> 2011-09-15 16:57:07,145 8 [main] INFO es.desa.empleate.infojobs.CrawlingProperties - Retrieving property 'THREADS'
> 2011-09-15 16:57:08,259 1122 [main] INFO es.desa.empleate.infojobs.CrawlingProcess - Crawling process started...
> 2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl - crawl started in: crawl-20110915165709
> 2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl - rootUrlDir =urls
> 2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl - threads = 10
> 2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl - depth = 3
> 2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl - solrUrl=http://localhost:8080/server_infojobs
> 2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector - Injector: starting at 2011-09-15 16:57:10
> 2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector - Injector: crawlDb: crawl-20110915165709/crawldb
> 2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector - Injector: urlDir: /home/lcappa/Escritorio/workspaces/Tomcats/Tomcat2/apache-tomcat-6.0.29/urls
> 2011-09-15 16:57:10,236 3099 [main] INFO org.apache.nutch.crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2011-09-15 16:57:10,258 3121 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> *2011-09-15 16:57:10,328 3191 [main] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).*
> 2011-09-15 16:57:10,344 3207 [main] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
> 2011-09-15 16:57:10,567 3430 [Thread-10] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
> 2011-09-15 16:57:10,584 3447 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
> 2011-09-15 16:57:10,642 3505 [Thread-10] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 1
> 2011-09-15 16:57:10,648 3511 [Thread-10] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2011-09-15 16:57:10,772 3635 [Thread-10] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2011-09-15 16:57:10,772 3635 [Thread-10] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2011-09-15 16:57:10,794 3657 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> *java.lang.RuntimeException: Error in configuring object*
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 5 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 13 more
> Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
>     at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
>     at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
>     at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
>     at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>     ... 18 more
> 2011-09-15 16:57:11,587 4450 [main] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
> 2011-09-15 16:57:11,590 4453 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_local_0001
> 2011-09-15 16:57:11,591 4454 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 0
> 2011-09-15 16:57:11,591 4454 [main] ERROR es.desa.empleate.infojobs.CrawlingProcess - INFOJOBS CRAWLING ERROR: Job failed!
> 2011-09-15 16:57:11,591 4454 [main] INFO es.desa.empleate.infojobs.CrawlingProcess - Crawling process finished.
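[Editor's note: the bottom-most cause in the trace above is `plugin.folders is not defined`. When Nutch runs embedded in another application rather than via the bin/nutch script, the conf directory containing nutch-default.xml and nutch-site.xml must be on the runtime classpath, and plugin.folders must point at the plugins directory. A sketch of the property as it might appear in nutch-site.xml; the value below is an example placeholder, not a known-good path:]

```xml
<property>
  <name>plugin.folders</name>
  <value>/path/to/nutch/plugins</value>
</property>
```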
> Looking at the error, I think that I need to include the Nutch .job artifact too. The question is: is that so? If I have to, how can I include it with Maven? Any recommendation?
>
> Thank you very much.

--
*Lewis*
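[Editor's note: since, as stated above, no Nutch artifacts are published to a public Maven repository, one common workaround is to install the locally built, modified Nutch jar into the local repository (e.g. with `mvn install:install-file`, or by `mvn install` on the Nutch module if it is mavenized inside the parent build) and then depend on it from the sibling module. The coordinates below are hypothetical, whatever you choose when installing the jar:]

```xml
<!-- Hypothetical coordinates for the locally installed, modified Nutch build. -->
<dependency>
  <groupId>org.apache.nutch</groupId>
  <artifactId>nutch</artifactId>
  <version>1.3-custom</version>
</dependency>
```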

