Hi,

How did you set up your Nutch configuration? Did you use the source or the binary installation? Are you sure that all configuration-specific files are available on the classpath?
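
If it helps, here is a minimal sketch of one way to check the classpath question from inside the same Eclipse run configuration. It only asks the class loader whether a few resource names resolve. The class name `ClasspathCheck` and the method `isOnClasspath` are made up for illustration, and the file names listed (`nutch-default.xml`, `nutch-site.xml`, `regex-urlfilter.txt`) are my assumption about which conf files your setup needs; adjust them to whatever is in your conf directory.

```java
import java.net.URL;

// Hypothetical helper, not part of Nutch: reports whether resources
// can be found on the classpath of the current run configuration.
public class ClasspathCheck {

    // Returns true if the class loader that loaded this class can resolve the resource.
    static boolean isOnClasspath(String resourceName) {
        URL url = ClasspathCheck.class.getClassLoader().getResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // Assumed file names; replace with the files from your own conf directory.
        String[] confFiles = {
            "nutch-default.xml",
            "nutch-site.xml",
            "regex-urlfilter.txt"
        };
        for (String name : confFiles) {
            URL url = ClasspathCheck.class.getClassLoader().getResource(name);
            if (url == null) {
                System.out.println("MISSING on classpath: " + name);
            } else {
                System.out.println("found " + name + " at " + url);
            }
        }
    }
}
```

If any file is reported as missing, the conf directory is probably not on the build path of the Eclipse project. Note that this only covers configuration files; plugin classes are located through the plugin directory, so they would not necessarily show up in a check like this.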

On Thu, Nov 3, 2011 at 4:22 PM, Skiming_Zhang <[email protected]> wrote:
> Hello,
>
> I got the following output in “hadoop.log” when I configured Nutch 1.3 in Eclipse (Win 7), but I don’t know how to resolve it. Can you help me? I’m new to Nutch, so please forgive me if I use the wrong terminology!
>
> 2011-11-03 16:51:53,300 WARN crawl.Crawl - solrUrl is not set, indexing will be skipped...
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - crawl started in: crawl
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - rootUrlDir = urls
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - threads = 4
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - depth = 5
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - solrUrl=null
> 2011-11-03 16:51:53,502 INFO crawl.Crawl - topN = 10
> 2011-11-03 16:51:53,518 INFO crawl.Injector - Injector: starting at 2011-11-03 16:51:53
> 2011-11-03 16:51:53,518 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2011-11-03 16:51:53,518 INFO crawl.Injector - Injector: urlDir: urls
> 2011-11-03 16:51:53,534 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2011-11-03 16:51:53,658 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2011-11-03 16:51:54,267 INFO plugin.PluginRepository - Plugins: looking in: E:\IdealTimes\WorkSpace\Nutch1.3\plugin
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Registered Plugins:
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - HTTP Framework (lib-http)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Registered Extension-Points:
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
> 2011-11-03 16:51:54,345 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2011-11-03 16:51:54,345 WARN net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-basic instance in getURLNormalizers function: attempting to continue instantiating plugins
> 2011-11-03 16:51:54,360 WARN net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-regex instance in getURLNormalizers function: attempting to continue instantiating plugins
> 2011-11-03 16:51:54,360 WARN net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-pass instance in getURLNormalizers function: attempting to continue instantiating plugins
> 2011-11-03 16:51:54,360 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 5 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 13 more
> Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
>     at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:77)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:72)
>     ... 18 more
> Caused by: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
>     at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:57)
>     ... 19 more
> Caused by: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
>     at java.net.URLClassLoader$1.run(Unknown Source)
>     at java.net.URLClassLoader$1.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(Unknown Source)
>     at java.lang.ClassLoader.loadClass(Unknown Source)
>     at java.lang.ClassLoader.loadClass(Unknown Source)
>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
>     ... 20 more
>
> Best wishes!
>
> Skiming_zhang

--
*Lewis*

