Hi,

14/05/28 07:02:29 INFO mapred.JobClient: Task Id : attempt_201405280024_0016_r_000001_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

This error usually means you have a firewall or DNS problem: the reduce task could not fetch the map output from the other node. Have a look at this link:
http://stackoverflow.com/questions/10729543/shuffle-errorexceeded-max-failed-unique-matche-bailing-out

Talat

2014-05-28 14:22 GMT+03:00 Manikandan Saravanan <[email protected]>:
> Hi, I’m running Nutch 2 on a 2-node Hadoop cluster to do whole web crawling. > I’m seeding about 700 URLs from the DMOZ directory. About the same number is > being injected. The problem is that nothing is being generated after the > inject phase. Subsequently nothing is being indexed either. > > The trace of the entire crawl job is here: > > 14/05/28 06:54:23 INFO crawl.InjectorJob: InjectorJob: starting at 2014-05-28 > 06:54:23 > 14/05/28 06:54:23 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: > urls/seed.txt > 14/05/28 06:54:24 INFO crawl.InjectorJob: InjectorJob: Using class > org.apache.gora.memory.store.MemStore as the Gora storage class. > 14/05/28 06:54:25 INFO input.FileInputFormat: Total input paths to process : 1 > 14/05/28 06:54:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library > 14/05/28 06:54:25 WARN snappy.LoadSnappy: Snappy native library not loaded > 14/05/28 06:54:25 INFO mapred.JobClient: Running job: job_201405280024_0015 > 14/05/28 06:54:26 INFO mapred.JobClient: map 0% reduce 0% > 14/05/28 06:54:36 INFO mapred.JobClient: map 100% reduce 0% > 14/05/28 06:54:40 INFO mapred.JobClient: Job complete: job_201405280024_0015 > 14/05/28 06:54:40 INFO mapred.JobClient: Counters: 20 > 14/05/28 06:54:40 INFO mapred.JobClient: Job Counters > 14/05/28 06:54:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10927 > 14/05/28 06:54:40 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 06:54:40 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 06:54:40 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 06:54:40 INFO mapred.JobClient: Data-local map tasks=1 > 14/05/28 06:54:40 INFO 
mapred.JobClient: SLOTS_MILLIS_REDUCES=0 > 14/05/28 06:54:40 INFO mapred.JobClient: File Output Format Counters > 14/05/28 06:54:40 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 06:54:40 INFO mapred.JobClient: injector > 14/05/28 06:54:40 INFO mapred.JobClient: urls_injected=765 > 14/05/28 06:54:40 INFO mapred.JobClient: urls_filtered=14 > 14/05/28 06:54:40 INFO mapred.JobClient: FileSystemCounters > 14/05/28 06:54:40 INFO mapred.JobClient: HDFS_BYTES_READ=26006 > 14/05/28 06:54:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=77762 > 14/05/28 06:54:40 INFO mapred.JobClient: File Input Format Counters > 14/05/28 06:54:40 INFO mapred.JobClient: Bytes Read=25896 > 14/05/28 06:54:40 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 06:54:40 INFO mapred.JobClient: Map input records=779 > 14/05/28 06:54:40 INFO mapred.JobClient: Physical memory (bytes) > snapshot=113258496 > 14/05/28 06:54:40 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 06:54:40 INFO mapred.JobClient: CPU time spent (ms)=2530 > 14/05/28 06:54:40 INFO mapred.JobClient: Total committed heap usage > (bytes)=58195968 > 14/05/28 06:54:40 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=1118162944 > 14/05/28 06:54:40 INFO mapred.JobClient: Map output records=765 > 14/05/28 06:54:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=110 > 14/05/28 06:54:40 INFO crawl.InjectorJob: InjectorJob: total number of urls > rejected by filters: 14 > 14/05/28 06:54:40 INFO crawl.InjectorJob: InjectorJob: total number of urls > injected after normalization and filtering: 765 > 14/05/28 06:54:40 INFO crawl.InjectorJob: Injector: finished at 2014-05-28 > 06:54:40, elapsed: 00:00:16 > Wed May 28 06:54:40 EDT 2014 : Iteration 1 of 2 > Generating batchId > Generating a new fetchlist > Warning: $HADOOP_HOME is deprecated. > > 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: starting at > 2014-05-28 06:54:42 > 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: Selecting > best-scoring urls due for fetch. 
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: starting > 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: filtering: false > 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: normalizing: false > 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: topN: 50000 > 14/05/28 06:54:42 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: > org.apache.nutch.crawl.DefaultFetchSchedule > 14/05/28 06:54:42 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000 > 14/05/28 06:54:42 INFO crawl.AbstractFetchSchedule: maxInterval=7776000 > 14/05/28 06:54:44 INFO mapred.JobClient: Running job: job_201405280024_0016 > 14/05/28 06:54:45 INFO mapred.JobClient: map 0% reduce 0% > 14/05/28 06:54:55 INFO mapred.JobClient: map 100% reduce 0% > 14/05/28 06:55:03 INFO mapred.JobClient: map 100% reduce 16% > 14/05/28 06:55:04 INFO mapred.JobClient: map 100% reduce 50% > 14/05/28 07:02:29 INFO mapred.JobClient: Task Id : > attempt_201405280024_0016_r_000001_0, Status : FAILED > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 
> 14/05/28 07:02:39 INFO mapred.JobClient: map 100% reduce 66% > 14/05/28 07:02:40 INFO mapred.JobClient: map 100% reduce 100% > 14/05/28 07:02:43 INFO mapred.JobClient: Job complete: job_201405280024_0016 > 14/05/28 07:02:43 INFO mapred.JobClient: Counters: 27 > 14/05/28 07:02:43 INFO mapred.JobClient: Job Counters > 14/05/28 07:02:43 INFO mapred.JobClient: Launched reduce tasks=3 > 14/05/28 07:02:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11387 > 14/05/28 07:02:43 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 07:02:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23048 > 14/05/28 07:02:43 INFO mapred.JobClient: File Output Format Counters > 14/05/28 07:02:43 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 07:02:43 INFO mapred.JobClient: FileSystemCounters > 14/05/28 07:02:43 INFO mapred.JobClient: FILE_BYTES_READ=44 > 14/05/28 07:02:43 INFO mapred.JobClient: HDFS_BYTES_READ=833 > 14/05/28 07:02:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=239555 > 14/05/28 07:02:43 INFO mapred.JobClient: File Input Format Counters > 14/05/28 07:02:43 INFO mapred.JobClient: Bytes Read=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 07:02:43 INFO mapred.JobClient: Map output materialized bytes=28 > 14/05/28 07:02:43 INFO mapred.JobClient: Map input records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Reduce shuffle bytes=28 > 14/05/28 07:02:43 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Map output bytes=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Total committed heap usage > (bytes)=277872640 > 14/05/28 07:02:43 INFO mapred.JobClient: CPU time spent (ms)=4130 > 14/05/28 07:02:43 INFO mapred.JobClient: Combine input records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: 
SPLIT_RAW_BYTES=833 > 14/05/28 07:02:43 INFO mapred.JobClient: Reduce input records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Reduce input groups=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Combine output records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Physical memory (bytes) > snapshot=422510592 > 14/05/28 07:02:43 INFO mapred.JobClient: Reduce output records=0 > 14/05/28 07:02:43 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=5982715904 > 14/05/28 07:02:43 INFO mapred.JobClient: Map output records=0 > 14/05/28 07:02:43 INFO crawl.GeneratorJob: GeneratorJob: finished at > 2014-05-28 07:02:43, time elapsed: 00:08:00 > 14/05/28 07:02:43 INFO crawl.GeneratorJob: GeneratorJob: generated batch id: > 1401274480-22738 > Fetching : > Warning: $HADOOP_HOME is deprecated. > > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: starting > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: batchId: > 1401274480-22738 > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: threads: 50 > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: parsing: false > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: resuming: false > 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob : timelimit set for : > 1401285765716 > 14/05/28 07:02:46 INFO plugin.PluginRepository: Plugins: looking in: > /app/hadoop/tmp/hadoop-unjar110933996696870181/classes/plugins > 14/05/28 07:02:46 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 14/05/28 07:02:46 INFO plugin.PluginRepository: Registered Plugins: > 14/05/28 07:02:46 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Http / Https Protocol > 
Plug-in (protocol-httpclient) > 14/05/28 07:02:46 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Creative Commons > Plugins (creativecommons) > 14/05/28 07:02:46 INFO plugin.PluginRepository: More Indexing Filter > (index-more) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 14/05/28 07:02:46 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 14/05/28 07:02:46 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 14/05/28 07:02:46 INFO plugin.PluginRepository: JavaScript Parser > (parse-js) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Registered Extension-Points: > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Parse Filter > (org.apache.nutch.parse.ParseFilter) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 14/05/28 07:02:46 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.host = null > 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.port = 8080 > 14/05/28 07:02:46 INFO httpclient.Http: http.timeout = 10000 > 14/05/28 07:02:46 INFO httpclient.Http: 
http.content.limit = 65536 > 14/05/28 07:02:46 INFO httpclient.Http: http.agent = Qontifi/Nutch-2.2.1 (A > big data analytics and social media intelligence platform; > http://qontifi.com; manikandan at thesocialpeople dot net) > 14/05/28 07:02:46 INFO httpclient.Http: http.accept.language = > en-us,en-gb,en;q=0.7,*;q=0.3 > 14/05/28 07:02:46 INFO httpclient.Http: http.accept = > text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > 14/05/28 07:02:46 INFO conf.Configuration: found resource httpclient-auth.xml > at file:/app/hadoop/tmp/hadoop-unjar110933996696870181/httpclient-auth.xml > 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.host = null > 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.port = 8080 > 14/05/28 07:02:46 INFO httpclient.Http: http.timeout = 10000 > 14/05/28 07:02:46 INFO httpclient.Http: http.content.limit = 65536 > 14/05/28 07:02:46 INFO httpclient.Http: http.agent = Qontifi/Nutch-2.2.1 (A > big data analytics and social media intelligence platform; > http://qontifi.com; manikandan at thesocialpeople dot net) > 14/05/28 07:02:46 INFO httpclient.Http: http.accept.language = > en-us,en-gb,en;q=0.7,*;q=0.3 > 14/05/28 07:02:46 INFO httpclient.Http: http.accept = > text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > 14/05/28 07:02:49 INFO mapred.JobClient: Running job: job_201405280024_0017 > 14/05/28 07:02:50 INFO mapred.JobClient: map 0% reduce 0% > 14/05/28 07:03:01 INFO mapred.JobClient: map 100% reduce 0% > 14/05/28 07:03:10 INFO mapred.JobClient: map 100% reduce 16% > 14/05/28 07:03:13 INFO mapred.JobClient: map 100% reduce 50% > 14/05/28 07:10:34 INFO mapred.JobClient: Task Id : > attempt_201405280024_0017_r_000001_0, Status : FAILED > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 
> 14/05/28 07:10:44 INFO mapred.JobClient: map 100% reduce 66% > 14/05/28 07:10:47 INFO mapred.JobClient: map 100% reduce 100% > 14/05/28 07:10:54 INFO mapred.JobClient: Job complete: job_201405280024_0017 > 14/05/28 07:10:54 INFO mapred.JobClient: Counters: 28 > 14/05/28 07:10:54 INFO mapred.JobClient: Job Counters > 14/05/28 07:10:54 INFO mapred.JobClient: Launched reduce tasks=3 > 14/05/28 07:10:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11752 > 14/05/28 07:10:54 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 07:10:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=33613 > 14/05/28 07:10:54 INFO mapred.JobClient: File Output Format Counters > 14/05/28 07:10:54 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 07:10:54 INFO mapred.JobClient: FileSystemCounters > 14/05/28 07:10:54 INFO mapred.JobClient: FILE_BYTES_READ=44 > 14/05/28 07:10:54 INFO mapred.JobClient: HDFS_BYTES_READ=817 > 14/05/28 07:10:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=238025 > 14/05/28 07:10:54 INFO mapred.JobClient: File Input Format Counters > 14/05/28 07:10:54 INFO mapred.JobClient: Bytes Read=0 > 14/05/28 07:10:54 INFO mapred.JobClient: FetcherStatus > 14/05/28 07:10:54 INFO mapred.JobClient: HitByTimeLimit-QueueFeeder=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 07:10:54 INFO mapred.JobClient: Map output materialized bytes=28 > 14/05/28 07:10:54 INFO mapred.JobClient: Map input records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Reduce shuffle bytes=28 > 14/05/28 07:10:54 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Map output bytes=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Total committed heap usage > (bytes)=317194240 > 14/05/28 07:10:54 INFO mapred.JobClient: CPU time spent 
(ms)=6460 > 14/05/28 07:10:54 INFO mapred.JobClient: Combine input records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=817 > 14/05/28 07:10:54 INFO mapred.JobClient: Reduce input records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Reduce input groups=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Combine output records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Physical memory (bytes) > snapshot=444006400 > 14/05/28 07:10:54 INFO mapred.JobClient: Reduce output records=0 > 14/05/28 07:10:54 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=6052544512 > 14/05/28 07:10:54 INFO mapred.JobClient: Map output records=0 > 14/05/28 07:10:54 INFO fetcher.FetcherJob: FetcherJob: done > Parsing : > Warning: $HADOOP_HOME is deprecated. > > 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: starting > 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: resuming: false > 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: forced reparse: false > 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: batchId: > 1401274480-22738 > 14/05/28 07:10:57 INFO plugin.PluginRepository: Plugins: looking in: > /app/hadoop/tmp/hadoop-unjar1161270060222812225/classes/plugins > 14/05/28 07:10:57 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 14/05/28 07:10:57 INFO plugin.PluginRepository: Registered Plugins: > 14/05/28 07:10:57 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Http / Https Protocol > Plug-in (protocol-httpclient) > 14/05/28 07:10:57 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Creative Commons > Plugins (creativecommons) > 
14/05/28 07:10:57 INFO plugin.PluginRepository: More Indexing Filter > (index-more) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 14/05/28 07:10:57 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 14/05/28 07:10:57 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 14/05/28 07:10:57 INFO plugin.PluginRepository: JavaScript Parser > (parse-js) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Registered Extension-Points: > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Parse Filter > (org.apache.nutch.parse.ParseFilter) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 14/05/28 07:10:57 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 14/05/28 07:10:57 INFO conf.Configuration: found resource parse-plugins.xml > at file:/app/hadoop/tmp/hadoop-unjar1161270060222812225/parse-plugins.xml > 14/05/28 07:10:57 INFO crawl.SignatureFactory: Using Signature impl: > org.apache.nutch.crawl.MD5Signature > 14/05/28 07:10:59 INFO mapred.JobClient: Running job: job_201405280024_0018 > 14/05/28 07:11:00 INFO mapred.JobClient: map 0% reduce 0% > 14/05/28 07:11:07 INFO mapred.JobClient: map 
100% reduce 0% > 14/05/28 07:11:09 INFO mapred.JobClient: Job complete: job_201405280024_0018 > 14/05/28 07:11:09 INFO mapred.JobClient: Counters: 17 > 14/05/28 07:11:09 INFO mapred.JobClient: Job Counters > 14/05/28 07:11:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=7869 > 14/05/28 07:11:09 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 07:11:09 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 07:11:09 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 07:11:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 > 14/05/28 07:11:09 INFO mapred.JobClient: File Output Format Counters > 14/05/28 07:11:09 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 07:11:09 INFO mapred.JobClient: FileSystemCounters > 14/05/28 07:11:09 INFO mapred.JobClient: HDFS_BYTES_READ=861 > 14/05/28 07:11:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=78891 > 14/05/28 07:11:09 INFO mapred.JobClient: File Input Format Counters > 14/05/28 07:11:09 INFO mapred.JobClient: Bytes Read=0 > 14/05/28 07:11:09 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 07:11:09 INFO mapred.JobClient: Map input records=0 > 14/05/28 07:11:09 INFO mapred.JobClient: Physical memory (bytes) > snapshot=114253824 > 14/05/28 07:11:09 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 07:11:09 INFO mapred.JobClient: CPU time spent (ms)=1070 > 14/05/28 07:11:09 INFO mapred.JobClient: Total committed heap usage > (bytes)=58195968 > 14/05/28 07:11:09 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=1987776512 > 14/05/28 07:11:09 INFO mapred.JobClient: Map output records=0 > 14/05/28 07:11:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=861 > 14/05/28 07:11:09 INFO parse.ParserJob: ParserJob: success > CrawlDB update for TestCrawl > Warning: $HADOOP_HOME is deprecated. 
> > 14/05/28 07:11:12 INFO crawl.DbUpdaterJob: DbUpdaterJob: starting > 14/05/28 07:11:13 INFO plugin.PluginRepository: Plugins: looking in: > /app/hadoop/tmp/hadoop-unjar5400634919722418143/classes/plugins > 14/05/28 07:11:13 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 14/05/28 07:11:13 INFO plugin.PluginRepository: Registered Plugins: > 14/05/28 07:11:13 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Http / Https Protocol > Plug-in (protocol-httpclient) > 14/05/28 07:11:13 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Creative Commons > Plugins (creativecommons) > 14/05/28 07:11:13 INFO plugin.PluginRepository: More Indexing Filter > (index-more) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 14/05/28 07:11:13 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 14/05/28 07:11:13 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 14/05/28 07:11:13 INFO plugin.PluginRepository: JavaScript Parser > (parse-js) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Registered Extension-Points: > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch Protocol > 
(org.apache.nutch.protocol.Protocol) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Parse Filter > (org.apache.nutch.parse.ParseFilter) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 14/05/28 07:11:13 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 14/05/28 07:11:16 INFO mapred.JobClient: Running job: job_201405280024_0019 > 14/05/28 07:11:17 INFO mapred.JobClient: map 0% reduce 0% > 14/05/28 07:11:28 INFO mapred.JobClient: map 100% reduce 0% > 14/05/28 07:11:38 INFO mapred.JobClient: map 100% reduce 16% > 14/05/28 07:11:39 INFO mapred.JobClient: map 100% reduce 50% > 14/05/28 07:19:00 INFO mapred.JobClient: Task Id : > attempt_201405280024_0019_r_000001_0, Status : FAILED > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 
> 14/05/28 07:19:00 WARN mapred.JobClient: Error reading task > outputnutch-two-qontifi > 14/05/28 07:19:00 WARN mapred.JobClient: Error reading task > outputnutch-two-qontifi > 14/05/28 07:19:11 INFO mapred.JobClient: map 100% reduce 66% > 14/05/28 07:19:12 INFO mapred.JobClient: map 100% reduce 100% > 14/05/28 07:19:13 INFO mapred.JobClient: Job complete: job_201405280024_0019 > 14/05/28 07:19:13 INFO mapred.JobClient: Counters: 27 > 14/05/28 07:19:13 INFO mapred.JobClient: Job Counters > 14/05/28 07:19:13 INFO mapred.JobClient: Launched reduce tasks=3 > 14/05/28 07:19:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10614 > 14/05/28 07:19:13 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 07:19:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23263 > 14/05/28 07:19:13 INFO mapred.JobClient: File Output Format Counters > 14/05/28 07:19:13 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 07:19:13 INFO mapred.JobClient: FileSystemCounters > 14/05/28 07:19:13 INFO mapred.JobClient: FILE_BYTES_READ=44 > 14/05/28 07:19:13 INFO mapred.JobClient: HDFS_BYTES_READ=910 > 14/05/28 07:19:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=238016 > 14/05/28 07:19:13 INFO mapred.JobClient: File Input Format Counters > 14/05/28 07:19:13 INFO mapred.JobClient: Bytes Read=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 07:19:13 INFO mapred.JobClient: Map output materialized bytes=28 > 14/05/28 07:19:13 INFO mapred.JobClient: Map input records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Reduce shuffle bytes=28 > 14/05/28 07:19:13 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Map output bytes=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Total committed heap usage > (bytes)=293601280 > 14/05/28 
07:19:13 INFO mapred.JobClient: CPU time spent (ms)=6540 > 14/05/28 07:19:13 INFO mapred.JobClient: Combine input records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=910 > 14/05/28 07:19:13 INFO mapred.JobClient: Reduce input records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Reduce input groups=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Combine output records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Physical memory (bytes) > snapshot=470159360 > 14/05/28 07:19:13 INFO mapred.JobClient: Reduce output records=0 > 14/05/28 07:19:13 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=5987823616 > 14/05/28 07:19:13 INFO mapred.JobClient: Map output records=0 > 14/05/28 07:19:13 INFO crawl.DbUpdaterJob: DbUpdaterJob: done > Indexing TestCrawl on SOLR index -> http://128.199.207.54:8983/solr/nutch > Warning: $HADOOP_HOME is deprecated. > > 14/05/28 07:19:16 INFO solr.SolrIndexerJob: SolrIndexerJob: starting > 14/05/28 07:19:16 INFO plugin.PluginRepository: Plugins: looking in: > /app/hadoop/tmp/hadoop-unjar5241938989393377870/classes/plugins > 14/05/28 07:19:16 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 14/05/28 07:19:16 INFO plugin.PluginRepository: Registered Plugins: > 14/05/28 07:19:16 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Http / Https Protocol > Plug-in (protocol-httpclient) > 14/05/28 07:19:16 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Creative Commons > Plugins (creativecommons) > 14/05/28 07:19:16 INFO plugin.PluginRepository: More Indexing Filter > (index-more) > 14/05/28 07:19:16 
INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 14/05/28 07:19:16 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 14/05/28 07:19:16 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 14/05/28 07:19:16 INFO plugin.PluginRepository: JavaScript Parser > (parse-js) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Registered Extension-Points: > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Parse Filter > (org.apache.nutch.parse.ParseFilter) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 14/05/28 07:19:16 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 14/05/28 07:19:16 INFO basic.BasicIndexingFilter: Maximum title length for > indexing set to: 100 > 14/05/28 07:19:16 INFO indexer.IndexingFilters: Adding > org.apache.nutch.indexer.basic.BasicIndexingFilter > 14/05/28 07:19:16 INFO indexer.IndexingFilters: Adding > org.creativecommons.nutch.CCIndexingFilter > 14/05/28 07:19:17 INFO indexer.IndexingFilters: Adding > org.apache.nutch.indexer.more.MoreIndexingFilter > 14/05/28 07:19:21 INFO mapred.JobClient: Running job: job_201405280024_0020 > 14/05/28 07:19:22 INFO mapred.JobClient: map 0% reduce 
0% > 14/05/28 07:19:31 INFO mapred.JobClient: map 100% reduce 0% > 14/05/28 07:19:33 INFO mapred.JobClient: Job complete: job_201405280024_0020 > 14/05/28 07:19:33 INFO mapred.JobClient: Counters: 17 > 14/05/28 07:19:33 INFO mapred.JobClient: Job Counters > 14/05/28 07:19:33 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9290 > 14/05/28 07:19:33 INFO mapred.JobClient: Total time spent by all reduces > waiting after reserving slots (ms)=0 > 14/05/28 07:19:33 INFO mapred.JobClient: Total time spent by all maps > waiting after reserving slots (ms)=0 > 14/05/28 07:19:33 INFO mapred.JobClient: Launched map tasks=1 > 14/05/28 07:19:33 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 > 14/05/28 07:19:33 INFO mapred.JobClient: File Output Format Counters > 14/05/28 07:19:33 INFO mapred.JobClient: Bytes Written=0 > 14/05/28 07:19:33 INFO mapred.JobClient: FileSystemCounters > 14/05/28 07:19:33 INFO mapred.JobClient: HDFS_BYTES_READ=877 > 14/05/28 07:19:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=79006 > 14/05/28 07:19:33 INFO mapred.JobClient: File Input Format Counters > 14/05/28 07:19:33 INFO mapred.JobClient: Bytes Read=0 > 14/05/28 07:19:33 INFO mapred.JobClient: Map-Reduce Framework > 14/05/28 07:19:33 INFO mapred.JobClient: Map input records=0 > 14/05/28 07:19:33 INFO mapred.JobClient: Physical memory (bytes) > snapshot=117587968 > 14/05/28 07:19:33 INFO mapred.JobClient: Spilled Records=0 > 14/05/28 07:19:33 INFO mapred.JobClient: CPU time spent (ms)=1040 > 14/05/28 07:19:33 INFO mapred.JobClient: Total committed heap usage > (bytes)=59768832 > 14/05/28 07:19:33 INFO mapred.JobClient: Virtual memory (bytes) > snapshot=1992785920 > 14/05/28 07:19:33 INFO mapred.JobClient: Map output records=0 > 14/05/28 07:19:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=877 > 14/05/28 07:19:33 INFO solr.SolrIndexerJob: SolrIndexerJob: done. > > Am I missing anything? 
>
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
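
P.S. A quick way to test the firewall/DNS theory is to run a small check like the one below on each node. This is only a sketch under a few assumptions: the host names `hadoop-master` and `hadoop-slave` are hypothetical placeholders for your two nodes, and the port is 50060, which is the default TaskTracker HTTP port in Hadoop 1.x (the port reducers normally pull map output from during the shuffle). If your cluster sets `mapred.task.tracker.http.address` to something else, adjust the port accordingly.

```python
import socket

# Hypothetical node names: replace with the hosts of your own cluster.
NODES = ["hadoop-master", "hadoop-slave"]
# Assumed default TaskTracker HTTP port (Hadoop 1.x); change if overridden.
SHUFFLE_PORT = 50060


def check_node(host, port=SHUFFLE_PORT, timeout=3.0):
    """Return (resolved_ip, port_reachable) for one node.

    resolved_ip is None when the name cannot be resolved at all.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        # A successful TCP connect means no firewall is blocking the port.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False


def describe(host, ip, reachable, port=SHUFFLE_PORT):
    """Turn one check result into a readable line."""
    if ip is None:
        return f"{host}: does not resolve (check /etc/hosts or DNS)"
    if not reachable:
        return f"{host} ({ip}): port {port} unreachable (check firewall)"
    return f"{host} ({ip}): port {port} reachable"


def report(nodes, port=SHUFFLE_PORT):
    """Check every node and return one description line per node."""
    return [describe(h, *check_node(h, port), port=port) for h in nodes]
```

Calling `print("\n".join(report(NODES)))` on both machines should show every node as reachable from every other node. A name that does not resolve, or that resolves to a loopback address such as 127.0.1.1 on the remote side, would be consistent with the shuffle error above.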

