Hi,

14/05/28 07:02:29 INFO mapred.JobClient: Task Id :
attempt_201405280024_0016_r_000001_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

This error usually means you have a firewall or DNS problem between
your nodes: the reducers cannot fetch the map output from the other
node. You can look at this link:
http://stackoverflow.com/questions/10729543/shuffle-errorexceeded-max-failed-unique-matche-bailing-out
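
A quick way to test for this is to check, from each node, that the other
node's hostname resolves and that its TaskTracker HTTP port accepts
connections. Below is a minimal sketch, not a definitive tool. It assumes
the Hadoop 1.x default TaskTracker HTTP port (50060); the function name
check_node is my own, and you should pass the hostnames from your own
slaves file (nutch-two-qontifi appears in your log, the other node's name
is not shown).

```python
import socket


def check_node(host, port=50060, timeout=3.0):
    """Return (resolved_ip, reachable) for one cluster node.

    port=50060 is an assumption: the Hadoop 1.x default TaskTracker
    HTTP port, which reducers use to fetch map output.
    """
    try:
        # DNS / /etc/hosts check: does the hostname resolve at all?
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        # Firewall check: can we open a TCP connection to the port?
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

Run check_node("nutch-two-qontifi") on the first node, and the same call
with the first node's hostname on the second. A result of (None, False)
points to name resolution; (ip, False) points to a firewall or to the
TaskTracker not listening on that port. Also check that the returned IP
is the node's real address and not 127.0.0.1 or 127.0.1.1.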

Talat

2014-05-28 14:22 GMT+03:00 Manikandan Saravanan
<[email protected]>:
> Hi, I’m running Nutch 2 on a 2-node Hadoop cluster to do whole web crawling. 
> I’m seeding about 700 URLs from the DMOZ directory. About the same number is 
> being injected. The problem is that nothing is being generated after the 
> inject phase. Subsequently nothing is being indexed either.
>
> The trace of the entire crawl job is here:
>
> 14/05/28 06:54:23 INFO crawl.InjectorJob: InjectorJob: starting at 2014-05-28 
> 06:54:23
> 14/05/28 06:54:23 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: 
> urls/seed.txt
> 14/05/28 06:54:24 INFO crawl.InjectorJob: InjectorJob: Using class 
> org.apache.gora.memory.store.MemStore as the Gora storage class.
> 14/05/28 06:54:25 INFO input.FileInputFormat: Total input paths to process : 1
> 14/05/28 06:54:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 14/05/28 06:54:25 WARN snappy.LoadSnappy: Snappy native library not loaded
> 14/05/28 06:54:25 INFO mapred.JobClient: Running job: job_201405280024_0015
> 14/05/28 06:54:26 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 06:54:36 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 06:54:40 INFO mapred.JobClient: Job complete: job_201405280024_0015
> 14/05/28 06:54:40 INFO mapred.JobClient: Counters: 20
> 14/05/28 06:54:40 INFO mapred.JobClient:   Job Counters
> 14/05/28 06:54:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10927
> 14/05/28 06:54:40 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 06:54:40 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 06:54:40 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 06:54:40 INFO mapred.JobClient:     Data-local map tasks=1
> 14/05/28 06:54:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/05/28 06:54:40 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 06:54:40 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 06:54:40 INFO mapred.JobClient:   injector
> 14/05/28 06:54:40 INFO mapred.JobClient:     urls_injected=765
> 14/05/28 06:54:40 INFO mapred.JobClient:     urls_filtered=14
> 14/05/28 06:54:40 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 06:54:40 INFO mapred.JobClient:     HDFS_BYTES_READ=26006
> 14/05/28 06:54:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=77762
> 14/05/28 06:54:40 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 06:54:40 INFO mapred.JobClient:     Bytes Read=25896
> 14/05/28 06:54:40 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 06:54:40 INFO mapred.JobClient:     Map input records=779
> 14/05/28 06:54:40 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=113258496
> 14/05/28 06:54:40 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 06:54:40 INFO mapred.JobClient:     CPU time spent (ms)=2530
> 14/05/28 06:54:40 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=58195968
> 14/05/28 06:54:40 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=1118162944
> 14/05/28 06:54:40 INFO mapred.JobClient:     Map output records=765
> 14/05/28 06:54:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=110
> 14/05/28 06:54:40 INFO crawl.InjectorJob: InjectorJob: total number of urls 
> rejected by filters: 14
> 14/05/28 06:54:40 INFO crawl.InjectorJob: InjectorJob: total number of urls 
> injected after normalization and filtering: 765
> 14/05/28 06:54:40 INFO crawl.InjectorJob: Injector: finished at 2014-05-28 
> 06:54:40, elapsed: 00:00:16
> Wed May 28 06:54:40 EDT 2014 : Iteration 1 of 2
> Generating batchId
> Generating a new fetchlist
> Warning: $HADOOP_HOME is deprecated.
>
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: starting at 
> 2014-05-28 06:54:42
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: Selecting 
> best-scoring urls due for fetch.
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: starting
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: filtering: false
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: normalizing: false
> 14/05/28 06:54:42 INFO crawl.GeneratorJob: GeneratorJob: topN: 50000
> 14/05/28 06:54:42 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: 
> org.apache.nutch.crawl.DefaultFetchSchedule
> 14/05/28 06:54:42 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000
> 14/05/28 06:54:42 INFO crawl.AbstractFetchSchedule: maxInterval=7776000
> 14/05/28 06:54:44 INFO mapred.JobClient: Running job: job_201405280024_0016
> 14/05/28 06:54:45 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 06:54:55 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 06:55:03 INFO mapred.JobClient:  map 100% reduce 16%
> 14/05/28 06:55:04 INFO mapred.JobClient:  map 100% reduce 50%
> 14/05/28 07:02:29 INFO mapred.JobClient: Task Id : 
> attempt_201405280024_0016_r_000001_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/05/28 07:02:39 INFO mapred.JobClient:  map 100% reduce 66%
> 14/05/28 07:02:40 INFO mapred.JobClient:  map 100% reduce 100%
> 14/05/28 07:02:43 INFO mapred.JobClient: Job complete: job_201405280024_0016
> 14/05/28 07:02:43 INFO mapred.JobClient: Counters: 27
> 14/05/28 07:02:43 INFO mapred.JobClient:   Job Counters
> 14/05/28 07:02:43 INFO mapred.JobClient:     Launched reduce tasks=3
> 14/05/28 07:02:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11387
> 14/05/28 07:02:43 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 07:02:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23048
> 14/05/28 07:02:43 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 07:02:43 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 07:02:43 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 07:02:43 INFO mapred.JobClient:     FILE_BYTES_READ=44
> 14/05/28 07:02:43 INFO mapred.JobClient:     HDFS_BYTES_READ=833
> 14/05/28 07:02:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=239555
> 14/05/28 07:02:43 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 07:02:43 INFO mapred.JobClient:     Bytes Read=0
> 14/05/28 07:02:43 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 07:02:43 INFO mapred.JobClient:     Map output materialized bytes=28
> 14/05/28 07:02:43 INFO mapred.JobClient:     Map input records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Reduce shuffle bytes=28
> 14/05/28 07:02:43 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Map output bytes=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=277872640
> 14/05/28 07:02:43 INFO mapred.JobClient:     CPU time spent (ms)=4130
> 14/05/28 07:02:43 INFO mapred.JobClient:     Combine input records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     SPLIT_RAW_BYTES=833
> 14/05/28 07:02:43 INFO mapred.JobClient:     Reduce input records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Reduce input groups=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Combine output records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=422510592
> 14/05/28 07:02:43 INFO mapred.JobClient:     Reduce output records=0
> 14/05/28 07:02:43 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=5982715904
> 14/05/28 07:02:43 INFO mapred.JobClient:     Map output records=0
> 14/05/28 07:02:43 INFO crawl.GeneratorJob: GeneratorJob: finished at 
> 2014-05-28 07:02:43, time elapsed: 00:08:00
> 14/05/28 07:02:43 INFO crawl.GeneratorJob: GeneratorJob: generated batch id: 
> 1401274480-22738
> Fetching :
> Warning: $HADOOP_HOME is deprecated.
>
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: starting
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: batchId: 
> 1401274480-22738
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: threads: 50
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: parsing: false
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob: resuming: false
> 14/05/28 07:02:45 INFO fetcher.FetcherJob: FetcherJob : timelimit set for : 
> 1401285765716
> 14/05/28 07:02:46 INFO plugin.PluginRepository: Plugins: looking in: 
> /app/hadoop/tmp/hadoop-unjar110933996696870181/classes/plugins
> 14/05/28 07:02:46 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 14/05/28 07:02:46 INFO plugin.PluginRepository: Registered Plugins:
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Http / Https Protocol 
> Plug-in (protocol-httpclient)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Creative Commons 
> Plugins (creativecommons)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         More Indexing Filter 
> (index-more)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         JavaScript Parser 
> (parse-js)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 14/05/28 07:02:46 INFO plugin.PluginRepository: Registered Extension-Points:
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Parse Filter 
> (org.apache.nutch.parse.ParseFilter)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 14/05/28 07:02:46 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.host = null
> 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.port = 8080
> 14/05/28 07:02:46 INFO httpclient.Http: http.timeout = 10000
> 14/05/28 07:02:46 INFO httpclient.Http: http.content.limit = 65536
> 14/05/28 07:02:46 INFO httpclient.Http: http.agent = Qontifi/Nutch-2.2.1 (A 
> big data analytics and social media intelligence platform; 
> http://qontifi.com; manikandan at thesocialpeople dot net)
> 14/05/28 07:02:46 INFO httpclient.Http: http.accept.language = 
> en-us,en-gb,en;q=0.7,*;q=0.3
> 14/05/28 07:02:46 INFO httpclient.Http: http.accept = 
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 14/05/28 07:02:46 INFO conf.Configuration: found resource httpclient-auth.xml 
> at file:/app/hadoop/tmp/hadoop-unjar110933996696870181/httpclient-auth.xml
> 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.host = null
> 14/05/28 07:02:46 INFO httpclient.Http: http.proxy.port = 8080
> 14/05/28 07:02:46 INFO httpclient.Http: http.timeout = 10000
> 14/05/28 07:02:46 INFO httpclient.Http: http.content.limit = 65536
> 14/05/28 07:02:46 INFO httpclient.Http: http.agent = Qontifi/Nutch-2.2.1 (A 
> big data analytics and social media intelligence platform; 
> http://qontifi.com; manikandan at thesocialpeople dot net)
> 14/05/28 07:02:46 INFO httpclient.Http: http.accept.language = 
> en-us,en-gb,en;q=0.7,*;q=0.3
> 14/05/28 07:02:46 INFO httpclient.Http: http.accept = 
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 14/05/28 07:02:49 INFO mapred.JobClient: Running job: job_201405280024_0017
> 14/05/28 07:02:50 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 07:03:01 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 07:03:10 INFO mapred.JobClient:  map 100% reduce 16%
> 14/05/28 07:03:13 INFO mapred.JobClient:  map 100% reduce 50%
> 14/05/28 07:10:34 INFO mapred.JobClient: Task Id : 
> attempt_201405280024_0017_r_000001_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/05/28 07:10:44 INFO mapred.JobClient:  map 100% reduce 66%
> 14/05/28 07:10:47 INFO mapred.JobClient:  map 100% reduce 100%
> 14/05/28 07:10:54 INFO mapred.JobClient: Job complete: job_201405280024_0017
> 14/05/28 07:10:54 INFO mapred.JobClient: Counters: 28
> 14/05/28 07:10:54 INFO mapred.JobClient:   Job Counters
> 14/05/28 07:10:54 INFO mapred.JobClient:     Launched reduce tasks=3
> 14/05/28 07:10:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11752
> 14/05/28 07:10:54 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 07:10:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=33613
> 14/05/28 07:10:54 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 07:10:54 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 07:10:54 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 07:10:54 INFO mapred.JobClient:     FILE_BYTES_READ=44
> 14/05/28 07:10:54 INFO mapred.JobClient:     HDFS_BYTES_READ=817
> 14/05/28 07:10:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=238025
> 14/05/28 07:10:54 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 07:10:54 INFO mapred.JobClient:     Bytes Read=0
> 14/05/28 07:10:54 INFO mapred.JobClient:   FetcherStatus
> 14/05/28 07:10:54 INFO mapred.JobClient:     HitByTimeLimit-QueueFeeder=0
> 14/05/28 07:10:54 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 07:10:54 INFO mapred.JobClient:     Map output materialized bytes=28
> 14/05/28 07:10:54 INFO mapred.JobClient:     Map input records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Reduce shuffle bytes=28
> 14/05/28 07:10:54 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Map output bytes=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=317194240
> 14/05/28 07:10:54 INFO mapred.JobClient:     CPU time spent (ms)=6460
> 14/05/28 07:10:54 INFO mapred.JobClient:     Combine input records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=817
> 14/05/28 07:10:54 INFO mapred.JobClient:     Reduce input records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Reduce input groups=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Combine output records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=444006400
> 14/05/28 07:10:54 INFO mapred.JobClient:     Reduce output records=0
> 14/05/28 07:10:54 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=6052544512
> 14/05/28 07:10:54 INFO mapred.JobClient:     Map output records=0
> 14/05/28 07:10:54 INFO fetcher.FetcherJob: FetcherJob: done
> Parsing :
> Warning: $HADOOP_HOME is deprecated.
>
> 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: starting
> 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: resuming:    false
> 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: forced reparse:      false
> 14/05/28 07:10:56 INFO parse.ParserJob: ParserJob: batchId:     
> 1401274480-22738
> 14/05/28 07:10:57 INFO plugin.PluginRepository: Plugins: looking in: 
> /app/hadoop/tmp/hadoop-unjar1161270060222812225/classes/plugins
> 14/05/28 07:10:57 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 14/05/28 07:10:57 INFO plugin.PluginRepository: Registered Plugins:
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Http / Https Protocol 
> Plug-in (protocol-httpclient)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Creative Commons 
> Plugins (creativecommons)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         More Indexing Filter 
> (index-more)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         JavaScript Parser 
> (parse-js)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 14/05/28 07:10:57 INFO plugin.PluginRepository: Registered Extension-Points:
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Parse Filter 
> (org.apache.nutch.parse.ParseFilter)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 14/05/28 07:10:57 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 14/05/28 07:10:57 INFO conf.Configuration: found resource parse-plugins.xml 
> at file:/app/hadoop/tmp/hadoop-unjar1161270060222812225/parse-plugins.xml
> 14/05/28 07:10:57 INFO crawl.SignatureFactory: Using Signature impl: 
> org.apache.nutch.crawl.MD5Signature
> 14/05/28 07:10:59 INFO mapred.JobClient: Running job: job_201405280024_0018
> 14/05/28 07:11:00 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 07:11:07 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 07:11:09 INFO mapred.JobClient: Job complete: job_201405280024_0018
> 14/05/28 07:11:09 INFO mapred.JobClient: Counters: 17
> 14/05/28 07:11:09 INFO mapred.JobClient:   Job Counters
> 14/05/28 07:11:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7869
> 14/05/28 07:11:09 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 07:11:09 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 07:11:09 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 07:11:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/05/28 07:11:09 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 07:11:09 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 07:11:09 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 07:11:09 INFO mapred.JobClient:     HDFS_BYTES_READ=861
> 14/05/28 07:11:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78891
> 14/05/28 07:11:09 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 07:11:09 INFO mapred.JobClient:     Bytes Read=0
> 14/05/28 07:11:09 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 07:11:09 INFO mapred.JobClient:     Map input records=0
> 14/05/28 07:11:09 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=114253824
> 14/05/28 07:11:09 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 07:11:09 INFO mapred.JobClient:     CPU time spent (ms)=1070
> 14/05/28 07:11:09 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=58195968
> 14/05/28 07:11:09 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=1987776512
> 14/05/28 07:11:09 INFO mapred.JobClient:     Map output records=0
> 14/05/28 07:11:09 INFO mapred.JobClient:     SPLIT_RAW_BYTES=861
> 14/05/28 07:11:09 INFO parse.ParserJob: ParserJob: success
> CrawlDB update for TestCrawl
> Warning: $HADOOP_HOME is deprecated.
>
> 14/05/28 07:11:12 INFO crawl.DbUpdaterJob: DbUpdaterJob: starting
> 14/05/28 07:11:13 INFO plugin.PluginRepository: Plugins: looking in: 
> /app/hadoop/tmp/hadoop-unjar5400634919722418143/classes/plugins
> 14/05/28 07:11:13 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 14/05/28 07:11:13 INFO plugin.PluginRepository: Registered Plugins:
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Http / Https Protocol 
> Plug-in (protocol-httpclient)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Creative Commons 
> Plugins (creativecommons)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         More Indexing Filter 
> (index-more)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         JavaScript Parser 
> (parse-js)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 14/05/28 07:11:13 INFO plugin.PluginRepository: Registered Extension-Points:
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Parse Filter 
> (org.apache.nutch.parse.ParseFilter)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 14/05/28 07:11:13 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 14/05/28 07:11:16 INFO mapred.JobClient: Running job: job_201405280024_0019
> 14/05/28 07:11:17 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 07:11:28 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 07:11:38 INFO mapred.JobClient:  map 100% reduce 16%
> 14/05/28 07:11:39 INFO mapred.JobClient:  map 100% reduce 50%
> 14/05/28 07:19:00 INFO mapred.JobClient: Task Id : 
> attempt_201405280024_0019_r_000001_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/05/28 07:19:00 WARN mapred.JobClient: Error reading task 
> outputnutch-two-qontifi
> 14/05/28 07:19:00 WARN mapred.JobClient: Error reading task 
> outputnutch-two-qontifi
> 14/05/28 07:19:11 INFO mapred.JobClient:  map 100% reduce 66%
> 14/05/28 07:19:12 INFO mapred.JobClient:  map 100% reduce 100%
> 14/05/28 07:19:13 INFO mapred.JobClient: Job complete: job_201405280024_0019
> 14/05/28 07:19:13 INFO mapred.JobClient: Counters: 27
> 14/05/28 07:19:13 INFO mapred.JobClient:   Job Counters
> 14/05/28 07:19:13 INFO mapred.JobClient:     Launched reduce tasks=3
> 14/05/28 07:19:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10614
> 14/05/28 07:19:13 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 07:19:13 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23263
> 14/05/28 07:19:13 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 07:19:13 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 07:19:13 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 07:19:13 INFO mapred.JobClient:     FILE_BYTES_READ=44
> 14/05/28 07:19:13 INFO mapred.JobClient:     HDFS_BYTES_READ=910
> 14/05/28 07:19:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=238016
> 14/05/28 07:19:13 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 07:19:13 INFO mapred.JobClient:     Bytes Read=0
> 14/05/28 07:19:13 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 07:19:13 INFO mapred.JobClient:     Map output materialized bytes=28
> 14/05/28 07:19:13 INFO mapred.JobClient:     Map input records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Reduce shuffle bytes=28
> 14/05/28 07:19:13 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Map output bytes=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=293601280
> 14/05/28 07:19:13 INFO mapred.JobClient:     CPU time spent (ms)=6540
> 14/05/28 07:19:13 INFO mapred.JobClient:     Combine input records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=910
> 14/05/28 07:19:13 INFO mapred.JobClient:     Reduce input records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Reduce input groups=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Combine output records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=470159360
> 14/05/28 07:19:13 INFO mapred.JobClient:     Reduce output records=0
> 14/05/28 07:19:13 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=5987823616
> 14/05/28 07:19:13 INFO mapred.JobClient:     Map output records=0
> 14/05/28 07:19:13 INFO crawl.DbUpdaterJob: DbUpdaterJob: done
> Indexing TestCrawl on SOLR index -> http://128.199.207.54:8983/solr/nutch
> Warning: $HADOOP_HOME is deprecated.
>
> 14/05/28 07:19:16 INFO solr.SolrIndexerJob: SolrIndexerJob: starting
> 14/05/28 07:19:16 INFO plugin.PluginRepository: Plugins: looking in: 
> /app/hadoop/tmp/hadoop-unjar5241938989393377870/classes/plugins
> 14/05/28 07:19:16 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 14/05/28 07:19:16 INFO plugin.PluginRepository: Registered Plugins:
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         the nutch core 
> extension points (nutch-extensionpoints)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Basic URL Normalizer 
> (urlnormalizer-basic)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Html Parse Plug-in 
> (parse-html)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Basic Indexing Filter 
> (index-basic)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Http / Https Protocol 
> Plug-in (protocol-httpclient)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         HTTP Framework 
> (lib-http)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Creative Commons 
> Plugins (creativecommons)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         More Indexing Filter 
> (index-more)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Regex URL Filter 
> (urlfilter-regex)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Regex URL Normalizer 
> (urlnormalizer-regex)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
> (scoring-opic)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
> (lib-nekohtml)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         JavaScript Parser 
> (parse-js)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Regex URL Filter 
> Framework (lib-regex-filter)
> 14/05/28 07:19:16 INFO plugin.PluginRepository: Registered Extension-Points:
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Parse Filter 
> (org.apache.nutch.parse.ParseFilter)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 14/05/28 07:19:16 INFO plugin.PluginRepository:         Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 14/05/28 07:19:16 INFO basic.BasicIndexingFilter: Maximum title length for 
> indexing set to: 100
> 14/05/28 07:19:16 INFO indexer.IndexingFilters: Adding 
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 14/05/28 07:19:16 INFO indexer.IndexingFilters: Adding 
> org.creativecommons.nutch.CCIndexingFilter
> 14/05/28 07:19:17 INFO indexer.IndexingFilters: Adding 
> org.apache.nutch.indexer.more.MoreIndexingFilter
> 14/05/28 07:19:21 INFO mapred.JobClient: Running job: job_201405280024_0020
> 14/05/28 07:19:22 INFO mapred.JobClient:  map 0% reduce 0%
> 14/05/28 07:19:31 INFO mapred.JobClient:  map 100% reduce 0%
> 14/05/28 07:19:33 INFO mapred.JobClient: Job complete: job_201405280024_0020
> 14/05/28 07:19:33 INFO mapred.JobClient: Counters: 17
> 14/05/28 07:19:33 INFO mapred.JobClient:   Job Counters
> 14/05/28 07:19:33 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9290
> 14/05/28 07:19:33 INFO mapred.JobClient:     Total time spent by all reduces 
> waiting after reserving slots (ms)=0
> 14/05/28 07:19:33 INFO mapred.JobClient:     Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 14/05/28 07:19:33 INFO mapred.JobClient:     Launched map tasks=1
> 14/05/28 07:19:33 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/05/28 07:19:33 INFO mapred.JobClient:   File Output Format Counters
> 14/05/28 07:19:33 INFO mapred.JobClient:     Bytes Written=0
> 14/05/28 07:19:33 INFO mapred.JobClient:   FileSystemCounters
> 14/05/28 07:19:33 INFO mapred.JobClient:     HDFS_BYTES_READ=877
> 14/05/28 07:19:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=79006
> 14/05/28 07:19:33 INFO mapred.JobClient:   File Input Format Counters
> 14/05/28 07:19:33 INFO mapred.JobClient:     Bytes Read=0
> 14/05/28 07:19:33 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/28 07:19:33 INFO mapred.JobClient:     Map input records=0
> 14/05/28 07:19:33 INFO mapred.JobClient:     Physical memory (bytes) 
> snapshot=117587968
> 14/05/28 07:19:33 INFO mapred.JobClient:     Spilled Records=0
> 14/05/28 07:19:33 INFO mapred.JobClient:     CPU time spent (ms)=1040
> 14/05/28 07:19:33 INFO mapred.JobClient:     Total committed heap usage 
> (bytes)=59768832
> 14/05/28 07:19:33 INFO mapred.JobClient:     Virtual memory (bytes) 
> snapshot=1992785920
> 14/05/28 07:19:33 INFO mapred.JobClient:     Map output records=0
> 14/05/28 07:19:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=877
> 14/05/28 07:19:33 INFO solr.SolrIndexerJob: SolrIndexerJob: done.
>
>  Am I missing anything?
>
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
