Dear Lewis,

I’m running Nutch 2 on a two-node Hadoop 1.2.1 cluster, with Cassandra as
my backend datastore. For now I’m trying to crawl a single URL. The inject
command works properly: I can see one row added to the “webpage”
keyspace in Cassandra. But the generator doesn’t do anything, and neither
does the fetcher. In the end, nothing is indexed in Solr.

Please help me out. The full console output of the crawl is:

hduser@nutch-one-qontifi:/usr/local/nutch$ bin/crawl urls/seed.txt TestCrawl 
http://10.130.231.16:8983/solr/nutch 2
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:00:34 INFO crawl.InjectorJob: InjectorJob: starting at 2014-06-05 
15:00:34
14/06/05 15:00:34 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: 
urls/seed.txt
14/06/05 15:00:36 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:00:40 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:00:41 INFO crawl.InjectorJob: InjectorJob: Using class 
org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
14/06/05 15:00:44 INFO input.FileInputFormat: Total input paths to process : 1
14/06/05 15:00:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/05 15:00:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/06/05 15:00:44 INFO mapred.JobClient: Running job: job_201406051410_0011
14/06/05 15:00:45 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:01:00 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:01:02 INFO mapred.JobClient: Job complete: job_201406051410_0011
14/06/05 15:01:02 INFO mapred.JobClient: Counters: 19
14/06/05 15:01:02 INFO mapred.JobClient:   Job Counters 
14/06/05 15:01:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=14861
14/06/05 15:01:02 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:01:02 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:01:02 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:01:02 INFO mapred.JobClient:     Data-local map tasks=1
14/06/05 15:01:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/06/05 15:01:02 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:01:02 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:01:02 INFO mapred.JobClient:   injector
14/06/05 15:01:02 INFO mapred.JobClient:     urls_injected=1
14/06/05 15:01:02 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:01:02 INFO mapred.JobClient:     HDFS_BYTES_READ=135
14/06/05 15:01:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=77648
14/06/05 15:01:02 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:01:02 INFO mapred.JobClient:     Bytes Read=25
14/06/05 15:01:02 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:01:02 INFO mapred.JobClient:     Map input records=1
14/06/05 15:01:02 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=122052608
14/06/05 15:01:02 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:01:02 INFO mapred.JobClient:     CPU time spent (ms)=1490
14/06/05 15:01:02 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=58195968
14/06/05 15:01:02 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=1119281152
14/06/05 15:01:02 INFO mapred.JobClient:     Map output records=1
14/06/05 15:01:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=110
14/06/05 15:01:02 INFO crawl.InjectorJob: InjectorJob: total number of urls 
rejected by filters: 0
14/06/05 15:01:02 INFO crawl.InjectorJob: InjectorJob: total number of urls 
injected after normalization and filtering: 1
14/06/05 15:01:02 INFO crawl.InjectorJob: Injector: finished at 2014-06-05 
15:01:02, elapsed: 00:00:28
Thu Jun 5 15:01:02 EDT 2014 : Iteration 1 of 2
Generating batchId
Generating a new fetchlist
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: starting at 2014-06-05 
15:01:06
14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: Selecting best-scoring 
urls due for fetch.
14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: starting
14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: filtering: false
14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: normalizing: false
14/06/05 15:01:06 INFO crawl.GeneratorJob: GeneratorJob: topN: 50000
14/06/05 15:01:06 INFO crawl.FetchScheduleFactory: Using FetchSchedule impl: 
org.apache.nutch.crawl.DefaultFetchSchedule
14/06/05 15:01:06 INFO crawl.AbstractFetchSchedule: defaultInterval=2592000
14/06/05 15:01:06 INFO crawl.AbstractFetchSchedule: maxInterval=7776000
14/06/05 15:01:07 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:01:11 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:01:15 INFO mapred.JobClient: Running job: job_201406051410_0012
14/06/05 15:01:16 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:01:55 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:02:05 INFO mapred.JobClient:  map 100% reduce 33%
14/06/05 15:02:08 INFO mapred.JobClient:  map 100% reduce 66%
14/06/05 15:02:10 INFO mapred.JobClient:  map 100% reduce 83%
14/06/05 15:02:11 INFO mapred.JobClient:  map 100% reduce 100%
14/06/05 15:02:14 INFO mapred.JobClient: Job complete: job_201406051410_0012
14/06/05 15:02:14 INFO mapred.JobClient: Counters: 27
14/06/05 15:02:14 INFO mapred.JobClient:   Job Counters 
14/06/05 15:02:14 INFO mapred.JobClient:     Launched reduce tasks=2
14/06/05 15:02:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=39990
14/06/05 15:02:14 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:02:14 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:02:14 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:02:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=29119
14/06/05 15:02:14 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:02:14 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:02:14 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:02:14 INFO mapred.JobClient:     FILE_BYTES_READ=44
14/06/05 15:02:14 INFO mapred.JobClient:     HDFS_BYTES_READ=951
14/06/05 15:02:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=239453
14/06/05 15:02:14 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:02:14 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:02:14 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:02:14 INFO mapred.JobClient:     Map output materialized bytes=28
14/06/05 15:02:14 INFO mapred.JobClient:     Map input records=0
14/06/05 15:02:14 INFO mapred.JobClient:     Reduce shuffle bytes=28
14/06/05 15:02:14 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:02:14 INFO mapred.JobClient:     Map output bytes=0
14/06/05 15:02:14 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=333971456
14/06/05 15:02:14 INFO mapred.JobClient:     CPU time spent (ms)=9330
14/06/05 15:02:14 INFO mapred.JobClient:     Combine input records=0
14/06/05 15:02:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=951
14/06/05 15:02:14 INFO mapred.JobClient:     Reduce input records=0
14/06/05 15:02:14 INFO mapred.JobClient:     Reduce input groups=0
14/06/05 15:02:14 INFO mapred.JobClient:     Combine output records=0
14/06/05 15:02:14 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=486813696
14/06/05 15:02:14 INFO mapred.JobClient:     Reduce output records=0
14/06/05 15:02:14 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=6016212992
14/06/05 15:02:14 INFO mapred.JobClient:     Map output records=0
14/06/05 15:02:14 INFO crawl.GeneratorJob: GeneratorJob: finished at 2014-06-05 
15:02:14, time elapsed: 00:01:08
14/06/05 15:02:14 INFO crawl.GeneratorJob: GeneratorJob: generated batch id: 
1401994862-29963
Fetching : 
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob: starting
14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob: batchId: 1401994862-29963
14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob: threads: 50
14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob: parsing: false
14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob: resuming: false
14/06/05 15:02:18 INFO fetcher.FetcherJob: FetcherJob : timelimit set for : 
1402005738902
14/06/05 15:02:19 INFO plugin.PluginRepository: Plugins: looking in: 
/app/hadoop/tmp/hadoop-unjar813633856909664022/classes/plugins
14/06/05 15:02:20 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
[true]
14/06/05 15:02:20 INFO plugin.PluginRepository: Registered Plugins:
14/06/05 15:02:20 INFO plugin.PluginRepository:         the nutch core 
extension points (nutch-extensionpoints)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Regex URL Normalizer 
(urlnormalizer-regex)
14/06/05 15:02:20 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
(lib-nekohtml)
14/06/05 15:02:20 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
(scoring-opic)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Basic URL Normalizer 
(urlnormalizer-basic)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Tika Parser Plug-in 
(parse-tika)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Basic Indexing Filter 
(index-basic)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Html Parse Plug-in 
(parse-html)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Anchor Indexing Filter 
(index-anchor)
14/06/05 15:02:20 INFO plugin.PluginRepository:         HTTP Framework 
(lib-http)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Regex URL Filter 
(urlfilter-regex)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Regex URL Filter 
Framework (lib-regex-filter)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Pass-through URL 
Normalizer (urlnormalizer-pass)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Http Protocol Plug-in 
(protocol-http)
14/06/05 15:02:20 INFO plugin.PluginRepository: Registered Extension-Points:
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch URL Normalizer 
(org.apache.nutch.net.URLNormalizer)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Parse Filter 
(org.apache.nutch.parse.ParseFilter)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch URL Filter 
(org.apache.nutch.net.URLFilter)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch Indexing Filter 
(org.apache.nutch.indexer.IndexingFilter)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch Content Parser 
(org.apache.nutch.parse.Parser)
14/06/05 15:02:20 INFO plugin.PluginRepository:         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
14/06/05 15:02:20 INFO http.Http: http.proxy.host = null
14/06/05 15:02:20 INFO http.Http: http.proxy.port = 8080
14/06/05 15:02:20 INFO http.Http: http.timeout = 10000
14/06/05 15:02:20 INFO http.Http: http.content.limit = 65536
14/06/05 15:02:20 INFO http.Http: http.agent = Qontifi/Nutch-2.2.1 (A big data 
analytics and social media intelligence platform; http://qontifi.com; 
manikandan at thesocialpeople dot net)
14/06/05 15:02:20 INFO http.Http: http.accept.language = 
en-us,en-gb,en;q=0.7,*;q=0.3
14/06/05 15:02:20 INFO http.Http: http.accept = 
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
14/06/05 15:02:20 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:02:25 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:02:29 INFO mapred.JobClient: Running job: job_201406051410_0013
14/06/05 15:02:30 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:03:05 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:03:14 INFO mapred.JobClient:  map 100% reduce 16%
14/06/05 15:03:16 INFO mapred.JobClient:  map 100% reduce 33%
14/06/05 15:03:17 INFO mapred.JobClient:  map 100% reduce 50%
14/06/05 15:03:19 INFO mapred.JobClient:  map 100% reduce 66%
14/06/05 15:03:23 INFO mapred.JobClient:  map 100% reduce 83%
14/06/05 15:03:28 INFO mapred.JobClient:  map 100% reduce 100%
14/06/05 15:03:31 INFO mapred.JobClient: Job complete: job_201406051410_0013
14/06/05 15:03:31 INFO mapred.JobClient: Counters: 28
14/06/05 15:03:31 INFO mapred.JobClient:   Job Counters 
14/06/05 15:03:31 INFO mapred.JobClient:     Launched reduce tasks=2
14/06/05 15:03:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=37163
14/06/05 15:03:31 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:03:31 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:03:31 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:03:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=39755
14/06/05 15:03:31 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:03:31 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:03:31 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:03:31 INFO mapred.JobClient:     FILE_BYTES_READ=44
14/06/05 15:03:31 INFO mapred.JobClient:     HDFS_BYTES_READ=935
14/06/05 15:03:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=237923
14/06/05 15:03:31 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:03:31 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:03:31 INFO mapred.JobClient:   FetcherStatus
14/06/05 15:03:31 INFO mapred.JobClient:     HitByTimeLimit-QueueFeeder=0
14/06/05 15:03:31 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:03:31 INFO mapred.JobClient:     Map output materialized bytes=28
14/06/05 15:03:31 INFO mapred.JobClient:     Map input records=0
14/06/05 15:03:31 INFO mapred.JobClient:     Reduce shuffle bytes=28
14/06/05 15:03:31 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:03:31 INFO mapred.JobClient:     Map output bytes=0
14/06/05 15:03:31 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=375914496
14/06/05 15:03:31 INFO mapred.JobClient:     CPU time spent (ms)=9820
14/06/05 15:03:31 INFO mapred.JobClient:     Combine input records=0
14/06/05 15:03:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=935
14/06/05 15:03:31 INFO mapred.JobClient:     Reduce input records=0
14/06/05 15:03:31 INFO mapred.JobClient:     Reduce input groups=0
14/06/05 15:03:31 INFO mapred.JobClient:     Combine output records=0
14/06/05 15:03:31 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=510382080
14/06/05 15:03:31 INFO mapred.JobClient:     Reduce output records=0
14/06/05 15:03:31 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=6060650496
14/06/05 15:03:31 INFO mapred.JobClient:     Map output records=0
14/06/05 15:03:31 INFO fetcher.FetcherJob: FetcherJob: done
Parsing : 
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:03:34 INFO parse.ParserJob: ParserJob: starting
14/06/05 15:03:34 INFO parse.ParserJob: ParserJob: resuming:    false
14/06/05 15:03:34 INFO parse.ParserJob: ParserJob: forced reparse:      false
14/06/05 15:03:34 INFO parse.ParserJob: ParserJob: batchId:     1401994862-29963
14/06/05 15:03:35 INFO plugin.PluginRepository: Plugins: looking in: 
/app/hadoop/tmp/hadoop-unjar8143815380567453850/classes/plugins
14/06/05 15:03:36 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
[true]
14/06/05 15:03:36 INFO plugin.PluginRepository: Registered Plugins:
14/06/05 15:03:36 INFO plugin.PluginRepository:         the nutch core 
extension points (nutch-extensionpoints)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Regex URL Normalizer 
(urlnormalizer-regex)
14/06/05 15:03:36 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
(lib-nekohtml)
14/06/05 15:03:36 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
(scoring-opic)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Basic URL Normalizer 
(urlnormalizer-basic)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Tika Parser Plug-in 
(parse-tika)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Basic Indexing Filter 
(index-basic)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Html Parse Plug-in 
(parse-html)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Anchor Indexing Filter 
(index-anchor)
14/06/05 15:03:36 INFO plugin.PluginRepository:         HTTP Framework 
(lib-http)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Regex URL Filter 
(urlfilter-regex)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Regex URL Filter 
Framework (lib-regex-filter)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Pass-through URL 
Normalizer (urlnormalizer-pass)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Http Protocol Plug-in 
(protocol-http)
14/06/05 15:03:36 INFO plugin.PluginRepository: Registered Extension-Points:
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch URL Normalizer 
(org.apache.nutch.net.URLNormalizer)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Parse Filter 
(org.apache.nutch.parse.ParseFilter)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch URL Filter 
(org.apache.nutch.net.URLFilter)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch Indexing Filter 
(org.apache.nutch.indexer.IndexingFilter)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch Content Parser 
(org.apache.nutch.parse.Parser)
14/06/05 15:03:36 INFO plugin.PluginRepository:         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
14/06/05 15:03:36 INFO conf.Configuration: found resource parse-plugins.xml at 
file:/app/hadoop/tmp/hadoop-unjar8143815380567453850/parse-plugins.xml
14/06/05 15:03:36 INFO crawl.SignatureFactory: Using Signature impl: 
org.apache.nutch.crawl.MD5Signature
14/06/05 15:03:37 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:03:41 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:03:45 INFO mapred.JobClient: Running job: job_201406051410_0014
14/06/05 15:03:46 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:04:22 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:04:24 INFO mapred.JobClient: Job complete: job_201406051410_0014
14/06/05 15:04:25 INFO mapred.JobClient: Counters: 17
14/06/05 15:04:25 INFO mapred.JobClient:   Job Counters 
14/06/05 15:04:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=36653
14/06/05 15:04:25 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:04:25 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:04:25 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:04:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/06/05 15:04:25 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:04:25 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:04:25 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:04:25 INFO mapred.JobClient:     HDFS_BYTES_READ=979
14/06/05 15:04:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78853
14/06/05 15:04:25 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:04:25 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:04:25 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:04:25 INFO mapred.JobClient:     Map input records=0
14/06/05 15:04:25 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=129826816
14/06/05 15:04:25 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:04:25 INFO mapred.JobClient:     CPU time spent (ms)=2330
14/06/05 15:04:25 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=60817408
14/06/05 15:04:25 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=2000629760
14/06/05 15:04:25 INFO mapred.JobClient:     Map output records=0
14/06/05 15:04:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=979
14/06/05 15:04:25 INFO parse.ParserJob: ParserJob: success
CrawlDB update for TestCrawl
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:04:28 INFO crawl.DbUpdaterJob: DbUpdaterJob: starting
14/06/05 15:04:29 INFO plugin.PluginRepository: Plugins: looking in: 
/app/hadoop/tmp/hadoop-unjar4238316120015868426/classes/plugins
14/06/05 15:04:29 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
[true]
14/06/05 15:04:29 INFO plugin.PluginRepository: Registered Plugins:
14/06/05 15:04:29 INFO plugin.PluginRepository:         the nutch core 
extension points (nutch-extensionpoints)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Regex URL Normalizer 
(urlnormalizer-regex)
14/06/05 15:04:29 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
(lib-nekohtml)
14/06/05 15:04:29 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
(scoring-opic)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Basic URL Normalizer 
(urlnormalizer-basic)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Tika Parser Plug-in 
(parse-tika)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Basic Indexing Filter 
(index-basic)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Html Parse Plug-in 
(parse-html)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Anchor Indexing Filter 
(index-anchor)
14/06/05 15:04:29 INFO plugin.PluginRepository:         HTTP Framework 
(lib-http)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Regex URL Filter 
(urlfilter-regex)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Regex URL Filter 
Framework (lib-regex-filter)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Pass-through URL 
Normalizer (urlnormalizer-pass)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Http Protocol Plug-in 
(protocol-http)
14/06/05 15:04:29 INFO plugin.PluginRepository: Registered Extension-Points:
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch URL Normalizer 
(org.apache.nutch.net.URLNormalizer)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Parse Filter 
(org.apache.nutch.parse.ParseFilter)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch URL Filter 
(org.apache.nutch.net.URLFilter)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch Indexing Filter 
(org.apache.nutch.indexer.IndexingFilter)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch Content Parser 
(org.apache.nutch.parse.Parser)
14/06/05 15:04:29 INFO plugin.PluginRepository:         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
14/06/05 15:04:30 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:04:34 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:04:38 INFO mapred.JobClient: Running job: job_201406051410_0015
14/06/05 15:04:39 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:05:21 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:05:31 INFO mapred.JobClient:  map 100% reduce 33%
14/06/05 15:05:34 INFO mapred.JobClient:  map 100% reduce 66%
14/06/05 15:05:37 INFO mapred.JobClient:  map 100% reduce 100%
14/06/05 15:05:39 INFO mapred.JobClient: Job complete: job_201406051410_0015
14/06/05 15:05:39 INFO mapred.JobClient: Counters: 27
14/06/05 15:05:39 INFO mapred.JobClient:   Job Counters 
14/06/05 15:05:39 INFO mapred.JobClient:     Launched reduce tasks=2
14/06/05 15:05:39 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=39898
14/06/05 15:05:39 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:05:39 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:05:39 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:05:39 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=30439
14/06/05 15:05:39 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:05:39 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:05:39 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:05:39 INFO mapred.JobClient:     FILE_BYTES_READ=44
14/06/05 15:05:39 INFO mapred.JobClient:     HDFS_BYTES_READ=1028
14/06/05 15:05:39 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=237914
14/06/05 15:05:39 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:05:39 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:05:39 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:05:39 INFO mapred.JobClient:     Map output materialized bytes=28
14/06/05 15:05:39 INFO mapred.JobClient:     Map input records=0
14/06/05 15:05:39 INFO mapred.JobClient:     Reduce shuffle bytes=28
14/06/05 15:05:39 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:05:39 INFO mapred.JobClient:     Map output bytes=0
14/06/05 15:05:39 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=375914496
14/06/05 15:05:39 INFO mapred.JobClient:     CPU time spent (ms)=8880
14/06/05 15:05:39 INFO mapred.JobClient:     Combine input records=0
14/06/05 15:05:39 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1028
14/06/05 15:05:39 INFO mapred.JobClient:     Reduce input records=0
14/06/05 15:05:39 INFO mapred.JobClient:     Reduce input groups=0
14/06/05 15:05:39 INFO mapred.JobClient:     Combine output records=0
14/06/05 15:05:39 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=490651648
14/06/05 15:05:39 INFO mapred.JobClient:     Reduce output records=0
14/06/05 15:05:39 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=6002880512
14/06/05 15:05:39 INFO mapred.JobClient:     Map output records=0
14/06/05 15:05:39 INFO crawl.DbUpdaterJob: DbUpdaterJob: done
Indexing TestCrawl on SOLR index -> http://10.130.231.16:8983/solr/nutch
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:05:43 INFO solr.SolrIndexerJob: SolrIndexerJob: starting
14/06/05 15:05:44 INFO plugin.PluginRepository: Plugins: looking in: 
/app/hadoop/tmp/hadoop-unjar7543842044056940295/classes/plugins
14/06/05 15:05:44 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
[true]
14/06/05 15:05:44 INFO plugin.PluginRepository: Registered Plugins:
14/06/05 15:05:44 INFO plugin.PluginRepository:         the nutch core 
extension points (nutch-extensionpoints)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Regex URL Normalizer 
(urlnormalizer-regex)
14/06/05 15:05:44 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
(lib-nekohtml)
14/06/05 15:05:44 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
(scoring-opic)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Basic URL Normalizer 
(urlnormalizer-basic)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Tika Parser Plug-in 
(parse-tika)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Basic Indexing Filter 
(index-basic)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Html Parse Plug-in 
(parse-html)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Anchor Indexing Filter 
(index-anchor)
14/06/05 15:05:44 INFO plugin.PluginRepository:         HTTP Framework 
(lib-http)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Regex URL Filter 
(urlfilter-regex)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Regex URL Filter 
Framework (lib-regex-filter)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Pass-through URL 
Normalizer (urlnormalizer-pass)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Http Protocol Plug-in 
(protocol-http)
14/06/05 15:05:44 INFO plugin.PluginRepository: Registered Extension-Points:
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch URL Normalizer 
(org.apache.nutch.net.URLNormalizer)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Parse Filter 
(org.apache.nutch.parse.ParseFilter)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch URL Filter 
(org.apache.nutch.net.URLFilter)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch Indexing Filter 
(org.apache.nutch.indexer.IndexingFilter)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch Content Parser 
(org.apache.nutch.parse.Parser)
14/06/05 15:05:44 INFO plugin.PluginRepository:         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
14/06/05 15:05:44 INFO basic.BasicIndexingFilter: Maximum title length for 
indexing set to: 100
14/06/05 15:05:44 INFO indexer.IndexingFilters: Adding 
org.apache.nutch.indexer.basic.BasicIndexingFilter
14/06/05 15:05:44 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off
14/06/05 15:05:44 INFO indexer.IndexingFilters: Adding 
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
14/06/05 15:05:45 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:05:49 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:05:52 INFO mapred.JobClient: Running job: job_201406051410_0016
14/06/05 15:05:53 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:06:29 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:06:32 INFO mapred.JobClient: Job complete: job_201406051410_0016
14/06/05 15:06:32 INFO mapred.JobClient: Counters: 17
14/06/05 15:06:32 INFO mapred.JobClient:   Job Counters 
14/06/05 15:06:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=36879
14/06/05 15:06:32 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:06:32 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:06:32 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:06:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/06/05 15:06:32 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:06:32 INFO mapred.JobClient:     Bytes Written=0
14/06/05 15:06:32 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:06:32 INFO mapred.JobClient:     HDFS_BYTES_READ=962
14/06/05 15:06:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78923
14/06/05 15:06:32 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:06:32 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:06:32 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:06:32 INFO mapred.JobClient:     Map input records=0
14/06/05 15:06:32 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=114335744
14/06/05 15:06:32 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:06:32 INFO mapred.JobClient:     CPU time spent (ms)=2670
14/06/05 15:06:32 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=60293120
14/06/05 15:06:32 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=1990189056
14/06/05 15:06:32 INFO mapred.JobClient:     Map output records=0
14/06/05 15:06:32 INFO mapred.JobClient:     SPLIT_RAW_BYTES=962
14/06/05 15:06:32 INFO solr.SolrIndexerJob: SolrIndexerJob: done.
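
The symptom in the output above is that the injector job reports "Map input records=1", while every later job (generator, fetcher, parser, updater, indexer) reports "Map input records=0". A minimal Python sketch for pulling that counter out per job, assuming the console output has been saved as text; the function name map_input_records and the sample string are made up for illustration and are not part of Nutch or Hadoop:

```python
import re

# Patterns match the JobClient lines as they appear in the console output.
JOB_RE = re.compile(r"Running job: (job_\d+_\d+)")
COUNTER_RE = re.compile(r"Map input records=(\d+)")


def map_input_records(log_text):
    """Return {job_id: map input record count} for each job in the output."""
    counts = {}
    current_job = None
    for line in log_text.splitlines():
        job = JOB_RE.search(line)
        if job:
            # Remember which job the following counter lines belong to.
            current_job = job.group(1)
            continue
        counter = COUNTER_RE.search(line)
        if counter and current_job is not None:
            counts[current_job] = int(counter.group(1))
    return counts


# Four lines copied from the output above (injector job, then generator job).
sample = """\
14/06/05 15:00:44 INFO mapred.JobClient: Running job: job_201406051410_0011
14/06/05 15:01:02 INFO mapred.JobClient:     Map input records=1
14/06/05 15:01:15 INFO mapred.JobClient: Running job: job_201406051410_0012
14/06/05 15:02:14 INFO mapred.JobClient:     Map input records=0
"""

print(map_input_records(sample))
```

A job that shows 0 here read nothing from the datastore, so the later steps had nothing to work on.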

When I run readdb -stats, I get:

hduser@nutch-one-qontifi:/usr/local/nutch$ bin/nutch readdb TestCrawl -stats
Warning: $HADOOP_HOME is deprecated.

14/06/05 15:13:19 INFO crawl.WebTableReader: WebTable statistics start
14/06/05 15:13:21 INFO connection.CassandraHostRetryService: Downed Host Retry 
service started with queue size -1 and retry delay 10s
14/06/05 15:13:25 INFO service.JmxMonitor: Registering JMX 
me.prettyprint.cassandra.service_Qontifi:ServiceType=hector,MonitorType=hector
14/06/05 15:13:29 INFO mapred.JobClient: Running job: job_201406051410_0019
14/06/05 15:13:30 INFO mapred.JobClient:  map 0% reduce 0%
14/06/05 15:14:06 INFO mapred.JobClient:  map 100% reduce 0%
14/06/05 15:14:15 INFO mapred.JobClient:  map 100% reduce 33%
14/06/05 15:14:17 INFO mapred.JobClient:  map 100% reduce 100%
14/06/05 15:14:19 INFO mapred.JobClient: Job complete: job_201406051410_0019
14/06/05 15:14:19 INFO mapred.JobClient: Counters: 28
14/06/05 15:14:19 INFO mapred.JobClient:   Job Counters 
14/06/05 15:14:19 INFO mapred.JobClient:     Launched reduce tasks=1
14/06/05 15:14:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=36697
14/06/05 15:14:19 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
14/06/05 15:14:19 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
14/06/05 15:14:19 INFO mapred.JobClient:     Launched map tasks=1
14/06/05 15:14:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10302
14/06/05 15:14:19 INFO mapred.JobClient:   File Output Format Counters 
14/06/05 15:14:19 INFO mapred.JobClient:     Bytes Written=86
14/06/05 15:14:19 INFO mapred.JobClient:   FileSystemCounters
14/06/05 15:14:19 INFO mapred.JobClient:     FILE_BYTES_READ=6
14/06/05 15:14:19 INFO mapred.JobClient:     HDFS_BYTES_READ=1135
14/06/05 15:14:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=157112
14/06/05 15:14:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=86
14/06/05 15:14:19 INFO mapred.JobClient:   File Input Format Counters 
14/06/05 15:14:19 INFO mapred.JobClient:     Bytes Read=0
14/06/05 15:14:19 INFO mapred.JobClient:   Map-Reduce Framework
14/06/05 15:14:19 INFO mapred.JobClient:     Map output materialized bytes=6
14/06/05 15:14:19 INFO mapred.JobClient:     Map input records=0
14/06/05 15:14:19 INFO mapred.JobClient:     Reduce shuffle bytes=6
14/06/05 15:14:19 INFO mapred.JobClient:     Spilled Records=0
14/06/05 15:14:19 INFO mapred.JobClient:     Map output bytes=0
14/06/05 15:14:19 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=216530944
14/06/05 15:14:19 INFO mapred.JobClient:     CPU time spent (ms)=2450
14/06/05 15:14:19 INFO mapred.JobClient:     Combine input records=0
14/06/05 15:14:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1135
14/06/05 15:14:19 INFO mapred.JobClient:     Reduce input records=0
14/06/05 15:14:19 INFO mapred.JobClient:     Reduce input groups=0
14/06/05 15:14:19 INFO mapred.JobClient:     Combine output records=0
14/06/05 15:14:19 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=320630784
14/06/05 15:14:19 INFO mapred.JobClient:     Reduce output records=0
14/06/05 15:14:19 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=2254024704
14/06/05 15:14:19 INFO mapred.JobClient:     Map output records=0
14/06/05 15:14:19 INFO crawl.WebTableReader: Statistics for WebTable: 
14/06/05 15:14:19 INFO crawl.WebTableReader: jobs:      
{db_stats-job_201406051410_0019={jobID=job_201406051410_0019, jobName=db_stats, 
counters={File Input Format Counters ={BYTES_READ=0}, Job Counters 
={TOTAL_LAUNCHED_REDUCES=1, SLOTS_MILLIS_MAPS=36697, 
FALLOW_SLOTS_MILLIS_REDUCES=0, FALLOW_SLOTS_MILLIS_MAPS=0, 
TOTAL_LAUNCHED_MAPS=1, SLOTS_MILLIS_REDUCES=10302}, Map-Reduce 
Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, 
REDUCE_SHUFFLE_BYTES=6, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, 
COMMITTED_HEAP_BYTES=216530944, CPU_MILLISECONDS=2450, SPLIT_RAW_BYTES=1135, 
COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, 
COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=320630784, 
REDUCE_OUTPUT_RECORDS=0, VIRTUAL_MEMORY_BYTES=2254024704, 
MAP_OUTPUT_RECORDS=0}, FileSystemCounters={FILE_BYTES_READ=6, 
HDFS_BYTES_READ=1135, FILE_BYTES_WRITTEN=157112, HDFS_BYTES_WRITTEN=86}, File 
Output Format Counters ={BYTES_WRITTEN=86}}}}
14/06/05 15:14:19 INFO crawl.WebTableReader: TOTAL urls:        0
14/06/05 15:14:19 INFO crawl.WebTableReader: WebTable statistics: done
14/06/05 15:14:19 INFO crawl.WebTableReader: jobs:      
{db_stats-job_201406051410_0019={jobID=job_201406051410_0019, jobName=db_stats, 
counters={File Input Format Counters ={BYTES_READ=0}, Job Counters 
={TOTAL_LAUNCHED_REDUCES=1, SLOTS_MILLIS_MAPS=36697, 
FALLOW_SLOTS_MILLIS_REDUCES=0, FALLOW_SLOTS_MILLIS_MAPS=0, 
TOTAL_LAUNCHED_MAPS=1, SLOTS_MILLIS_REDUCES=10302}, Map-Reduce 
Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, 
REDUCE_SHUFFLE_BYTES=6, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, 
COMMITTED_HEAP_BYTES=216530944, CPU_MILLISECONDS=2450, SPLIT_RAW_BYTES=1135, 
COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, 
COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=320630784, 
REDUCE_OUTPUT_RECORDS=0, VIRTUAL_MEMORY_BYTES=2254024704, 
MAP_OUTPUT_RECORDS=0}, FileSystemCounters={FILE_BYTES_READ=6, 
HDFS_BYTES_READ=1135, FILE_BYTES_WRITTEN=157112, HDFS_BYTES_WRITTEN=86}, File 
Output Format Counters ={BYTES_WRITTEN=86}}}}
14/06/05 15:14:19 INFO crawl.WebTableReader: TOTAL urls:        0

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople
