Could you check whether the URLs actually made it into the crawldb through the inject operation? They can be filtered out by the regex urlfilter. Run your URLs against:
bin/nutch plugin urlfilter-regex org.apache.nutch.urlfilter.regex.RegexURLFilter

On Fri, Aug 30, 2013 at 1:42 AM, Jonathan.Wei <252637...@qq.com> wrote:
> I ran bin/nutch readdb -stats. It returned this:
>
> hadoop@nutch1:/data/projects/clusters/apache-nutch-2.2/runtime/local$ bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> jobs: {db_stats-job_local_0001={jobID=job_local_0001, jobName=db_stats,
> counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce
> Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0,
> REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0,
> COMMITTED_HEAP_BYTES=449839104, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=1146,
> COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0,
> COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0,
> VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0},
> FileSystemCounters={FILE_BYTES_READ=914496, FILE_BYTES_WRITTEN=1036378},
> File Output Format Counters ={BYTES_WRITTEN=98}}}}
> TOTAL urls: 0
> WebTable statistics: done
>
> Why are there 0 URLs? The urls file has 216 URLs!
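For context on why seeds can vanish at inject time: the stock conf/regex-urlfilter.txt that ships with Nutch ends with rules along these lines. This is a paraphrase of a typical default install, not a verbatim copy, so check your own file; any '-' rule that matches first silently drops that seed URL.

```
# skip file:, ftp:, and mailto: URLs
-^(file|ftp|mailto):

# skip image and other common non-HTML suffixes
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|js|exe|zip|gz|pdf)$

# skip URLs containing certain characters, as these are probably queries
-[?*!@=]

# skip URLs with a slash-delimited segment that repeats 3+ times (crawl loops)
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
+.
```

A frequent cause of "216 seeds in, 0 in the db" is seed URLs containing `?` or `=`, which the third rule rejects.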
------------------ Original message ------------------
From: "kaveh minooie [via Lucene]" <ml-node+s472066n408744...@n3.nabble.com>
Sent: Friday, August 30, 2013 (Fri) 4:30 PM
To: "基勇" <252637...@qq.com>
Subject: Re: Aborting with 10 hung threads?

So fetch does hang when there is nothing for it to fetch. The most likely thing that has happened here is that your inject command did not go through successfully. You can check it by looking into your HBase and seeing whether the webpage table has been created and has values in it (the URLs that you injected). Alternatively, you can just run 'nutch readdb -stats' and see what you get. If there is nothing there, double-check your config files.

On 08/30/2013 12:12 AM, Jonathan.Wei wrote:
> I ran bin/nutch inject urls and bin/nutch generate -topN250.
> I checked HBase: not any data!
>
> Where is the problem?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Aborting-with-10-hung-threads-tp4087433p4087438.html
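The reason a rejected seed disappears silently is the filter chain's first-match-wins semantics: rules are tried in order, and the first one that matches decides accept or reject. The sketch below models that behaviour in Python with made-up rules; it is an illustration of the semantics only, not Nutch's actual RegexURLFilter implementation.

```python
import re

# Illustrative rule list (sign, compiled regex), tried in order.
# These rules are examples for this sketch, not Nutch defaults.
RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js)$")),  # reject static assets
    ("-", re.compile(r"[?*!@=]")),                  # reject query-like URLs
    ("+", re.compile(r".")),                        # accept everything else
]

def filter_url(url):
    """Return the URL if accepted, or None if rejected.

    Mirrors the first-match-wins contract: the first rule whose
    pattern matches anywhere in the URL decides the outcome.
    """
    for sign, pattern in RULES:
        if pattern.search(url):
            return url if sign == "+" else None
    return None  # no rule matched at all: reject
```

Running every seed through such a function before injecting is a quick way to see which of the 216 URLs would survive.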