Could you check whether your URLs actually make it into the crawldb through the inject
operation? They may be filtered out by the regex urlfilter.
Run your URLs against:

bin/nutch plugin urlfilter-regex org.apache.nutch.urlfilter.regex.RegexURLFilter
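In case it helps, here is a rough sketch (plain POSIX shell, not Nutch itself) of how regex-urlfilter rules are evaluated: rules are tried top to bottom, the first matching rule wins ('+' accepts, '-' rejects), and a URL matching no rule is rejected. The two rules below are hypothetical stand-ins, not your actual regex-urlfilter.txt:

```shell
#!/bin/sh
# Sketch of regex-urlfilter.txt semantics: rules are tried top to bottom,
# the FIRST matching rule decides ('+' accepts, '-' rejects), and a URL
# that matches no rule is rejected.
check_url() {
  url=$1
  while read -r rule; do
    sign=${rule%"${rule#?}"}   # first character of the rule: + or -
    pattern=${rule#?}          # the regex itself
    if printf '%s\n' "$url" | grep -Eq "$pattern"; then
      if [ "$sign" = "+" ]; then echo accepted; else echo rejected; fi
      return
    fi
  done <<'EOF'
-\.(gif|GIF|jpg|JPG|png|PNG)$
+^https?://
EOF
  echo rejected
}

check_url "http://nutch.apache.org/"      # accepted: matches +^https?://
check_url "http://example.com/logo.png"   # rejected: image-suffix rule matches first
```

For your real setup, the authoritative check is still the bin/nutch plugin command above, run against your own regex-urlfilter.txt and seed list.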



On Fri, Aug 30, 2013 at 1:42 AM, Jonathan.Wei <252637...@qq.com> wrote:

> I ran bin/nutch readdb -stats.
> It returned this message:
>
>
> hadoop@nutch1:/data/projects/clusters/apache-nutch-2.2/runtime/local$
> bin/nutch readdb -stats
> WebTable statistics start
> Statistics for WebTable:
> jobs:   {db_stats-job_local_0001={jobID=job_local_0001, jobName=db_stats,
> counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce
> Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0,
> REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0,
> COMMITTED_HEAP_BYTES=449839104, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=1146,
> COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0,
> COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0,
> VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0},
> FileSystemCounters={FILE_BYTES_READ=914496, FILE_BYTES_WRITTEN=1036378},
> File Output Format Counters ={BYTES_WRITTEN=98}}}}
> TOTAL urls:     0
> WebTable statistics: done
>
>
>
> Why are there 0 URLs?
> The urls file has 216 URLs!
>
>
>
>
> ------------------ Original message ------------------
> From: "kaveh minooie [via Lucene]" <ml-node+s472066n408744...@n3.nabble.com
> >;
> Sent: Friday, August 30, 2013, 4:30 PM
> To: "基勇" <252637...@qq.com>;
>
> Subject: Re: Aborting with 10 hung threads?
>
>
>
>         So fetch does hang when there is nothing for it to fetch. The most
> likely thing that has happened here is that your inject command did not
> go through successfully. You can check by looking into your HBase
> instance to see whether the webpage table has been created and contains
> values (the URLs you injected). Alternatively, you can just run 'nutch
> readdb -stats' and see what you get. If there is nothing there, double-check
> your config files.
>
> On 08/30/2013 12:12 AM, Jonathan.Wei wrote:
> > Then I ran bin/nutch inject urls and bin/nutch generate -topN250.
> > I checked HBase, and there is no data in it!
> >
> > Where is the problem?
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Aborting-with-10-hung-threads-tp4087433p4087438.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Aborting-with-10-hung-threads-tp4087450.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
