Which version of Nutch are you using?

Is chat a plain text file with one URL per line? If that is the case,
there is no need to add it to your crawl command. Additionally, there is no
point in trying to read what is happening in your crawldb if your generator
log output indicates that nothing has been selected for fetching; the fetch
step will therefore be skipped.
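For reference, a seed list is just a plain text file with one URL per line. A minimal sketch (the urls/chat path and seed.txt file name here are only illustrative; any plain-text file in the seed directory works):

```shell
# Create a hypothetical seed directory with one URL per line
mkdir -p urls/chat
cat > urls/chat/seed.txt <<'EOF'
http://example.com/
EOF

# Sanity check: flag any non-empty line that does not start with a scheme,
# since malformed lines are silently dropped by the injector's URL filters
grep -Ev '^[[:space:]]*$' urls/chat/seed.txt | grep -Ev '^https?://' \
  && echo "suspect lines above" \
  || echo "seed list looks OK"
```

If the check prints suspect lines, those entries will never make it past injection, which would explain the "0 records selected for fetching" message.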

I'm slightly concerned about your crawl parameters. For example, is it
necessary to use crawl-chat as the output directory? I have never used
hyphens there before, and it is only a suggestion, but might it be possible
that Nutch is taking -chat as a parameter as well?



On Wed, Aug 3, 2011 at 8:34 AM, Christian Weiske <
christian.wei...@netresearch.de> wrote:

> Hi,
>
>
> I'm getting the following error:
>
> $ bin/nutch readdb crawl-chat/crawldb -stats
> CrawlDb statistics start: crawl-chat/crawldb
> Statistics for CrawlDb: crawl-chat/crawldb
> Exception in thread "main" java.lang.NullPointerException
>        at
> org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:352)
>        at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
>
>
> The db has been created as follows, as you see no URLs have been
> fetched (another problem):
>
> $ bin/nutch crawl urls/chat -dir crawl-chat -depth 10 -topN 10000
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl-chat
> rootUrlDir = urls/chat
> threads = 10
> depth = 10
> solrUrl=null
> topN = 10000
> Injector: starting at 2011-08-03 09:31:53
> Injector: crawlDb: crawl-chat/crawldb
> Injector: urlDir: urls/chat
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-08-03 09:31:57, elapsed: 00:00:04
> Generator: starting at 2011-08-03 09:31:57
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 10000
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl-chat
>
> --
> Best regards
> Christian Weiske
>



-- 
*Lewis*
