Re: Nutch 2.2.1 can not index to solr

Gavin Wed, 12 Feb 2014 01:25:23 -0800

Here is my output:


[Gavin@Gavin local]$ bin/nutch  inject urls
InjectorJob: starting at 2014-02-12 17:16:20
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-02-12 17:16:25, elapsed: 00:00:04
[Gavin@Gavin local]$ bin/nutch generate -topN 5
GeneratorJob: starting at 2014-02-12 17:16:46
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 5
GeneratorJob: finished at 2014-02-12 17:16:51, time elapsed: 00:00:05
GeneratorJob: generated batch id: 1392196606-229189632
[Gavin@Gavin local]$ bin/nutch fetch -all
FetcherJob: starting
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 5 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching http://www.163.com/ (queue crawl delay=5000ms)
fetching http://nutch.apache.org/ (queue crawl delay=5000ms)
fetching http://www.tianya.cn/ (queue crawl delay=5000ms)
fetching http://www.taobao.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread5, activeThreads=8
-finishing thread FetcherThread6, activeThreads=8
-finishing thread FetcherThread4, activeThreads=7
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread2, activeThreads=5
fetching http://www.hao123.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread0, activeThreads=4
-finishing thread FetcherThread7, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 4 pages, 0 errors, 0.8 1 pages/s, 242 242 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: done
[Gavin@Gavin local]$ bin/nutch parse -all
ParserJob: starting
ParserJob: resuming:    false
ParserJob: forced reparse:    false
ParserJob: parsing all
Parsing http://www.tianya.cn/
Parsing http://www.163.com/
Parsing http://www.hao123.com/
Parsing http://www.taobao.com/
Parsing http://nutch.apache.org/
ParserJob: success
[Gavin@Gavin local]$ bin/nutch solrindex http://127.0.0.1:8983/solr -all
SolrIndexerJob: starting
SolrIndexerJob: done.


Thank you!


------------------ Original ------------------
From:  "d_k";<[email protected]>;
Date:  Wed, Feb 12, 2014 04:58 PM
To:  "user"<[email protected]>; 

Subject:  Re: Nutch 2.2.1 can not index to solr



What is the output of each of the steps when you execute them separately?
Did you edit regex-urlfilter.txt accordingly?

$ bin/nutch inject urls
$ bin/nutch generate -topN 5
$ bin/nutch fetch -all
$ bin/nutch parse -all

Taken from here:
https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup
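
To make it easier to see which step misbehaves, the four commands above could be wrapped in a small script that prints a header before each step and stops at the first one that exits non-zero. This is only a rough sketch: the NUTCH variable and the run_step / run_crawl_steps helper names are made up for illustration and are not part of Nutch, and it assumes you run it from the directory that contains bin/nutch.

```shell
# Hypothetical helper, not part of Nutch: run the crawl steps one by one.
# NUTCH is an assumed variable pointing at the launcher (default: bin/nutch).
NUTCH="${NUTCH:-bin/nutch}"

# Print the command as a header, run it, and report if it exits non-zero.
run_step() {
    echo "== $*"
    "$@" || { echo "step failed: $*" >&2; return 1; }
}

# Run the four steps in order; the && chain stops at the first failure.
run_crawl_steps() {
    run_step "$NUTCH" inject urls &&
    run_step "$NUTCH" generate -topN 5 &&
    run_step "$NUTCH" fetch -all &&
    run_step "$NUTCH" parse -all
}
```

After sourcing the file, calling run_crawl_steps runs the steps; redirecting its output to a file (for example run_crawl_steps > steps.log 2>&1) gives one log to paste into a reply. Note that a step can exit with status 0 and still do nothing useful, so the per-step output still needs to be read.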




On Wed, Feb 12, 2014 at 10:33 AM, Gavin <[email protected]> wrote:

> I compiled Nutch in Eclipse. My storage backend is HBase.
> After I run bin/crawl, there are two tables in HBase: "webpage" and
> "%crawl_ID%webpage",
> but there is no data in Solr and no exception is reported.
> Why?
>
> (I can crawl and index to the Solr server with the Nutch 1.7 binary
> release, so I think my Solr server is OK.)
