Re: Nutch 2.2.1 can not index to solr

d_k Wed, 12 Feb 2014 03:16:50 -0800

Are you sure Solr is not throwing any errors?
Did you make any changes to the schema? Which schema does Solr use? Which
version of Solr are you using?
You can turn on debug logging by changing the logging level to DEBUG in
conf/log4j.properties under the runtime/local directory (I assume this is
your setup; let me know if it's not).
You can also try to debug Nutch in Eclipse as described here:
https://wiki.apache.org/nutch/RunNutchInEclipse
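For the debug-logging suggestion, here is a minimal shell sketch that switches the Nutch loggers from INFO to DEBUG. It assumes conf/log4j.properties declares Nutch loggers as lines like `log4j.logger.org.apache.nutch...=INFO,cmdstdout`. The exact logger names are an assumption, so check your own copy first. `enable_nutch_debug` is a hypothetical helper name, and the original file is kept as a `.bak` copy.

```shell
#!/usr/bin/env bash
# Raise Nutch's own loggers from INFO to DEBUG in a log4j.properties file.
# ASSUMPTION: Nutch loggers appear as lines such as
#   log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=INFO,cmdstdout
# Other loggers (rootLogger, Hadoop, ...) are left unchanged.

# enable_nutch_debug FILE   (hypothetical helper, not part of Nutch)
enable_nutch_debug() {
  local props="$1"
  if [ ! -f "$props" ]; then
    echo "no such file: $props" >&2
    return 1
  fi
  # Keep the original so the change is easy to revert.
  cp "$props" "$props.bak" || return 1
  # Rewrite "=INFO" to "=DEBUG" only on log4j.logger.org.apache.nutch* lines;
  # anything after INFO (e.g. ",cmdstdout") is preserved.
  sed 's/^\(log4j\.logger\.org\.apache\.nutch[^=]*\)=INFO/\1=DEBUG/' \
    "$props.bak" > "$props"
}

# Usage, from runtime/local, then rerun the step that misbehaves:
#   enable_nutch_debug conf/log4j.properties
```

To undo the change, move the backup back with `mv conf/log4j.properties.bak conf/log4j.properties`.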



On Wed, Feb 12, 2014 at 11:31 AM, Gavin <[email protected]> wrote:

> And my Solr:
>
>
> Statistics
>
>   Last Modified:
>   Num Docs:           0
>   Max Doc:            0
>   Heap Memory Usage:  0
>   Deleted Docs:       0
>   Version:            1
>   Segment Count:      0
>   Optimized:
>   Current:
>
>
>
> What is wrong?
>
> Thanks for your help!!!
>
>
>
>
>
> ------------------ Original ------------------
> From:  "274614348";<[email protected]>;
> Date:  Wed, Feb 12, 2014 05:24 PM
> To:  "user"<[email protected]>;
>
> Subject:  Re: Nutch 2.2.1 can not index to solr
>
>
>
> Here is my output:
>
>
> [Gavin@Gavin local]$ bin/nutch  inject urls
> InjectorJob: starting at 2014-02-12 17:16:20
> InjectorJob: Injecting urlDir: urls
> InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the
> Gora storage class.
> InjectorJob: total number of urls rejected by filters: 0
> InjectorJob: total number of urls injected after normalization and
> filtering: 1
> Injector: finished at 2014-02-12 17:16:25, elapsed: 00:00:04
> [Gavin@Gavin local]$ bin/nutch generate -topN 5
> GeneratorJob: starting at 2014-02-12 17:16:46
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: normalizing: true
> GeneratorJob: topN: 5
> GeneratorJob: finished at 2014-02-12 17:16:51, time elapsed: 00:00:05
> GeneratorJob: generated batch id: 1392196606-229189632
> [Gavin@Gavin local]$ bin/nutch fetch -all
> FetcherJob: starting
> FetcherJob: fetching all
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : -1
> Using queue mode : byHost
> Fetcher: threads: 10
> QueueFeeder finished: total 5 records. Hit by time limit :0
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> fetching http://www.163.com/ (queue crawl delay=5000ms)
> fetching http://nutch.apache.org/ (queue crawl delay=5000ms)
> fetching http://www.tianya.cn/ (queue crawl delay=5000ms)
> fetching http://www.taobao.com/ (queue crawl delay=5000ms)
> -finishing thread FetcherThread5, activeThreads=8
> -finishing thread FetcherThread6, activeThreads=8
> -finishing thread FetcherThread4, activeThreads=7
> -finishing thread FetcherThread3, activeThreads=6
> -finishing thread FetcherThread2, activeThreads=5
> fetching http://www.hao123.com/ (queue crawl delay=5000ms)
> -finishing thread FetcherThread0, activeThreads=4
> -finishing thread FetcherThread7, activeThreads=3
> -finishing thread FetcherThread1, activeThreads=2
> -finishing thread FetcherThread8, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=0
> 0/0 spinwaiting/active, 4 pages, 0 errors, 0.8 1 pages/s, 242 242 kb/s, 0
> URLs in 0 queues
> -activeThreads=0
> FetcherJob: done
> [Gavin@Gavin local]$ bin/nutch parse -all
> ParserJob: starting
> ParserJob: resuming:    false
> ParserJob: forced reparse:    false
> ParserJob: parsing all
> Parsing http://www.tianya.cn/
> Parsing http://www.163.com/
> Parsing http://www.hao123.com/
> Parsing http://www.taobao.com/
> Parsing http://nutch.apache.org/
> ParserJob: success
> [Gavin@Gavin local]$ bin/nutch solrindex http://127.0.0.1:8983/solr -all
> SolrIndexerJob: starting
> SolrIndexerJob: done.
>
>
> Thank you!
>
>
> ------------------ Original ------------------
> From:  "d_k";<[email protected]>;
> Date:  Wed, Feb 12, 2014 04:58 PM
> To:  "user"<[email protected]>;
>
> Subject:  Re: Nutch 2.2.1 can not index to solr
>
>
>
> What is the output of each of the steps when you execute them separately?
> Did you edit regex-urlfilter.txt accordingly?
>
> $ bin/nutch inject urls
> $ bin/nutch generate -topN 5
> $ bin/nutch fetch -all
> $ bin/nutch parse -all
>
> Taken from here:
> https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup
>
>
>
>
> On Wed, Feb 12, 2014 at 10:33 AM, Gavin <[email protected]> wrote:
>
> > I compiled Nutch in Eclipse. My storage is HBase.
> > After I run bin/crawl, there are two tables in HBase, "webpage" and
> > "%crawl_ID%webpage", but there is no data in Solr and no exception.
> > Why?
> >
> > (I can crawl and index to the Solr server using nutch1.7.bin, so I think
> > my Solr server is OK.)
>
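The quoted message suggests running each step separately. That run can be wrapped in a small script that echoes each command before running it and stops at the first step that exits with an error. This is a minimal sketch. It assumes the script runs from runtime/local, that the seed URLs are in `urls`, that Solr is at the URL used above, and that bin/nutch exits non-zero when a job fails. The `NUTCH` and `SOLR_URL` overrides and the `run_step`/`crawl_steps` names are hypothetical. The script only sequences the commands already shown in this thread and adds no crawl steps of its own. A step can still exit cleanly and do nothing: the solrindex run above reported "done." while Solr shows 0 documents. So the echoed output still has to be read.

```shell
#!/usr/bin/env bash
# Run the Nutch 2.x steps from this thread one by one, echoing each command
# and stopping at the first one that exits non-zero.
# ASSUMPTIONS: called from runtime/local, seed list in ./urls, Solr at SOLR_URL.
# NUTCH and SOLR_URL are hypothetical overrides added for this sketch.

NUTCH="${NUTCH:-bin/nutch}"
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr}"

# Print the command, run it, and report which step failed (if any).
run_step() {
  echo ">>> $NUTCH $*"
  if ! "$NUTCH" "$@"; then
    echo "step failed: $*" >&2
    return 1
  fi
}

# The same sequence as in the thread; "&&" stops at the first failure.
crawl_steps() {
  run_step inject urls &&
  run_step generate -topN 5 &&
  run_step fetch -all &&
  run_step parse -all &&
  run_step solrindex "$SOLR_URL" -all
}

# Usage (keeps a copy of all output to post to the list):
#   crawl_steps 2>&1 | tee crawl-steps.log
```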