Hi,

Thanks for helping me with this.

After crawling I checked the crawldb. There were 2400 links with status code (1),
and they all got into the Solr index fine.

Thanks



Date: Tue, 23 Aug 2011 01:15:36 -0700
From: [email protected]
To: [email protected]
Subject: Re: readdblink not showing alllinks



There are a number of parameters which limit the number of outlinks for a
page (IIRC 100 by default) but also the number of inlinks to consider when
inverting. Have a look at nutch-default.xml and try modifying the values in
nutch-site.xml.
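
For illustration, an override in nutch-site.xml might look like the sketch below. It assumes the properties in question are db.max.outlinks.per.page and db.max.inlinks, as listed in nutch-default.xml; the values are arbitrary examples, so check the names and defaults in your own nutch-default.xml before copying them.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property: maximum number of outlinks kept per page. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>500</value> <!-- example value only -->
  </property>
  <!-- Assumed property: maximum number of inlinks kept per URL when inverting links. -->
  <property>
    <name>db.max.inlinks</name>
    <value>20000</value> <!-- example value only -->
  </property>
</configuration>
```

A changed limit would presumably only affect pages parsed, and links inverted, after the change, so a re-crawl or a fresh link inversion is likely needed before the linkdb reflects it.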


On 22 August 2011 20:31, abhayd <[hidden email]> wrote:


> Hi,
> Thanks for the response.
>
> I just ran
> bin/crawl urls -dir crawl -depth 20 -threads 10
>
> and then used readdblink.
>
> My understanding from the Nutch 1.3 tutorial is that if I use bin/crawl (and not
> the step-by-step approach) I don't have to do any other steps for indexing or
> reading the crawl db.
>
> I am doing this:
> -----------------------------------------------------
> 1. bin/crawl urls -dir crawl -depth 20 -threads 10
> 2. bin/nutch solrindex http://localhost:8080/solr/core3 crawl/crawldb crawl/linkdb crawl/segments/*
>
> Is that the correct approach?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3276112.html
> Sent from the Nutch - User mailing list archive at Nabble.com.



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

        
        

        

        
        

--
View this message in context: 
http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3282183.html
Sent from the Nutch - User mailing list archive at Nabble.com.
