Re: readdblink not showing alllinks

lewis john mcgibbney Tue, 23 Aug 2011 11:09:50 -0700

If you post your crawldb dump, we can see the structure of your crawldb
and may be able to start pinpointing the issue.


You should not need to run another crawl after inverting links for these
URLs to be indexed by the solrindex command; there must be more to it.
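Since the gap is between URLs in the crawldb and documents in Solr, tallying the dump by status is a useful first look. Assuming the usual Nutch 1.x behavior, the crawldb also records outlinks that were discovered but never fetched (for example because the crawl stopped at its depth limit), and those have no parsed content for solrindex to send. `bin/nutch readdb <crawldb> -stats` prints such a breakdown directly. The minimal Python sketch below does the same from the text dump, assuming each record starts with a `<url><TAB>Version: ...` line followed by a `Status: <code> (<name>)` line; the script name and file arguments are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed record layout of a `readdb -dump` text file (Nutch 1.x):
#   <url>\tVersion: <n>
#   Status: <code> (<status name>)
#   ...further CrawlDatum fields, then a blank line...
URL_LINE = re.compile(r"^(\S+)\tVersion:")
STATUS_LINE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def tally_crawldb_dump(lines):
    """Return (number of unique URLs, Counter of status names)."""
    urls = set()
    statuses = Counter()
    for line in lines:
        url_match = URL_LINE.match(line)
        if url_match:
            urls.add(url_match.group(1))
            continue
        status_match = STATUS_LINE.match(line)
        if status_match:
            statuses[status_match.group(1)] += 1
    return len(urls), statuses


def main(paths):
    def all_lines():
        # Chain the lines of every part file in the dump directory.
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                yield from handle

    unique, statuses = tally_crawldb_dump(all_lines())
    print(f"unique URLs: {unique}")
    for name, count in statuses.most_common():
        print(f"{name}: {count}")


if __name__ == "__main__":
    # Hypothetical usage: python tally_dump.py dump/part-00000 [more parts...]
    main(sys.argv[1:])
```

If the `db_fetched` count comes out close to the number of documents in Solr, the difference is most likely URLs that were known but never fetched, not a problem with invertlinks or solrindex.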

On Tue, Aug 23, 2011 at 6:54 PM, abhayd <[email protected]> wrote:

> Hi,
> after running invertlinks I see the complete link graph... THANKS!
>
> I'm a bit confused; please help me understand.
>
> I crawl using the crawl command, and I see around 7000+ URLs when I dump
> the crawldb.
> Then I run invertlinks and I see the complete link graph.
> After this I run solrindex.
>
> After Solr indexing completes I see only 2421 docs. I was expecting 7000+
> docs (i.e. the exact number of unique URLs I got from dumping the crawldb
> as text).
>
> Why do I see only 2421 URLs/docs in Solr?
> Do I need to run the crawl again after invertlinks?
>
> Here are some settings
> --------------------------------------------------------------
>  <name>db.update.max.inlinks</name>
>  <value>10000</value>
>
>  <name>db.ignore.internal.links</name>
>  <value>false</value>
>
>  <name>db.max.inlinks</name>
>  <value>10000</value>
>
>  <name>db.max.outlinks.per.page</name>
>  <value>-1</value>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3278779.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
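The settings above are quoted without their enclosing `<property>` elements. In Nutch's Hadoop-style configuration files (typically `conf/nutch-site.xml`, whose values override `nutch-default.xml`), each setting sits in its own `<property>` under `<configuration>`. A minimal Python sketch, with hypothetical script and file names, that embeds the quoted values in that layout and reads properties back, which can help check that the file on disk really contains these overrides:

```python
import sys
import xml.etree.ElementTree as ET

# The four overrides quoted in the message, laid out the way Hadoop-style
# configuration files expect: one <property> per setting.
EXAMPLE = """<configuration>
  <property>
    <name>db.update.max.inlinks</name>
    <value>10000</value>
  </property>
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
  <property>
    <name>db.max.inlinks</name>
    <value>10000</value>
  </property>
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>
</configuration>
"""

WATCHED = (
    "db.update.max.inlinks",
    "db.ignore.internal.links",
    "db.max.inlinks",
    "db.max.outlinks.per.page",
)


def read_properties(xml_source):
    """Map each <property>'s <name> to its <value> (str or bytes input)."""
    root = ET.fromstring(xml_source)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    # Hypothetical usage: python check_conf.py conf/nutch-site.xml
    # With no argument, the embedded EXAMPLE is checked instead.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            found = read_properties(handle.read())
    else:
        found = read_properties(EXAMPLE)
    for key in WATCHED:
        # A key missing here means the value from nutch-default.xml applies.
        print(f"{key} = {found.get(key, '(not set)')}")
```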



-- 
*Lewis*
