Re: URL in crawldb not appearing in Solr after indexing.

Sebastian Nagel Tue, 30 Jul 2013 14:14:38 -0700

Hi,

The signature of the document is null in the CrawlDb.
The signature is calculated when the document is parsed, so check:
- has parsing taken place?
- was the content truncated?
- did parsing fail?
etc.
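
A quick way to confirm the null signature for a given URL is to filter the
readdb output. The helper below is only a sketch: signature_state is a
made-up name, and it assumes the "Signature: ..." line looks like the one in
the readdb output quoted further down.

```shell
# Hypothetical helper: reads `bin/nutch readdb <crawldb> -url <url>` output
# on stdin and reports the state of the Signature field.
#   missing -> "Signature: null" (the document was probably never parsed)
#   present -> a signature value is set
#   unknown -> no Signature line in the input at all
signature_state() {
  awk '
    /^Signature:/ {
      found = 1
      if ($2 == "null") print "missing"; else print "present"
    }
    END { if (!found) print "unknown" }
  '
}
```

Usage, with placeholder paths: bin/nutch readdb <crawldb> -url <url> | signature_state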


> How do I specifically request that an entry in crawldb gets pushed to Solr?
You have to run solrindex on the segment which contains the fetched and parsed 
data.

To check whether the segment contains all required data, you can use
% bin/nutch readseg ...
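
Put together, the two steps might look like the sketch below. The function
name, the paths and the Solr URL are placeholders, not values from this
thread, and the solrindex argument order is an assumption based on Nutch 1.x
releases where the linkdb is optional; run bin/nutch solrindex without
arguments to see the usage for your version. By default the commands are only
printed; set RUN=1 to execute them.

```shell
# Print a command; execute it only when RUN=1 is set (dry run by default).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Hypothetical wrapper: reindex_segment <crawldb> <segment> <solr_url>
reindex_segment() {
  crawldb=$1
  segment=$2
  solr_url=$3
  nutch=${NUTCH:-bin/nutch}

  # 1. Summarize the segment. The parsed count should be non-zero,
  #    otherwise there is nothing for solrindex to send.
  run "$nutch" readseg -list "$segment"

  # 2. Index the fetched and parsed data of that segment into Solr.
  run "$nutch" solrindex "$solr_url" "$crawldb" "$segment"
}
```

Usage, again with placeholder values:
reindex_segment crawl/crawldb crawl/segments/<segment> http://localhost:8983/solr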

Sebastian

On 07/30/2013 06:48 PM, Os Tyler wrote:
> Hello,
> 
> I have successfully deployed Solr on our development environment and our 
> stage environment, but I am running into an anomaly the third time around.
> 
> I have a specific URL that appears in the crawldb, but does not show up 
> when I search from the Solr interface. How do I specifically request that an 
> entry in crawldb gets pushed to Solr?
> 
> I have run solrindex multiple times and it does not produce any errors. 
> readdb, parsechecker and indexchecker all return positive results for this 
> URL. The configuration on the to-be-production machine is identical to that 
> on dev and stage, where the URL appears correctly in Solr.
> 
> /usr/local/apache-nutch/bin/nutch readdb 
> /usr/local/apache-nutch/intranet/crawldb/ -url 
> http://redacted.com/ppb/ppb_3j_002_vacation_policy.pdf
> 
> URL: http://redacted.com/ppb/ppb_3j_002_vacation_policy.pdf
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Wed Jul 31 01:22:44 EDT 2013
> Modified time: Wed Dec 31 19:00:00 EST 1969
> Retries since fetch: 0
> Retry interval: 100000 seconds (1 days)
> Score: 6.5826543E-4
> Signature: null
> Metadata: Content-Type: application/pdf _pst_: success(1), lastModified=0
> 
> 
>
