That was it... I did not modify that.

Thanks!!!
Date: Thu, 18 Aug 2011 08:06:44 -0700
From: [email protected]
To: [email protected]
Subject: Re: nutch redirect treatment



Did you modify the URL filtering rules to allow URLs containing characters such as ? and &? By default such URLs are filtered out.
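
To illustrate the point, here is a minimal Python sketch of how a first-match-wins regex URL filter would treat the redirect target from the original post. It is not Nutch code: the `accepts` helper and the two rule lists are hypothetical, and the reject pattern `[?*!@=]` is assumed to match what the stock conf/regex-urlfilter.txt ships with, so check your own copy of that file.

```python
import re

# Hypothetical rule lists in the style of a regex URL filter file:
# each rule is a sign ('+' accept, '-' reject) followed by a regular expression.
# ASSUMPTION: the reject pattern below mirrors the default query-character rule.
DEFAULT_RULES = [
    ("-", r"[?*!@=]"),  # reject URLs containing characters typical of queries
    ("+", r"."),        # accept anything else
]

# Same list with the query-character rule removed, so URLs with ? and & pass.
RELAXED_RULES = [
    ("+", r"."),
]


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in the URL is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as filtered out


seed = "http://developer.att.com/developer/"
redirect_target = (
    "http://developer.att.com/developer/tier1page.jsp"
    "?passedItemId=100006&_requestid=35037"
)

print(accepts(seed, DEFAULT_RULES))
print(accepts(redirect_target, DEFAULT_RULES))
print(accepts(redirect_target, RELAXED_RULES))
```

Under these assumptions the seed URL passes, while the redirect target is dropped until the query-character rule is removed or relaxed, which would explain why the target never shows up in any segment.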


On 17 August 2011 14:01, abhayd <[hidden email]> wrote:


> Hi,
>
> I have seen similar posts in this forum but am still not able to understand
> how redirects are handled.
>
> I am trying to crawl http://developer.att.com/developer/ . After a successful
> crawl I dump the crawldb using readdb, and I see entries like the following.
> What does this mean? Has Nutch crawled the redirected page, and is it in the
> index?
>
> I tried using the readseg command with all the segments under the
> crawl/segments directory, but I could not find the URL
> http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037
>
> Here is my crawl/segments directory listing:
>
> 20110817001833  20110817002117  20110817003028  20110817003930
> 20110817004202
> 20110817001844  20110817002556  20110817003532  20110817004105
>
> Any help on why the redirected page is not crawled?
>
> http://developer.att.com/developer/     Version: 7
> Status: 4 (db_redir_temp)
> Fetch time: Fri Sep 16 00:18:36 CDT 2011
> Modified time: Wed Dec 31 18:00:00 CST 1969
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _pst_: temp_moved(13), lastModified=0:
> http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037
>
> http://developer.att.com/developer/100006       Version: 7
> Status: 5 (db_redir_perm)
> Fetch time: Fri Sep 16 00:43:33 CDT 2011
> Modified time: Wed Dec 31 18:00:00 CST 1969
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 0.0
> Signature: null
> Metadata: _pst_: moved(12), lastModified=0:
> http://developer.att.com/developer/forward.jsp?passedItemId=100006
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3261546.html
> Sent from the Nutch - User mailing list archive at Nabble.com.



-- 

Open Source Solutions for Text Engineering


http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

        
        

        

        
        
                                                  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3265959.html
Sent from the Nutch - User mailing list archive at Nabble.com.
