Did you modify the URL filtering rules to allow URLs containing characters
such as ? and &? By default such URLs are filtered out, so the redirect
target would never be fetched.
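
To illustrate what I mean: in a stock install the filtering usually comes from a rule like -[?*!@=] in conf/regex-urlfilter.txt, where the first matching rule decides and + means accept, - means reject. The Python below is only a rough sketch of that behaviour, not Nutch code. The accepts helper and the RELAXED_RULES list are hypothetical names made up for the example, and your own filter file may contain other rules.

```python
import re

# Rules in the style of regex-urlfilter.txt: '+' accepts, '-' rejects,
# and the first rule whose pattern is found in the URL decides.
DEFAULT_RULES = [
    "-[?*!@=]",  # skip URLs that look like queries
    "+.",        # accept everything else
]

# Hypothetical relaxed rule set: the query-character rule has been removed.
RELAXED_RULES = [
    "+.",
]


def accepts(url, rules):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected


start = "http://developer.att.com/developer/"
target = ("http://developer.att.com/developer/tier1page.jsp"
          "?passedItemId=100006&_requestid=35037")

print(accepts(start, DEFAULT_RULES))    # True
print(accepts(target, DEFAULT_RULES))   # False
print(accepts(target, RELAXED_RULES))   # True
```

If this is what is happening, the start URL passes the filter but the redirect target (which contains ? = and &) is dropped before it can be fetched, which would match a db_redir_temp entry with no fetched target.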

On 17 August 2011 14:01, abhayd <[email protected]> wrote:

> Hi,
> I have seen similar posts in this forum but I am still not able to
> understand how redirects are handled.
>
> I'm trying to crawl http://developer.att.com/developer/ . After a successful
> crawl I dump the crawldb using readdb. I see entries like the following. What
> do they mean? Has Nutch crawled the redirected page, and is it in the index?
>
> I tried the readseg command on all the segments under the crawl/segments
> directory, but I could not find the
>
> http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037
> url.
>
> Here is my crawl/segments directory listing:
> 20110817001833  20110817002117  20110817003028  20110817003930
> 20110817004202
> 20110817001844  20110817002556  20110817003532  20110817004105
>
> Any idea why the redirected page is not crawled?
>
> http://developer.att.com/developer/     Version: 7
> Status: 4 (db_redir_temp)
> Fetch time: Fri Sep 16 00:18:36 CDT 2011
> Modified time: Wed Dec 31 18:00:00 CST 1969
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _pst_: temp_moved(13), lastModified=0:
>
> http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037
>
> http://developer.att.com/developer/100006       Version: 7
> Status: 5 (db_redir_perm)
> Fetch time: Fri Sep 16 00:43:33 CDT 2011
> Modified time: Wed Dec 31 18:00:00 CST 1969
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 0.0
> Signature: null
> Metadata: _pst_: moved(12), lastModified=0:
> http://developer.att.com/developer/forward.jsp?passedItemId=100006
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3261546.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
