You'll probably have to resolve each URL manually to get the address it's 
referencing. Those are URL shorteners and probably won't play nicely with a 
crawler because of the redirection.
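
If you'd rather not click through 43 of them by hand, a small program can do
the resolving for you. Here's a rough sketch (untested; the class name and the
input file are just placeholders): it reads a file with one short URL per line,
sends a HEAD request without following the redirect, and prints the Location
header, which is the real target you could put in your seed list instead of
the shortener.

import java.io.BufferedReader;
import java.io.FileReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch only: expand is.gd-style short URLs by reading the Location
// header of the redirect response instead of following it.
public class ExpandShortUrls {
    public static void main(String[] args) throws Exception {
        // args[0]: plain-text file with one short URL per line,
        // e.g. the same file used as the Nutch seed list.
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(line).openConnection();
            conn.setInstanceFollowRedirects(false); // we want the Location header itself
            conn.setRequestMethod("HEAD");
            String target = conn.getHeaderField("Location");
            System.out.println(line + " -> " + (target != null ? target : "(no redirect)"));
            conn.disconnect();
        }
        in.close();
    }
}

Another thing worth checking is Nutch's http.redirect.max property in
conf/nutch-site.xml; as far as I know, raising it above the default of 0 makes
the fetcher follow redirects within the same round instead of queueing the
target for a later one, but verify that against your version before relying
on it.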

Adam

Sent from my iPhone

On Jan 26, 2011, at 8:02 AM, Arjun Kumar Reddy <charjunkumar.re...@iiitb.net> 
wrote:

> Hi list,
> 
> I have given the following set of URLs as seeds:
> 
> http://is.gd/Jt32Cf
> http://is.gd/hS3lEJ
> http://is.gd/Jy1Im3
> http://is.gd/QoJ8xy
> http://is.gd/e4ct89
> http://is.gd/WAOVmd
> http://is.gd/lhkA69
> http://is.gd/3OilLD
> ..... 43 such URLs
> 
> I then ran the crawl command bin/nutch crawl urls/ -dir crawl -depth 3
> 
> arjun@arjun-ninjas:~/nutch$ bin/nutch readdb crawl/crawldb -stats
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls: 43
> retry 0: 43
> min score: 1.0
> avg score: 1.0
> max score: 1.0
> status 3 (db_gone): 1
> status 4 (db_redir_temp): 1
> status 5 (db_redir_perm): 41
> CrawlDb statistics: done
> 
> When I try to read the content from the segments, the content block is
> empty for every record.
> 
> Can you please tell me where I can get the content of these URLs?
> 
> Thanks and regards,
> Arjun Kumar Reddy
