Re: Unable to fetch content

2014-07-18 Thread Vijay Chakilam
I think I got it. The redirect page url was rejected by the regex-urlfilter.txt. I edited it and the content is fetched fine. Thanks, Vijay On Jul 18, 2014, at 3:13 PM, Vijay Chakilam wrote: > Any help would be great. I even tried to do step by step. I first injected > the url, generated the
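For anyone hitting the same symptom: rules in regex-urlfilter.txt are checked top to bottom. The first pattern that matches a URL decides whether it is kept (+) or dropped (-), and a URL that matches no pattern is dropped. A redirect target can therefore be discarded even when the original URL passes. The Python sketch below is a simplified emulation of that first-match rule, not Nutch's own code. The rule set and URLs are hypothetical, and the sketch assumes patterns are matched anywhere in the URL (an unanchored search). The `-[?*!@=]` rule mirrors one in Nutch's default file that skips URLs containing query-like characters. That is one plausible way a redirect URL gets rejected.

```python
import re

# Hypothetical rules in regex-urlfilter.txt syntax: '+' accepts, '-' rejects.
RULES_TEXT = """
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything else
+.
"""

def parse_rules(text):
    """Turn rule lines into (accept?, compiled pattern) pairs, skipping comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"bad rule: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(url, rules):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):  # unanchored match anywhere in the URL
            return accept
    return False

rules = parse_rules(RULES_TEXT)
print(accepts("http://example.com/page", rules))          # True
print(accepts("http://example.com/login?url=x", rules))   # False: contains '?'
```

In this sketch, a hypothetical redirect such as `http://example.com/login?url=x` is filtered out by the first rule before `+.` is ever reached. Editing or reordering the rules changes which URLs survive.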

Re: Unable to fetch content

2014-07-18 Thread Vijay Chakilam
Any help would be great. I even tried doing it step by step: I first injected the URL, then generated the segment, fetched it and parsed it. The readseg dump is the same. It doesn’t have any content, data or text. The thing I am not able to understand is that some pages that have redirects are fetched, but

Re: Unable to fetch content

2014-07-17 Thread Vijay Chakilam
Thanks for your answers Julien. I tried to use the crawl script, but I am having the same problem. I have set redirect.max to 5 and the number of rounds to 1 (I have also tried 2 rounds, but I guess that doesn’t help since I have already set redirect.max to 5. So it should follow any redir

Re: Unable to fetch content

2014-07-17 Thread Julien Nioche
Hi, On 17 July 2014 22:04, Vijay Chakilam wrote: > Thanks for your reply Julien. I am not doing any indexing and I don’t have > a Solr URL. Looks like the crawl script requires me to specify a Solr URL. How > do I run the crawl script without specifying a Solr URL? Just comment out the commands relat

Re: Unable to fetch content

2014-07-17 Thread Vijay Chakilam
Thanks for your reply Julien. I am not doing any indexing and I don’t have a Solr URL. Looks like the crawl script requires me to specify a Solr URL. How do I run the crawl script without specifying a Solr URL? Also, I want to crawl just the webpage I specify: a depth of 1. I don’t want to fetch any ou

Re: Unable to fetch content

2014-07-17 Thread Julien Nioche
Hi, The crawl command is deprecated, use the crawl script instead and give it a number of rounds > 1 so that it has a chance to fetch the redirection J. On 17 July 2014 21:10, Vijay Chakilam wrote: > Hi, > > I am trying to crawl the page at: " > http://0-search.proquest.com.alpha2.latrobe.edu

Unable to fetch content

2014-07-17 Thread Vijay Chakilam
Hi, I am trying to crawl the page at "http://0-search.proquest.com.alpha2.latrobe.edu.au/". Here’s the parsechecker output: runtime/local/bin/nutch parsechecker -dumpText http://0-search.proquest.com.alpha2.latrobe.edu.au/ fetching: http://0-search.proquest.com.alpha2.latrobe.edu.au/ Fetch f