I think I got it. The redirect page url was rejected by the
regex-urlfilter.txt. I edited it and the content is fetched fine.
Thanks,
Vijay
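For anyone hitting the same thing, here is a minimal Python sketch of how a regex-urlfilter.txt style rule list can silently drop a redirect target. It assumes the usual semantics: rules are read top to bottom, each starts with + (accept) or - (reject), the first matching pattern decides, and a URL that matches nothing is rejected. The rule texts below are modelled on the stock file, not copied from this setup, and the example.org URLs are hypothetical stand-ins, since the actual redirect target is not shown in this thread. Nutch uses Java regexes, so Python's re module is only an approximation for simple patterns like these.

```python
import re


def load_rules(text):
    """Parse regex-urlfilter-style lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for accept, pattern in rules:
        if pattern.search(url):  # substring match, not a full match
            return accept
    return False


# Assumption: rules resembling the stock file, which rejects URLs that
# contain query-like characters before accepting everything else.
DEFAULT_LIKE = """
# skip URLs containing certain characters as probable queries
-[?*!@=]
# accept anything else
+.
"""

# Assumption: one possible edit, with the query-character rule removed.
EDITED = """
+.
"""

start_url = "http://example.org/"                     # hypothetical seed
redirect_url = "http://example.org/login?url=/home"   # hypothetical target

for name, text in (("default-like", DEFAULT_LIKE), ("edited", EDITED)):
    rules = load_rules(text)
    print(name, accepts(rules, start_url), accepts(rules, redirect_url))
```

With the default-like rules the seed passes but the redirect target is filtered out because of its query string, which would leave the segment without content, data or text for that page. Narrowing or removing the rejecting rule lets the redirect target through.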
On Jul 18, 2014, at 3:13 PM, Vijay Chakilam wrote:
> Any help would be great. I even tried to do step by step. I first injected
> the url, generated the
Any help would be great. I even tried to do step by step. I first injected the
url, generated the segment, fetched and parsed it. The readseg dump is the
same. It doesn’t have any content, data or text. The thing I am not able to
understand is that some pages that have redirects are fetched, but
Thanks for your answers Julien. I tried to use the crawl script, but I am
having the same problem. I have set redirect.max to 5 and number of rounds 1 (I
have also tried 2 rounds, but I guess that doesn’t help since I have already
specified the redirect.max to be 5. So it should follow any redir
Hi
On 17 July 2014 22:04, Vijay Chakilam wrote:
> Thanks for your reply Julien. I am not doing any indexing and I don’t have
> a solr url. Looks like crawl script requires me to specify a solr url. How
> do I run crawl script without specifying a solr url?
Just comment out the commands relat
Thanks for your reply Julien. I am not doing any indexing and I don’t have a
solr url. Looks like crawl script requires me to specify a solr url. How do I
run crawl script without specifying a solr url? Also, I want to crawl just the
webpage I specify: a depth of 1. I don’t want to fetch any ou
Hi,
The crawl command is deprecated, use the crawl script instead and give it a
number of rounds > 1 so that it has a chance to fetch the redirection
J.
On 17 July 2014 21:10, Vijay Chakilam wrote:
> Hi,
>
> I am trying to crawl the page at: "
> http://0-search.proquest.com.alpha2.latrobe.edu
Hi,
I am trying to crawl the page at:
"http://0-search.proquest.com.alpha2.latrobe.edu.au/"
Here’s the parse checker output.
runtime/local/bin/nutch parsechecker -dumpText
http://0-search.proquest.com.alpha2.latrobe.edu.au/
fetching: http://0-search.proquest.com.alpha2.latrobe.edu.au/
Fetch f