Are you parsing during the fetch cycle?
On Thursday 16 June 2011 13:53:22 MilleBii wrote:
The fetcher is causing weird problems for me on the master node only, not
on the data node, despite the following actions:
+ increased HADOOP_HEAPSIZE from 1000 to 2000
+ reduced the number of threads
+
Hi
I'm testing Nutch and followed the Nutch tutorial,
but I ran into a problem. I ran the command bin/nutch crawl on
6 plain-text sites that contain only about 400 lines of text; so far, so
normal. When I search with Nutch, it scans about 50 lines; after
that it does not scan
Off the top of my head, one property springs to mind, which you may or may
not have configured in nutch-site
http.content.limit
However I think that this is not the source of the problem.
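For reference, a minimal sketch of how that property might be overridden, assuming the override lives in conf/nutch-site.xml (the file path and the value shown are illustrative assumptions, not settings taken from this thread). Nutch truncates fetched HTTP content longer than this many bytes, and a negative value turns truncation off:

```xml
<!-- conf/nutch-site.xml (assumed location): local overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Maximum number of bytes kept per document fetched over HTTP.
         -1 is an illustrative choice that disables truncation. -->
    <value>-1</value>
  </property>
</configuration>
```

Pages fetched before the change would need to be fetched again for a new limit to apply to them.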
I would advise you to have a look at your hadoop log file for any obvious
warnings... how do you know he