Hi,

I ran into the same issue with Nutch 1.2. You can fix it by upgrading
the Tika parser library to version 0.8 or later. The library is in the
plugins/parse-tika/ directory of your Nutch release.
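
If you want to check which Tika jars in that directory are older than
0.8 before replacing anything, here is a minimal sketch. It assumes the
jars are named like tika-core-0.7.jar (name, dash, version), and the
install path at the bottom is only a placeholder, so adjust both to
whatever your release actually contains.

```python
import re
from pathlib import Path

# Assumption: jar files are named tika-<module>-<version>.jar,
# for example tika-core-0.7.jar or tika-parsers-0.7.jar.
JAR_PATTERN = re.compile(r"^tika-[a-z-]+?-(\d+(?:\.\d+)*)\.jar$")


def tika_version(jar_name):
    """Return the version in a Tika jar file name as a tuple of ints, or None."""
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def outdated_tika_jars(plugin_dir, minimum=(0, 8)):
    """List Tika jar names in plugin_dir whose version is below minimum."""
    outdated = []
    for jar in sorted(Path(plugin_dir).glob("tika-*.jar")):
        version = tika_version(jar.name)
        if version is not None and version < minimum:
            outdated.append(jar.name)
    return outdated


if __name__ == "__main__":
    # Hypothetical install location; point this at your own Nutch directory.
    print(outdated_tika_jars("/opt/nutch-1.2/plugins/parse-tika"))
```

An empty list means no jar matching that naming scheme is below 0.8.
Versions are compared as integer tuples so that, say, 0.10 sorts after
0.8 rather than before it.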

This has already been mentioned twice on the mailing list; see
http://lucene.472066.n3.nabble.com/Full-CPU-usage-td1976780.html

I hope this will help you out.

Alexis

On Fri, Jan 28, 2011 at 1:01 AM, Chris Woolum <[email protected]> wrote:
> If you are looking at the tasktracker control panel, what does it show?
> The link is http://localhost:50030
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Thursday, January 27, 2011 3:01 PM
> To: [email protected]
> Subject: nutch crawl command takes 98% of cpu
>
> Hello,
>
> I run the crawl command with -depth 7 -topN -1 on my Linux box with a
> 1.5 Mbps internet connection, an AMD 3.1 GHz processor, 4 GB of memory,
> Fedora Linux 14, and Nutch 1.2. After 1-2 days Nutch takes 98% of the
> CPU. My seed file includes about 3500 domains and I put fetch.external
> links to false.
>
> Is this normal? If not, what can be done to improve it?
>
> Thanks.
> Alex.
>
