Thank you very much. This worked great and resolved the issue of
finding the parser.
One interesting thing: out of 10 PDF files, it crawled 2 and reported
the others as unsuccessful. This has happened about 10 times so
far.
I really need to debug and put more error messages than jus
Hi Markus,
Thanks for the tip. Is there a wiki page that covers Nutch best
practices, so that next time I don't waste 3 days and almost 100 GB of data? :-(
Thanks,
Mohammad
From: Markus Jelsma
To: user@nutch.apache.org
Subject: RE: How to rec
Hi,
On Tue, Oct 23, 2012 at 2:42 PM, Mouradk wrote:
> This sits in a urls/seed.txt in NUTCH_HOME (not runtime folder but the home
> folder generated after unzipping).
Please put the urls directory (with the seed file for bootstrapping)
into /runtime/local and run the command from the script in
Hi,
On Thu, Oct 25, 2012 at 3:03 PM, manubharghav wrote:
> Will providing a core-site.xml that overrides some of the permissions in
> core-default.xml in the Hadoop jar help?
It's certainly something I would try.
Also have you tried using the Nutch script at all? If you can get this
working you will
Hi - there's a similar entry already; however, the fetcher.done part doesn't
seem to be correct. I can see no reason why that would ever work, as Hadoop temp
files are simply not copied to the segment if it fails. There's also no notion
of a fetcher.done file in trunk.
http://wiki.apache.org/nut
I really think this should be in the FAQ:
http://wiki.apache.org/nutch/FAQ
On Fri, Oct 26, 2012 at 2:10 PM, Markus Jelsma wrote:
> Hi,
>
> You cannot recover the mapper output as far as I know. But anyway, one should
> never have a fetcher running for three days. It's far better to generate a
Hi,
-Original message-
> From: kiran chitturi
> Sent: Thu 25-Oct-2012 20:49
> To: user@nutch.apache.org
> Subject: Nutch 2.x Eclipse: Can't retrieve Tika parser for mime-type
> application/pdf
>
> Hi,
>
> I have built Nutch 2.x in Eclipse using this tutorial (
> http://wiki.apache.org/
Hi,
You cannot recover the mapper output as far as I know. But anyway, one should
never have a fetcher running for three days. It's far better to generate a
large number of smaller segments and fetch them sequentially. If an error
occurs, only a small portion is affected. We never run fetchers
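
The advice above (generate many small segments and fetch them one after another) can be sketched roughly as follows. This is a minimal Python sketch, not a tested crawl script: it assumes the Nutch 1.x command-line layout (bin/nutch generate, fetch, parse, updatedb) and timestamp-named segment directories. The paths, the helper names, and the -topN value are illustrative assumptions, not anything taken from this thread.

```python
from pathlib import Path
from typing import List

NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def generate_command(crawldb: str, segments_dir: str, top_n: int) -> List[str]:
    """Build the command that generates one small segment of at most top_n URLs."""
    return [NUTCH, "generate", crawldb, segments_dir, "-topN", str(top_n)]


def newest_segment(segments_dir: str) -> str:
    """Return the most recent segment, assuming timestamp-style directory names."""
    dirs = [p for p in Path(segments_dir).iterdir() if p.is_dir()]
    if not dirs:
        raise FileNotFoundError(f"no segments found in {segments_dir}")
    # Timestamp names sort lexicographically, so the largest name is the newest.
    return str(max(dirs, key=lambda p: p.name))


def segment_commands(crawldb: str, segment: str) -> List[List[str]]:
    """Build the fetch, parse and updatedb commands for a single segment."""
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "parse", segment],
        [NUTCH, "updatedb", crawldb, segment],
    ]
```

One round would run generate_command(...) with subprocess.run(cmd, check=True), look up the result with newest_segment(...), and then run the three per-segment commands in order. Repeating that round many times with a small -topN means a failed fetch costs at most one small segment rather than days of work.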
>
> Is there anything wrong with my Eclipse configuration? I am looking to
> debug some things in Nutch, so I am working with Eclipse and Nutch.
easier to follow the steps in Remote Debugging in Eclipse from
http://wiki.apache.org/nutch/RunNutchInEclipse
it will save you all sorts of classpath