If you want to process logs, you don't need to use Nutch. Since you are
interested in storing them in Hadoop, there are several log processors with
a Hadoop backend. Cloudera has one whose name I have forgotten, but here is
another one:
http://incubator.apache.org/chukwa/docs/r0.3.0/design.html
On Mon, Ap
Hi,
Thanks Adriana, for such a quick reply. We'll give it another try with
your suggestions.
Regarding the missing library: I assumed I was on the wrong track if I needed
an additional library, but yes, I might be very wrong :)
I'll keep you posted.
All the best,
Igor
On Mon, Apr 30, 2012 at 3:33 PM, Adri
Hi Folks,
This is not 100% a Nutch question... and I hate it when other people say "I
know my question is off topic," so why I am doing it myself I don't
know.
I am looking at building a system similar to Google Analytics - in that it
logs page requests on third party sites using some kind of
Hello!
I had the same kind of problem. In my case it was caused by one of the nodes
in my cluster having no free memory, so to solve the problem I simply freed up
memory on that node. Check whether all of the nodes in your cluster have free memory.
As for the second error, it seems you're missing some l
Hi Julien,
Thanks for the hint. I have downloaded the patch and applied it to my Nutch 1.4
installation. Now I see that it created the source plugin files in
src/plugin/parse-metatags, and I was wondering how to compile this source into a
plugin usable by Nutch? Sorry I don't have much clue abou
http://wiki.apache.org/nutch/IndexMetatags refers to the code available in
the trunk which is different from the zip you downloaded. You can use the
patch
https://issues.apache.org/jira/secure/attachment/12519226/NUTCH-809-trunk.patch
corresponding to what I committed if you can't use Nutch trunk
H
Hi,
I would like to index the typical description and keywords HTML meta tags using
my stable installation of Nutch 1.4. For that, I have followed the instructions
from the wiki (http://wiki.apache.org/nutch/IndexMetatags) and downloaded the
metatags+plugins_tutorial.zip file from the #NUTCH-80
Hi,
Thanks for the response and sorry for not replying earlier.
I would just like to note that in the case of Nutch 1.4 the default parser used
(this can probably change) is the "html parser",
and its source code can be found under
"apache-nutch-1.4-src\src\plugin\parse-html\src\java"
Best
--