Re: Google Analytics in Hadoop ?

2012-04-30 Thread Peyman Mohajerian
If you want to process logs, you don't need to use Nutch. Since you are interested in storing them in Hadoop, there are several log processors with a Hadoop backend. Cloudera has one whose name I forgot, but here is another one: http://incubator.apache.org/chukwa/docs/r0.3.0/design.html On Mon, Ap

Re: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local..

2012-04-30 Thread Igor Salma
Hi, Thanks, Adriana, for such a quick reply. We'll give it another try with your suggestions. Regarding the missing library: I assumed I was on the wrong track if I needed an additional library, but, yes, I might be very wrong :) I'll keep you posted. All the best, Igor On Mon, Apr 30, 2012 at 3:33 PM, Adri

Google Analytics in Hadoop ?

2012-04-30 Thread Alex McLintock
Hi Folks, This is not 100% a Nutch question... and I hate it when other people say "I know my question is off topic," so why I am doing it myself I don't know. I am looking at building a system similar to Google Analytics - in that it logs page requests on third party sites using some kind of

Re: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local..

2012-04-30 Thread Adriana Farina
Hello! I had the same kind of problem. In my case it was caused by one of the nodes of my cluster having full memory, so to solve the problem I simply freed up memory on that node. Check whether all of the nodes of your cluster have free memory. As for the second error, it seems you're missing some l

Re: Indexing meta tags in Nutch 1.4

2012-04-30 Thread ML mail
Hi Julien, Thanks for the hint. I have downloaded the patch and applied it to my Nutch 1.4 installation. Now I see that it created the source plugin files in src/plugin/parse-metatags, and I was wondering how I compile this source into a plugin usable by Nutch? Sorry I don't have much clue abou

Re: Indexing meta tags in Nutch 1.4

2012-04-30 Thread Julien Nioche
http://wiki.apache.org/nutch/IndexMetatags refers to the code available in the trunk, which is different from the zip you downloaded. You can use the patch https://issues.apache.org/jira/secure/attachment/12519226/NUTCH-809-trunk.patch corresponding to what I committed if you can't use Nutch trunk H

Indexing meta tags in Nutch 1.4

2012-04-30 Thread ML mail
Hi, I would like to index the typical description and keywords HTML meta tags using my stable installation of Nutch 1.4. For that, I have followed the instructions from the wiki (http://wiki.apache.org/nutch/IndexMetatags) and downloaded the metatags+plugins_tutorial.zip file from the #NUTCH-80

Re: Class in the code that handles parsing of html files and selection of URLs

2012-04-30 Thread amoum
Hi, Thanks for the response, and sorry for not replying earlier. I would just like to note that in the case of Nutch 1.4 the default parser used (this can probably change) is the "html parser", and the source code can be found under "apache-nutch-1.4-src\src\plugin\parse-html\src\java" Best --