RCfile

2012-05-23 Thread yingnan.ma
Hi, I want to use RCFile to address the I/O problem, but I cannot find any documentation on how to install it or how to use it from Pig. If you have any installation or configuration notes, could you share them with me? Thank you. Best Regards Malone 2012-05-24

Re: Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Thanks, Joey, we are in beta, and I kind of need these for debugging. But as soon as we go to production, your word is well taken. I hope we will replace the current primitive logging with a good one (log4j is, I think, preferred with Hadoop), and then we can change the log level. Mark On Wed, May 23
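The level-gating idea the thread converges on can be sketched without log4j, using the JDK's own java.util.logging (the class and logger names here are made up for illustration; the principle carries over to log4j directly). At FINE the mapper's 12,000 per-record lines all reach the handler; at WARNING only the final summary does:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class MapperLogging {
    // Counts the records that actually get through the logger's level gate.
    static final class CountingHandler extends Handler {
        int published = 0;
        @Override public void publish(LogRecord r) { published++; }
        @Override public void flush() {}
        @Override public void close() {}
    }

    static int emitted(Level level) {
        Logger log = Logger.getLogger("mapper." + level.getName());
        log.setUseParentHandlers(false);
        log.setLevel(level);
        CountingHandler h = new CountingHandler();
        log.addHandler(h);
        // The per-record debug chatter from the thread: ~12,000 lines.
        for (int i = 0; i < 12000; i++) log.fine("processed record " + i);
        log.warning("map task done"); // the one line worth keeping in production
        return h.published;
    }

    public static void main(String[] args) {
        System.out.println("dev (FINE): " + emitted(Level.FINE));       // 12001 pass the gate
        System.out.println("prod (WARNING): " + emitted(Level.WARNING)); // only 1 passes
    }
}
```

Switching environments then means changing one configured level, not deleting println statements.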

Re: Memory exception in the mapper

2012-05-23 Thread Joey Krabacher
No problem, glad I could help. In our test environment I have lots of output and logging turned on, but as soon as it is on production all output and logging is reduced to the bare minimum. Basically, in production we only log caught exceptions. I would take it out unless you absolutely need it.

Re: Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Joey, that did the trick! Actually, I am writing to the log with System.out.println() statements, and I write about 12,000 lines; would that be a problem? I don't really need this output, so if you think it's inadvisable, I will remove it. Also, I hope that if I have not 6,000 maps but 12,000

Re: Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Arun, Actually CDH3 is Hadoop 0.20, but with .21 backported, so I am using 0.21 API whenever I can. Mark On Wed, May 23, 2012 at 9:40 PM, Mark Kerzner wrote: > Arun, > > I am running the latest CDH3, which I re-installed yesterday, so I believe > it is Hadoop 0.21. > > I have about 6000 maps em

Re: Memory exception in the mapper

2012-05-23 Thread Joey Krabacher
Mark, Have you tried tweaking the mapred.child.java.opts property in your mapred-site.xml (e.g. mapred.child.java.opts = -Xmx2048m)? This might help. It looks like the fatal error came right after the log truncater fired off. Are you outputting anything to the logs manually, or have you looke
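Joey's suggestion amounts to an entry like the following in mapred-site.xml (a sketch; the 2048 MB figure is the value from his message, and it must fit within each node's physical RAM alongside the other daemons):

```xml
<!-- mapred-site.xml: JVM options passed to every map/reduce child task -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```

Note this sets the heap of each child task JVM, which is separate from the heap of the Hadoop daemons themselves.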

Re: Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Arun, I am running the latest CDH3, which I re-installed yesterday, so I believe it is Hadoop 0.21. I have about 6000 maps emitted, and 16 spills, and then I see the Mapper cleanup() being called, after which I get this error: 2012-05-23 20:22:58,108 FATAL org.apache.hadoop.mapred.Child: Error runnin

Re: Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Joey, my error closely resembles this one in the archives. I can now be much more specific with the error message, and it is quoted below. I tried -Xmx309

Re: Memory exception in the mapper

2012-05-23 Thread Arun C Murthy
What version of hadoop are you running? On May 23, 2012, at 12:16 PM, Mark Kerzner wrote: > Hi, all, > > I got the exception below in the mapper. I already have my global Hadoop > heap at 5 GB, but is there a specific other setting? Or maybe I should > troubleshoot for memory? > > But the same

Re: Right way to implement MR ?

2012-05-23 Thread Arun C Murthy
You might want to start with http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html. Arun On May 23, 2012, at 12:47 PM, samir das mohapatra wrote: > Hi All, > How do I compare two input files in an M/R job? > Let log file A be around 30 GB > and log file B be around 60 GB > > I

Re: 3 machine cluster trouble

2012-05-23 Thread James Warren
Hi Pat - The setting for hadoop.tmp.dir is used both locally and on HDFS and therefore should be consistent across your cluster. http://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir cheers, -James On Wed, May 23, 2012 at 3:44 PM, Pat Ferrel wrote: > I have a two machine cl
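James's point translates to keeping this core-site.xml entry identical on every node (the path below is only an example; the key is that it matches across the cluster, since hadoop.tmp.dir serves both as local scratch space and as the base for default HDFS paths):

```xml
<!-- core-site.xml: must be the same on all nodes in the cluster -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
</property>
```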

3 machine cluster trouble

2012-05-23 Thread Pat Ferrel
I have a two machine cluster and am adding a new machine. The new node has a different location for hadoop.tmp.dir than the other two nodes and refuses to start the datanode when started in the cluster. When I change the location pointed to by hadoop.tmp.dir to be the same on all machines it st

Multiple fs.FSInputChecker: Found checksum error .. because of load ?

2012-05-23 Thread Akshay Singh
Hi, I am trying to run a few benchmarks on a small Hadoop cluster of 4 VMs (2 VMs on each of 2 physical hosts, each VM having 1 CPU core, 2 GB RAM, an individual disk, and Gbps bridged connectivity). I am using VirtualBox as the VMM. This workload reads a good number of random small files (64 MB each) concurrently f

Re: Right way to implement MR ?

2012-05-23 Thread Harsh J
Samir, You can use MultipleInputs for multiple forms of inputs per mapper (with their own input K/V types, but common output K/V types) with a common reduce-side join/compare. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html. On Thu,
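The MultipleInputs pattern Harsh points to boils down to a reduce-side join: each mapper tags its records with their source, and the reducer sees both sources grouped by key. A rough, Hadoop-free sketch of that join logic (the record shapes and "A"/"B" tags are made up for illustration; in a real job the shuffle does the grouping):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {
    // Simulates what the shuffle does: group source-tagged values by key.
    public static Map<String, List<String>> join(List<String[]> fileA, List<String[]> fileB) {
        Map<String, List<String>> grouped = new TreeMap<>();
        // Tag each (key, value) with its source, as a MultipleInputs mapper would.
        for (String[] kv : fileA) grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add("A:" + kv[1]);
        for (String[] kv : fileB) grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add("B:" + kv[1]);
        return grouped; // the "reducer" compares A-tagged and B-tagged values per key
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(new String[]{"u1", "login"}, new String[]{"u2", "click"});
        List<String[]> b = Arrays.asList(new String[]{"u1", "purchase"});
        System.out.println(join(a, b)); // {u1=[A:login, B:purchase], u2=[A:click]}
    }
}
```

With 30 GB and 60 GB inputs this scales because each key's records from both files meet at a single reducer, so neither file is ever loaded whole.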

Re: Memory exception in the mapper

2012-05-23 Thread Joey Krabacher
My experience with this sort of problem tells me one of two things and maybe both: 1. there are some optimizations to the code that can be made (variable re-creation inside of loops, etc.) 2. something has gone horribly wrong with the logic in the mapper. To troubleshoot I would output some log e

Memory exception in the mapper

2012-05-23 Thread Mark Kerzner
Hi, all, I got the exception below in the mapper. I already have my global Hadoop heap at 5 GB, but is there a specific other setting? Or maybe I should troubleshoot for memory? But the same application works in the IDE. Thank you! Mark *stderr logs* Exception in thread "Thread for syncLogs"

Re: Would I call DLL in the hadoop-version MR?

2012-05-23 Thread Vinod Kumar Vavilapalli
Yes, JNI should be good. Or you could go the Hadoop Pipes way: writing your MR apps in C++ and adding the DLLs as dependencies distributed via the distributed cache. HTH, +Vinod On May 23, 2012, at 12:47 AM, jason Yang wrote: > Hi, > > Currently, I'm trying to rewrite an algorithm into a parallel
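The JNI route Vinod endorses looks roughly like this (a sketch only: "algo" and score() are hypothetical names, not a real library; the actual DLL would be shipped to each node via the distributed cache and placed on java.library.path):

```java
// Hypothetical JNI wrapper for one of the thread's third-party C++ DLLs.
public class NativeAlgo {
    public static native double score(double[] features); // implemented in the C++ DLL

    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("algo"); // resolves algo.dll / libalgo.so on java.library.path
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library missing on this node; fail soft for the demo
        }
    }

    public static void main(String[] args) {
        // A real mapper would call score() per record; here we only report load status.
        System.out.println("native library loaded: " + LOADED);
    }
}
```

The matching C++ side exports a function whose name JNI derives from the class and method (Java_NativeAlgo_score), compiled into the DLL.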

Re: Rules engines in Hadoop

2012-05-23 Thread Dave butlerdi
We use Jess and it works a treat. On 23 May 2012 13:59, Peter Lin wrote: > It should be straightforward to embed a BRE like iLog JRules, JESS or > any other BRE written in Java. > > In terms of commercial offerings, no product currently includes an embedded BRE > or provides map/reducers that use BR

Re: Re: about hadoop lzo compression

2012-05-23 Thread yingnan.ma
Hi, Thank you for your help! Best Regards Malone 2012-05-23 yingnan.ma From: Harsh J Sent: 2012-05-23 18:24:14 To: common-user Cc: Subject: Re: about hadoop lzo compression Malone, Right now it works despite the error because Pig hasn't had a need to read/write LZO data locally. Hence that erro

Re: Rules engines in Hadoop

2012-05-23 Thread Peter Lin
It should be straightforward to embed a BRE like iLog JRules, JESS or any other BRE written in Java. In terms of commercial offerings, no product currently includes an embedded BRE or provides map/reducers that use a BRE. I won't bother mentioning the commercial Hadoop offerings, since most people kn

Rules engines in Hadoop

2012-05-23 Thread Wilson Wayne - wwilso
I'm being asked by leaders in my company if there are any rules engines that have been integrated with, or are accessible from, Hadoop. They mean a BRE like iLog, where there are many functions beyond just the authorship of the rules, such as versioning, role-based access/security, rule lifecycle manageme

Re: Hadoop LZO compression

2012-05-23 Thread Harsh J
Hey Malone, We've already received your earlier mail. I've answered your question a short while ago over that thread: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201205.mbox/%3ccaocnvr1ssaf8hcbvbeseu_a-w8auqyswddrn+ovdu0sypuv...@mail.gmail.com%3e On Wed, May 23, 2012 at 3:59 PM, Y

Hadoop LZO compression

2012-05-23 Thread Yingnan Ma
Hi, I encountered a problem when I installed LZO. After installing it, I found that it can run with Pig scripts and streaming scripts, and when I check these jobs through the jobtracker, it shows that *mapred.compress.map.output* is true and *io.compression.codecs* is org.apache.hadoop.io.compress.GzipCod

Re: Would I call a DLL written in C++ in the Hadoop-version MapReduce?

2012-05-23 Thread Harsh J
Hi, I don't see why you can't. Do you have issues in doing this when running on Windows? If so, can you detail out your issue? On Wed, May 23, 2012 at 1:08 PM, jason Yang wrote: > Hi, > > Currently, I'm trying to rewrite an algorithm into a parallel form. Since > the algorithm depends on lots of

Re: about hadoop lzo compression

2012-05-23 Thread Harsh J
Malone, Right now it works despite the error because Pig hasn't had a need to read/write LZO data locally. Hence that error is ignored (and the LZO codec is skipped in the Pig front-end). However, the moment it needs to do that (for input sampling or whatever reason), you'll see a blocking failure. To a
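For reference, registering the LZO codecs alongside the built-in ones typically means a core-site.xml entry along these lines (class names are from the hadoop-lzo project; verify them against your particular install):

```xml
<!-- core-site.xml: make the LZO codecs available cluster-wide -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The native liblzo2 and hadoop-lzo libraries also need to be on each node's java.library.path for the codec to load.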

about hadoop lzo compression

2012-05-23 Thread yingnan.ma
Hi, I encountered a problem when I installed LZO. After installing it, I found that it can run with Pig scripts and streaming scripts, and when I check these jobs through the jobtracker, it shows that mapred.compress.map.output is true and io.compression.codecs is org.apache.hadoop.io.compress.GzipCodec, o

Would I call DLL in the hadoop-version MR?

2012-05-23 Thread jason Yang
Hi, Currently, I'm trying to rewrite an algorithm into a parallel form. Since the algorithm depends on lots of third-party DLLs, I was wondering whether I could call the DLLs written in C++ from the Hadoop-version MapReduce by using JNI? Thanks. -- YANG, Lin

Would I call a DLL written in C++ in the Hadoop-version MapReduce?

2012-05-23 Thread jason Yang
Hi, Currently, I'm trying to rewrite an algorithm into a parallel form. Since the algorithm depends on lots of third-party DLLs, I was wondering whether I could call the DLLs written in C++ from the Hadoop-version MapReduce by using JNI? Thanks. -- YANG, Lin