Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread Harsh J
Guojun is right: the reduce() inputs are buffered and read off disk. You are in no danger there.
On Fri, Jun 29, 2012 at 11:02 PM, GUOJUN Zhu wrote:
> If you are referring to the iterable in the reducer, it is special and not
> in memory at all. Once the iterator passes a value, it is los

Re: Idle nodes with terasort and MRv2/YARN (0.23.1)

2012-06-29 Thread Trevor
Thanks, Arun. Switching to CapacityScheduler seems to have fixed much of the issue: TeraGen and TeraSort are now evenly distributed and run almost twice as fast. However, TeraValidate only ran on one node, leaving 3 completely idle (except for the AM). I browsed the block locations of the output pa

RE: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread GUOJUN Zhu
If you are referring to the iterable in the reducer, it is special and not held in memory at all. Once the iterator passes a value, that value is lost and you cannot recover it. There is nothing like a LinkedList behind it.
Zhu, Guojun Modeling Sr Graduate 571-3824370 guojun_...@freddiemac.com Financial E
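To make the "once the iterator passes a value, it is lost" point concrete, here is a minimal plain-Java sketch of a single-pass Iterable. It does not use Hadoop at all: `SinglePassValues` and `count` are hypothetical names invented for illustration, and the class only models the behaviour described in this message, not Hadoop's actual implementation.

```java
import java.util.Arrays;
import java.util.Iterator;

class SinglePassSketch {

    // Hypothetical stand-in for the reducer's value Iterable (an assumption,
    // not Hadoop's class): it hands out the SAME underlying iterator on every
    // call, so values that were already consumed cannot be read again.
    static final class SinglePassValues implements Iterable<String> {
        private final Iterator<String> source;

        SinglePassValues(Iterator<String> source) {
            this.source = source;
        }

        @Override
        public Iterator<String> iterator() {
            return source;
        }
    }

    // Counts how many values a full pass over the iterable yields.
    static int count(Iterable<String> values) {
        int n = 0;
        for (String ignored : values) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterable<String> values =
            new SinglePassValues(Arrays.asList("a", "b", "c").iterator());
        System.out.println(count(values)); // first pass sees all three values
        System.out.println(count(values)); // second pass sees nothing
    }
}
```

Under these assumptions, a second loop over the same values finds nothing left, which is why any logic that needs the values twice has to be restructured rather than relying on re-iteration.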

RE: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread Berry, Matt
I was actually quite curious as to how Hadoop was managing to get all of the records into the Iterable in the first place. I thought they were using a very specialized object that implements Iterable, but a heap dump shows they're likely just using a LinkedList. All I was doing was duplicating

Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread Harsh J
Hey Matt, As far as I can tell, Hadoop isn't really at fault here. If your issue is that you collect the values in a list before you store them, you should focus on that and avoid collecting them altogether. Why don't you serialize as you receive, if the incoming order is already taken care of? As far as I can
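The "serialize as you receive" suggestion could look roughly like the sketch below. It is plain Java rather than a real Hadoop Reducer: `writeAsReceived` is a hypothetical helper, the tab-separated output format is an assumption, and a `PrintWriter` stands in for whatever output the job actually writes to.

```java
import java.io.PrintWriter;
import java.util.Arrays;

class StreamingReduceSketch {

    // Hypothetical stand-in for a reduce() body: each value is written out as
    // soon as it is read, so nothing is accumulated in a List and memory use
    // does not grow with the number of values for the key.
    static int writeAsReceived(String key, Iterable<String> values, PrintWriter out) {
        int written = 0;
        for (String value : values) {
            out.print(key);
            out.print('\t'); // assumed record format: key<TAB>value
            out.print(value);
            out.print('\n');
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        PrintWriter out = new PrintWriter(System.out);
        int n = writeAsReceived("k1", Arrays.asList("a", "b", "c"), out);
        out.println(n + " values written");
        out.flush();
    }
}
```

The only state kept across iterations is a counter, so the approach relies on the values already arriving in the order they need to be stored.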

What's the best way to compress a folder in hadoop?

2012-06-29 Thread Félix López
The folder contains text files and other folders with text files. The text is not key/value; it's just plain text. Something like this: Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dumm... I'm thinking about 3 options: Firs