RE: Spill Failed when io.sort.mb is increased

2012-08-06 Thread Sven Groot
Hi Arpit, I'm uncertain as to the exact cause of the exception (maybe an integer overflow somewhere?), but I'd just like to point out that, in general, increasing io.sort.mb to such a high value is not necessarily a good thing. Sorting is an expensive operation with non-linear time complexity.
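
To make the "integer overflow" guess concrete, here is a minimal arithmetic sketch in plain Java. It is not code taken from Hadoop; the class and method names (`SortBufferMath`, `mbToBytesInt`, and so on) are hypothetical, and it only shows at which size a megabytes-to-bytes conversion stops fitting in a 32-bit int.

```java
// Illustration only: plain arithmetic, not code from Hadoop.
// All names here are hypothetical.
final class SortBufferMath {
    static final int BYTES_PER_MB = 1 << 20;

    /** Converts megabytes to bytes in 32-bit int arithmetic, which wraps silently. */
    static int mbToBytesInt(int mb) {
        return mb * BYTES_PER_MB;
    }

    /** Same conversion in 64-bit arithmetic, which does not wrap for these sizes. */
    static long mbToBytesLong(int mb) {
        return (long) mb * BYTES_PER_MB;
    }

    /** True if the byte count for the given size no longer fits in an int. */
    static boolean overflowsInt(int mb) {
        return mbToBytesLong(mb) > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Sizes mentioned in the thread, plus the first size that wraps.
        for (int mb : new int[] {500, 1000, 1500, 2000, 2048}) {
            System.out.println(mb + " MB -> int=" + mbToBytesInt(mb)
                + " long=" + mbToBytesLong(mb)
                + " overflows=" + overflowsInt(mb));
        }
    }
}
```

By this arithmetic, 1500 MB and 2000 MB still fit in an int, so the buffer size on its own would not wrap. If an overflow is involved, it would have to arise in some derived calculation, which this sketch does not attempt to identify.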

Spill Failed when io.sort.mb is increased

2012-08-06 Thread Arpit Wanchoo
Hi, I am facing a "spill failed" error when I increase io.sort.mb to 1500 or 2000. It runs fine with 500 or 1000, but I get some spilled records (780 million spilled out of a total of 5.3 billion map output records). I configured 9 GB of VM for each mapper and 4 mappers on each node

processing multiple blocks by single JVM

2012-08-06 Thread Radim Kolar
In YARN, the JVM does not exit after processing one HDFS block. If another block is processed by the same JVM, is setup() called again? I discovered that my setup() method needs 15 seconds to execute.
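
In the org.apache.hadoop.mapreduce.Mapper API, run() calls setup() once per task, so a JVM that executes several tasks would run setup() several times. One possible way to pay a 15-second initialization only once per JVM is to keep the expensive object in a lazily initialized static holder. The sketch below is plain Java with no Hadoop classes; `ExpensiveResource` and `TaskLike` are hypothetical stand-ins for the real resource and the mapper.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for whatever takes 15 seconds to build.
final class ExpensiveResource {
    private static final AtomicInteger LOADS = new AtomicInteger();
    private static ExpensiveResource instance;

    private ExpensiveResource() {
        // The slow initialization would happen here.
        LOADS.incrementAndGet();
    }

    /** Returns the JVM-wide instance, building it on first use only. */
    static synchronized ExpensiveResource get() {
        if (instance == null) {
            instance = new ExpensiveResource();
        }
        return instance;
    }

    /** Number of times the expensive constructor has run in this JVM. */
    static int loadCount() {
        return LOADS.get();
    }
}

// Hypothetical stand-in for a mapper: setup() runs once per task.
final class TaskLike {
    ExpensiveResource resource;

    void setup() {
        // Cheap after the first call in the same JVM.
        resource = ExpensiveResource.get();
    }
}
```

This only helps if tasks really do share a JVM, and it assumes the resource is safe to share between them.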

Re: Handling files with unclear boundaries

2012-08-06 Thread Mohammad Tariq
Thank you, guys. Syed: thank you for the pointer. Regards, Mohammad Tariq On Mon, Aug 6, 2012 at 11:54 PM, syed kather wrote: > Hi Tariq, > > Have a look at this link, which can guide you. > A similar issue was discussed previously: > > search-hadoop.com/m/ydCoSysmTd1

Re: Compare Hadoop and Pig Map\Reduce

2012-08-06 Thread syed kather
This is very useful and informative, guys. Now I am clear. Syed Abdul kather On Jul 31, 2012 11:11 PM, "Manoj Babu" wrote: > Thanks Abhishek. > > Cheers! > Manoj. > > On Tue, Jul 31, 2012 at 10:43 PM, Abhishek Shivkumar < > abhisheksgum...@gmail.com> wrote:

Re: Handling files with unclear boundaries

2012-08-06 Thread Manoj Khangaonkar
Hi, I think you might need to extend FileInputFormat (or one of its derived classes) as well as implement a RecordReader. Regards On Mon, Aug 6, 2012 at 8:30 AM, Mohammad Tariq wrote: > Hello list, > > I need some guidance on how to handle files where we don't have > any proper delimiter
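
The core of a custom RecordReader is the loop that decides where one record ends. Below is a minimal sketch of such a loop in plain Java, with no Hadoop classes, under the assumption that records have a fixed byte length. The thread does not state the actual file layout, so `FixedLengthRecords`, `readAll`, and `recordLength` are hypothetical names and the fixed-length rule is only an example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a stream into fixed-size records.
final class FixedLengthRecords {

    /**
     * Reads the whole stream and returns records of recordLength bytes each.
     * A trailing partial record, if any, is returned shorter than recordLength.
     */
    static List<byte[]> readAll(InputStream in, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        List<byte[]> records = new ArrayList<>();
        byte[] buf = new byte[recordLength];
        try {
            while (true) {
                int filled = 0;
                // read() may return fewer bytes than requested, so keep filling.
                while (filled < recordLength) {
                    int n = in.read(buf, filled, recordLength - filled);
                    if (n < 0) {
                        break; // end of stream
                    }
                    filled += n;
                }
                if (filled == 0) {
                    break; // nothing left
                }
                records.add(Arrays.copyOf(buf, filled));
                if (filled < recordLength) {
                    break; // short final record, stream is exhausted
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

In an actual Hadoop job the same loop would live inside a RecordReader, and the input format would also have to ensure that splits do not cut a record in half.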

Re: Handling files with unclear boundaries

2012-08-06 Thread rahul p
Hi Tariq, can you accept my gtalk request? On Mon, Aug 6, 2012 at 11:30 PM, Mohammad Tariq wrote: > Hello list, > > I need some guidance on how to handle files where we don't have > any proper delimiters or record boundaries. Actually, I am trying to > process a set of files that are totally

Handling files with unclear boundaries

2012-08-06 Thread Mohammad Tariq
Hello list, I need some guidance on how to handle files where we don't have any proper delimiters or record boundaries. Actually, I am trying to process a set of files that are totally alien to me (SAS XPT files) through MR. But one thing that is always fixed is that each time I have to read 10

Re: Keeping Map-Tasks alive

2012-08-06 Thread Yaron Gonen
Thanks. As I see it, it cannot be done in the MapReduce 1 framework without changing TaskTracker and JobTracker. Problem is I'm not familiar at all with YARN... it might be possible there. Thanks again! On Mon, Aug 6, 2012 at 1:21 AM, Harsh J wrote: > Ah, my bad - I skipped over the K-Means part