Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
Yes, there is. But the RDD is more than 10 TB, and compression does not help.
On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu yuzhih...@gmail.com wrote:
> bq. serializeUncompressed()
> Is there a method which enables compression? Just wondering if that would reduce the memory footprint.
> Cheers
> On

Re: Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Ted Yu
bq. serializeUncompressed()
Is there a method which enables compression? Just wondering if that would reduce the memory footprint.
Cheers
On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari saeed.shahriv...@gmail.com wrote:
> I use a simple map/reduce step in a Java/Spark program to remove

Strange Error: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-15 Thread Saeed Shahrivari
I use a simple map/reduce step in a Java/Spark program to remove duplicate documents from a large (10 TB compressed) sequence file containing some HTML pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords = sc.sequenceFile(args[0], BytesWritable.class,
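
For readers who want to see the deduplication idea in isolation, here is a minimal single-JVM sketch in plain Java. It is not the poster's Spark job, and it will not scale to 10 TB. It only illustrates the general pattern of keying each document by a content digest and keeping one document per key. The class name DedupSketch, the method distinctByDigest, and the choice of SHA-256 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: plain Java, no Spark or Hadoop classes.
class DedupSketch {

    // Returns the first occurrence of each distinct document,
    // where "distinct" means "has a different SHA-256 digest".
    static List<byte[]> distinctByDigest(List<byte[]> docs) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        Set<String> seen = new HashSet<>();
        List<byte[]> unique = new ArrayList<>();
        for (byte[] doc : docs) {
            // digest(byte[]) hashes the input and resets the instance for reuse.
            String key = Base64.getEncoder().encodeToString(sha.digest(doc));
            // Set.add returns false when the key was already present.
            if (seen.add(key)) {
                unique.add(doc);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<byte[]> docs = Arrays.asList(
                "<html>a</html>".getBytes(StandardCharsets.UTF_8),
                "<html>b</html>".getBytes(StandardCharsets.UTF_8),
                "<html>a</html>".getBytes(StandardCharsets.UTF_8));
        // Two of the three inputs are byte-for-byte identical.
        System.out.println(distinctByDigest(docs).size()); // prints 2
    }
}
```

In a distributed setting, the same idea is usually expressed as a map to (digest, record) pairs followed by a reduce that keeps one record per digest, so that only fixed-size keys need to be compared rather than whole documents.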