Yes, there is. But the RDD is more than 10 TB, and compression does not help.
On Wed, Jul 15, 2015 at 8:36 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. serializeUncompressed()
Is there a method which enables compression ?
Just wondering if that would reduce the memory footprint.
Cheers
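One likely answer to the compression question: Spark can compress serialized RDD blocks through configuration rather than a dedicated serialize-with-compression method, via `spark.rdd.compress` (with the codec chosen by `spark.io.compression.codec`). A hedged sketch of the relevant properties, values illustrative:

```
# spark-defaults.conf (illustrative values)
spark.rdd.compress          true
spark.io.compression.codec  lz4
```

This applies to serialized RDD partitions, so whether it reduces the footprint depends on the storage level in use and on how compressible the data already is.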
On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari
saeed.shahriv...@gmail.com wrote:
I use a simple map/reduce step in a Java/Spark program to remove duplicate
documents from a large (10 TB compressed) sequence file containing
HTML pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords =
        sc.sequenceFile(args[0], BytesWritable.class,
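The snippet above is loading the pages as a (bytes, null) pair RDD before deduplicating them. The core idea of such a dedup step, keying each document by a content digest and keeping one record per key, can be sketched in plain Java without Spark (class and method names here are hypothetical, not from the original program):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {

    // Key each document by a digest of its bytes; keep the first document
    // seen for each key. This mirrors what a Spark job would do with
    // mapToPair(doc -> (digest, doc)) followed by reduceByKey((a, b) -> a).
    static List<String> dedup(List<String> docs) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
        // LinkedHashMap preserves first-seen order of the surviving documents.
        Map<String, String> seen = new LinkedHashMap<>();
        for (String doc : docs) {
            String key = Base64.getEncoder()
                    .encodeToString(md.digest(doc.getBytes(StandardCharsets.UTF_8)));
            seen.putIfAbsent(key, doc);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "<html>a</html>", "<html>b</html>", "<html>a</html>");
        // Duplicates collapse to the first occurrence.
        System.out.println(dedup(pages));
    }
}
```

In the Spark version the same shape applies, but the digest-to-document pairs are shuffled across the cluster by `reduceByKey`, which is why the 10 TB input and the memory footprint of the serialized records matter.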