On Wed, Jul 15, 2015 at 8:06 AM, Saeed Shahrivari
saeed.shahriv...@gmail.com wrote:
I use a simple map/reduce step in a Java/Spark program to remove duplicated
documents from a large (10 TB compressed) sequence file containing some
html pages. Here is the partial code:
JavaPairRDD<BytesWritable, NullWritable> inputRecords =
    sc.sequenceFile(args[0], BytesWritable.class, NullWritable.class);
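To make the de-duplication step concrete, here is a minimal sketch of one common way to finish it, assuming the page bytes are carried in the BytesWritable key and that byte-for-byte equality defines a duplicate (the digest keying below is illustrative, not the original code):

import java.security.MessageDigest;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Key each page by an MD5 digest of its raw bytes. copyBytes() matters here
// because Hadoop reuses Writable instances while reading sequence files.
JavaPairRDD<String, byte[]> keyedByDigest = inputRecords.mapToPair(record -> {
    byte[] page = record._1().copyBytes();
    byte[] digest = MessageDigest.getInstance("MD5").digest(page);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
        hex.append(String.format("%02x", b));
    }
    return new Tuple2<>(hex.toString(), page);
});

// Keep one (arbitrary) copy per digest, i.e. per distinct page.
JavaPairRDD<String, byte[]> uniquePages = keyedByDigest.reduceByKey((a, b) -> a);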
up with a proper fix. In the meantime, I recommend that you increase
your Akka frame size.
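For example, on the 1.x line the frame size is given in megabytes and can be set on the SparkConf before the context is created, or with --conf on spark-submit. The app name and the value 128 below are only placeholders, not a recommendation for your job:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// spark.akka.frameSize is in MB (the 1.x default is 10); pick a value larger
// than the biggest message, e.g. a large task result, that has to be shipped.
SparkConf conf = new SparkConf()
    .setAppName("CharFrequency")
    .set("spark.akka.frameSize", "128");
JavaSparkContext sc = new JavaSparkContext(conf);

// Equivalent on the command line:
//   spark-submit --conf spark.akka.frameSize=128 ...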
On Sat, Jan 3, 2015 at 8:51 PM, Saeed Shahrivari
saeed.shahriv...@gmail.com wrote:
I use the 1.2 version.
On Sun, Jan 4, 2015 at 3:01 AM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using? You may need to increase
the Akka frame size (via the spark.akka.frameSize
configuration option).
On Sat, Jan 3, 2015 at 10:40 AM, Saeed Shahrivari
saeed.shahriv...@gmail.com wrote:
Hi,
I am trying to get the frequency of each Unicode char in a document
collection using Spark. Here is the code snippet that does the counting:
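A minimal sketch of one way to do such a per-character count with the Java API, assuming only for illustration that the documents are available as a JavaRDD<String> named docs (one document per element):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Emit (char, 1) for every UTF-16 code unit in every document. Characters
// outside the BMP (surrogate pairs) would need String.codePoints() instead.
JavaPairRDD<Character, Long> charOnes = docs.flatMapToPair(doc -> {
    List<Tuple2<Character, Long>> pairs = new ArrayList<>();
    for (int i = 0; i < doc.length(); i++) {
        pairs.add(new Tuple2<>(doc.charAt(i), 1L));
    }
    return pairs; // Spark 1.x PairFlatMapFunction returns an Iterable
});

// Sum the ones to get the frequency of each character.
JavaPairRDD<Character, Long> charCounts = charOnes.reduceByKey((a, b) -> a + b);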