Hi,
I am on Spark 0.9.0
I have a 2 node cluster (2 worker nodes) with 16 cores on each node (so, 32
cores in the cluster).
I have an input rdd with 64 partitions.
I am running "rdd.mapPartitions(...).reduce(...)" on it.
I can see that I get full parallelism in the map phase (all my 32 cores are
busy simultaneously).
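(Not from the original thread.) As a rough pure-JDK analogue, no Spark involved, of the shape of mapPartitions(...).reduce(...): one task per partition produces a partial result, and a final combine folds the partials together. With 64 tasks on a 32-thread pool, the tasks run in two waves, which is why all cores stay busy. The partition count, record count, and class name below are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MapPartitionsSketch {
    // Map each of numPartitions partitions to a partial sum on a fixed
    // pool, then "reduce" by summing the partials -- the same shape as
    // rdd.mapPartitions(...).reduce(...).
    static long computeTotal(int numPartitions, int cores) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> partials = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            final int partition = p;
            partials.add(pool.submit(() -> {
                long sum = 0;
                // 100 synthetic records per partition
                for (int i = 0; i < 100; i++) sum += partition * 100L + i;
                return sum;
            }));
        }
        long total = 0;                 // the "reduce" step
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(computeTotal(64, 32)); // prints 20476800
    }
}
```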
I am building my own custom RDD class.
1) Is there a guarantee that a partition will only be processed on a node
which is in the "getPreferredLocations" set of nodes returned by the RDD?
2) I am implementing this custom RDD in Java and plan to extend JavaRDD.
However, I don't see a "getPreferredLocations" method on JavaRDD that I can
override.
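(Editorial note, not from the original messages.) In Spark's Scala API the hook is getPreferredLocations(split: Partition) on the RDD class itself; JavaRDD is a thin wrapper and does not expose it for overriding, so custom RDDs normally subclass the Scala RDD class, even from Java. Also relevant to question 1: preferred locations are a scheduling preference, not a guarantee; with delay scheduling the task can run elsewhere after the locality wait expires. A pure-JDK mock of the shape of the pattern, with stand-in types and made-up hostnames (none of these are the real Spark classes):

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins for Spark's Partition/RDD types, just to show the
// override pattern; they are NOT the real Spark classes.
interface Partition { int index(); }

abstract class SketchRDD {
    // The scheduler would call this per partition for candidate hostnames.
    List<String> getPreferredLocations(Partition split) {
        return Arrays.asList(); // default: no locality preference
    }
}

class MyCustomRDD extends SketchRDD {
    private final List<String> hosts; // e.g. the two worker hostnames

    MyCustomRDD(List<String> hosts) { this.hosts = hosts; }

    @Override
    List<String> getPreferredLocations(Partition split) {
        // Pin each partition to one host round-robin (illustrative only;
        // the scheduler treats this as a preference, not a guarantee).
        return Arrays.asList(hosts.get(split.index() % hosts.size()));
    }
}
```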
Hi Matei,
Thanks for the reply.
I would like to avoid having to spawn these external processes every time
during the processing of a task, to reduce task latency. I'd like these to
be pre-spawned as much as possible - tying them to the lifecycle of the
corresponding threadpool thread would simplify management.
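(Not Spark API, just an editorial sketch.) The "one pre-spawned external process per pool thread" idea can be expressed with a ThreadLocal, so each executor thread lazily starts and then reuses its own child process. Here "cat" stands in for the real helper binary (an assumption), echoing lines back over stdin/stdout as a trivial IPC mechanism:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PerThreadProcess {
    /** One child process plus its pipes, owned by a single pool thread. */
    static final class Peer {
        final Process proc;
        final BufferedWriter toChild;
        final BufferedReader fromChild;
        Peer() {
            try {
                // "cat" echoes stdin to stdout; placeholder for the real helper
                proc = new ProcessBuilder("cat").start();
                toChild = new BufferedWriter(new OutputStreamWriter(
                        proc.getOutputStream(), StandardCharsets.UTF_8));
                fromChild = new BufferedReader(new InputStreamReader(
                        proc.getInputStream(), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Lazily spawn one Peer per thread; reused across tasks on that thread.
    private static final ThreadLocal<Peer> PEER =
            ThreadLocal.withInitial(Peer::new);

    /** Send one line to this thread's process and read the echoed reply. */
    public static String roundTrip(String line) throws IOException {
        Peer peer = PEER.get();
        peer.toChild.write(line);
        peer.toChild.newLine();
        peer.toChild.flush();
        return peer.fromChild.readLine();
    }
}
```

One caveat with this sketch: nothing destroys the children when a pool thread dies, so a real version would also need cleanup (e.g. Process.destroy() from a shutdown hook or when the executor is torn down).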
I have a requirement where for every Spark executor threadpool thread, I need
to launch an associated external process.
My job will consist of some processing in the Spark executor thread and some
processing by its associated external process, with the two communicating via
some IPC mechanism.
Is this possible?