This one should give you a better understanding: http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors
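
For illustration, a minimal Scala sketch of the kind of resource settings the linked question discusses. The property names (spark.executor.memory, spark.executor.cores, spark.cores.max) are standard Spark configuration keys; the concrete values, the app name, and exactly how many executor processes result are assumptions that depend on your cluster manager and Spark version:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: resource settings of the kind the linked question compares.
    val conf = new SparkConf()
      .setAppName("executor-sizing-example")
      .set("spark.executor.memory", "4g")   // memory per executor process
      .set("spark.executor.cores", "2")     // cores per executor (cluster-manager dependent)
      .set("spark.cores.max", "8")          // total cores the application may claim (standalone/Mesos)

    val sc = new SparkContext(conf)

The same settings can also be passed on the command line, e.g. --executor-memory and --total-executor-cores for spark-submit against a standalone master.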
Thanks
Best Regards

On Wed, Nov 26, 2014 at 10:32 PM, Akhil Das <[email protected]> wrote:

> 1. On HDFS, files are treated as having a ~64 MB block size. When you put
> the same file on the local file system (ext3/ext4) it is treated
> differently (in your case it looks like ~32 MB), and that's why you are
> seeing 9 output files.
>
> 2. You could set num-executors to increase the number of executor
> processes.
>
> Thanks
> Best Regards
>
> On Wed, Nov 26, 2014 at 5:54 PM, Praveen Sripati <[email protected]> wrote:
>
>> Hi,
>>
>> I am running Spark in standalone mode.
>>
>> 1) I have a file of 286 MB in HDFS (block size is 64 MB), so it is split
>> into 5 blocks. With the file in HDFS, 5 tasks are generated and so there
>> are 5 files in the output. My understanding is that there is a separate
>> partition for each block and a separate task for each partition, which
>> explains why I see 5 files in the output.
>>
>> When I put the same file on the local file system (not HDFS), I see 9
>> files in the output. I am curious why it is 9.
>>
>> 2) With the file in HDFS and on the local file system, I see a single
>> CoarseGrainedExecutorBackend when I run the jps command. Why is there one
>> executor process, and how do we configure the number of executor
>> processes?
>>
>> Thanks,
>> Praveen
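
For reference, a minimal Scala sketch (the file paths are hypothetical) showing how to inspect how many partitions textFile produces and how to request a minimum number of splits. The figures 5 and 9 are simply the counts reported in this thread, not guarantees:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("partition-count-check"))

    // Reading from HDFS: one partition per HDFS block (~64 MB here),
    // so a 286 MB file splits into 5 partitions and 5 output files.
    val fromHdfs = sc.textFile("hdfs:///path/to/file")       // hypothetical path
    println(fromHdfs.partitions.length)                       // 5 in the case above

    // Reading from the local file system: splits are based on a smaller
    // default split size (~32 MB), hence 9 partitions and 9 output files.
    val fromLocal = sc.textFile("file:///path/to/file")      // hypothetical path
    println(fromLocal.partitions.length)                      // 9 in the case above

    // A minimum number of partitions can be requested explicitly:
    val withMinSplits = sc.textFile("file:///path/to/file", 5)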
