Hi,

In spark-submit I specify --master yarn-client. When I go to Executors in the UI I do see all 12 executors assigned. But when I drill down to the tasks for the stage, I see only 8 tasks, with indices 0-7.
I ran it again increasing the number of executors to 15, and now I see 12 tasks for the stage. I'd still like to understand why, even with 12 executors available, there were only 8 tasks for the stage.

Thanks,
Pradeep

> On Jul 26, 2016, at 8:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>
> Hi,
>
> Where's this yarn-client mode specified? When you said "However, when
> I run the job I see that the stage which reads the directory has only
> 8 tasks." -- how do you see 8 tasks for a stage? It appears you're in
> local[*] mode on an 8-core machine (like me) and that's why I'm asking
> such basic questions.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
>> On Tue, Jul 26, 2016 at 2:39 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> More of jars and files and app name. It runs in yarn-client mode.
>>
>> Thanks,
>> Pradeep
>>
>>> On Jul 26, 2016, at 7:10 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>>
>>> Hi,
>>>
>>> What's "<all other stuff>"? What master URL do you use?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>>> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.mi...@mail.com> wrote:
>>>> Hi All,
>>>>
>>>> I have a directory which has 12 files. I want to read each file whole, so I
>>>> am reading them as wholeTextFiles(dirpath, numPartitions).
>>>>
>>>> I run spark-submit as <all other stuff> --num-executors 12
>>>> --executor-cores 1 and numPartitions 12.
>>>>
>>>> However, when I run the job I see that the stage which reads the directory
>>>> has only 8 tasks. So some tasks read more than one file and take twice
>>>> the time.
>>>>
>>>> What can I do so that the files are read by 12 tasks, i.e. one file per task?
>>>>
>>>> Thanks,
>>>> Pradeep
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
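
One likely explanation, sketched below with hedging: the second argument to wholeTextFiles is minPartitions, which is only a hint. The underlying WholeTextFileInputFormat extends Hadoop's CombineFileInputFormat, which packs small files together up to a computed max split size, so Spark may produce fewer partitions than requested. A repartition after reading forces the desired parallelism at the cost of a shuffle. This is a minimal sketch, not a tested fix for the thread's exact job; "dirpath" is the placeholder directory name from the original post, and the master URL is assumed to come from spark-submit as in the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesExample {
  def main(args: Array[String]): Unit = {
    // Master (e.g. yarn-client) is assumed to be supplied via spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("whole-text-files"))

    // minPartitions = 12 is only a hint: CombineFileInputFormat may pack
    // several small files into one split, yielding fewer partitions.
    val files = sc.wholeTextFiles("dirpath", 12)
    println(s"partitions after read = ${files.getNumPartitions}")

    // Force exactly 12 partitions (and hence 12 tasks) with a shuffle.
    val spread = files.repartition(12)
    println(s"partitions after repartition = ${spread.getNumPartitions}")

    sc.stop()
  }
}
```

The trade-off: repartition shuffles each file's full contents across the cluster, which for 12 whole-text files is usually cheap, but it does add a stage boundary.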