Before we dig too far into this, the thing that most quickly jumps out at me is the groupByKey, which could be causing some problems - what's the distribution of keys like? Try replacing the groupByKey with a count() and see if the pipeline works up until that stage. Also, 1G of driver memory is a bit small for something with 90 executors...
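
If it helps, here's a rough sketch of what I mean (just an illustration, not your actual job - the tab-split parsing and the filter below are stand-ins for whatever your map()/filter() really do, and the path is a placeholder; the only assumption is that you end up with a pair RDD of (key, value)):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("key-skew-check"))

// Stand-in pipeline: textFile().map().filter(), ending in (key, value) pairs.
val records = sc.textFile("hdfs:///path/to/input")     // placeholder path
  .map { line =>
    val parts = line.split("\t", 2)                    // stand-in for your map()
    (parts(0), parts.lift(1).getOrElse(""))
  }
  .filter { case (_, v) => v.nonEmpty }                // stand-in for your filter()

// 1) Does the pipeline run up to here? count() forces evaluation without
//    the shuffle that groupByKey would introduce.
println(s"record count: ${records.count()}")

// 2) Rough key-skew check: records per key, then the 20 heaviest keys.
//    reduceByKey keeps this cheap compared to grouping every value per key.
val perKey = records.mapValues(_ => 1L).reduceByKey(_ + _)
perKey.map(_.swap).top(20).foreach { case (n, k) => println(s"$k -> $n") }

If the count() succeeds and the per-key counts show a handful of keys holding a huge share of the records, that skew is probably what the groupByKey stage is choking on.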
On Thu, Jan 21, 2016 at 2:40 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> 16/01/21 21:52:11 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 16/01/21 21:52:14 WARN MetricsSystem: Using default name DAGScheduler for
> source because spark.app.id is not set.
> spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
> 16/01/21 21:52:16 WARN DomainSocketFactory: The short-circuit local reads
> feature cannot be used because libhadoop cannot be loaded.
> 16/01/21 21:52:52 WARN MemoryStore: Not enough space to cache broadcast_4
> in memory! (computed 60.2 MB so far)
> 16/01/21 21:52:52 WARN MemoryStore: Persisting block broadcast_4 to disk
> instead.
> [Stage 1:====================================================>(2260 + 7) /
> 2262]16/01/21 21:57:24 WARN TaskSetManager: Lost task 1440.1 in stage 1.0
> (TID 4530, --): TaskCommitDenied (Driver denied task commit) for job: 1,
> partition: 1440, attempt: 4530
> [Stage 1:====================================================>(2260 + 6) /
> 2262]16/01/21 21:57:27 WARN TaskSetManager: Lost task 1488.1 in stage 1.0
> (TID 4531, --): TaskCommitDenied (Driver denied task commit) for job: 1,
> partition: 1488, attempt: 4531
> [Stage 1:====================================================>(2261 + 4) /
> 2262]16/01/21 21:57:39 WARN TaskSetManager: Lost task 1982.1 in stage 1.0
> (TID 4532, --): TaskCommitDenied (Driver denied task commit) for job: 1,
> partition: 1982, attempt: 4532
> 16/01/21 21:57:57 WARN TaskSetManager: Lost task 2214.0 in stage 1.0 (TID
> 4482, --): TaskCommitDenied (Driver denied task commit) for job: 1,
> partition: 2214, attempt: 4482
> 16/01/21 21:57:57 WARN TaskSetManager: Lost task 2168.0 in stage 1.0 (TID
> 4436, --): TaskCommitDenied (Driver denied task commit) for job: 1,
> partition: 2168, attempt: 4436
>
> I am running with:
>
> spark-submit --class "myclass" \
>   --num-executors 90 \
>   --driver-memory 1g \
>   --executor-memory 60g \
>   --executor-cores 8 \
>   --master yarn-client \
>   --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
>   my.jar
>
> There are 2262 input files totaling just 98.6G. The DAG is basically
> textFile().map().filter().groupByKey().saveAsTextFile().
>
> On Thu, Jan 21, 2016 at 2:14 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Can you post more of your log? How big are the partitions? What is the
>> action you are performing?
>>
>> On Thu, Jan 21, 2016 at 2:02 PM, Arun Luthra <arun.lut...@gmail.com> wrote:
>>
>>> Example warning:
>>>
>>> 16/01/21 21:57:57 WARN TaskSetManager: Lost task 2168.0 in stage 1.0
>>> (TID 4436, XXXXXXX): TaskCommitDenied (Driver denied task commit) for job:
>>> 1, partition: 2168, attempt: 4436
>>>
>>> Is there a solution for this? Increase driver memory? I'm using just 1G
>>> driver memory, but ideally I won't have to increase it.
>>>
>>> The RDD being processed has 2262 partitions.
>>>
>>> Arun
>>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau