Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
heavy? Thanks, Ewan -Original Message- From: andrew.row...@thomsonreuters.com [mailto: andrew.row...@thomsonreuters.com] Sent: 27 August 2015 11:53 To: user@spark.apache.org Subject: Driver running out of memory - caused by many tasks? I have a spark v.1.4.1 on YARN job

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
happening? Andrew From: Ashish Rangole Date: Thursday, 27 August 2015 15:24 To: Andrew Rowson Cc: user, ewan.le...@realitymine.com Subject: Re: Driver running out of memory - caused by many tasks? I suggest taking a heap dump of driver process using jmap. Then open that dump in a tool like

Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
I have a Spark v1.4.1 job on YARN where the first stage has ~149,000 tasks (it’s reading a few TB of data). The job itself is fairly simple - it’s just getting a list of distinct values: val days = spark.sequenceFile(inputDir, classOf[KeyClass], classOf[ValueClass])

RE: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ewan Leith
...@thomsonreuters.com] Sent: 27 August 2015 11:53 To: user@spark.apache.org Subject: Driver running out of memory - caused by many tasks? I have a spark v.1.4.1 on YARN job where the first stage has ~149,000 tasks (it’s reading a few TB of data). The job itself is fairly simple - it’s just getting

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
...@thomsonreuters.com] Sent: 27 August 2015 11:53 To: user@spark.apache.org Subject: Driver running out of memory - caused by many tasks? I have a spark v.1.4.1 on YARN job where the first stage has ~149,000 tasks (it’s reading a few TB of data). The job itself is fairly simple - it’s just