Is it possible to jstack the executors and see where they are hanging?

On Thu, Mar 26, 2015 at 2:02 PM, Jon Chase <jon.ch...@gmail.com> wrote:
> Spark 1.3.0 on YARN (Amazon EMR), cluster of 10 m3.2xlarge (8 CPU, 30GB),
> executor memory 20GB, driver memory 10GB
>
> I'm using Spark SQL, mainly via spark-shell, to query 15GB of data spread
> out over roughly 2,000 Parquet files, and my queries frequently hang. Simple
> queries like "select count(*) from ..." on the entire data set work ok.
> Slightly more demanding ones with group by's and some aggregate functions
> (percentile_approx, avg, etc.) work ok as well, as long as I have some
> criteria in my where clause to keep the number of rows down.
>
> Once I hit some limit on query complexity and rows processed, my queries
> start to hang. I've left them for up to an hour without seeing any
> progress. No OOMs either - the job is just stuck.
>
> I've tried setting spark.sql.shuffle.partitions to 400 and even 800, but
> with the same results: usually near the end of the tasks (like 780 of 800
> complete), progress just stops:
>
> 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 788.0 in stage 1.0 (TID 1618) in 800 ms on ip-10-209-22-211.eu-west-1.compute.internal (748/800)
> 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 793.0 in stage 1.0 (TID 1623) in 622 ms on ip-10-105-12-41.eu-west-1.compute.internal (749/800)
> 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 797.0 in stage 1.0 (TID 1627) in 616 ms on ip-10-90-2-201.eu-west-1.compute.internal (750/800)
> 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 799.0 in stage 1.0 (TID 1629) in 611 ms on ip-10-90-2-201.eu-west-1.compute.internal (751/800)
> 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 795.0 in stage 1.0 (TID 1625) in 669 ms on ip-10-105-12-41.eu-west-1.compute.internal (752/800)
>
> ^^^^^^^ this is where it stays forever
>
> Looking at the Spark UI, several of the executors still list active
> tasks. I do see that the Shuffle Read for executors that don't have any
> tasks remaining is around 100MB, whereas it's more like 10MB for the
> executors that still have tasks.
>
> The first stage, mapPartitions, always completes fine. It's the second
> stage (takeOrdered) that hangs.
>
> I've had this issue in 1.2.0 and 1.2.1 as well as 1.3.0. I've also
> encountered it when using JSON files (instead of Parquet).
>
> Thoughts? I'm blocked on using Spark SQL because most of the queries I do
> are having this issue.
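For reference, a minimal sketch of the kind of spark-shell session described above. The Parquet path, table name, and column names are hypothetical placeholders, and percentile_approx is a Hive UDAF, so this assumes a Hive-enabled build (in which case the predefined sqlContext in spark-shell is already a HiveContext). The ORDER BY ... LIMIT at the end is the sort of query that produces a takeOrdered stage after the initial mapPartitions stage:

    import org.apache.spark.sql.hive.HiveContext

    // In a Hive-enabled spark-shell the predefined sqlContext is already a
    // HiveContext; shown explicitly here for completeness.
    val sqlContext = new HiveContext(sc)

    // Raise the shuffle parallelism from the default of 200, as tried above.
    sqlContext.setConf("spark.sql.shuffle.partitions", "800")

    // Hypothetical path: ~2,000 Parquet files totalling ~15GB.
    val events = sqlContext.parquetFile("s3://my-bucket/events/")
    events.registerTempTable("events")

    // GROUP BY with aggregates plus ORDER BY ... LIMIT; the LIMIT turns the
    // final sort into a takeOrdered stage, the one that hangs near completion.
    val top = sqlContext.sql("""
      SELECT user_id,
             avg(duration)                      AS avg_duration,
             percentile_approx(duration, 0.95)  AS p95_duration
      FROM   events
      GROUP  BY user_id
      ORDER  BY avg_duration DESC
      LIMIT  100
    """).collect()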