Some tasks take a long time to find local block

patrick256 Thu, 17 Dec 2015 05:34:16 -0800

I'm using Spark 1.5.2 and my RDD has 512 equally sized partitions and is 100%
cached in memory across 512 executors.


I have a filter-map-collect job with 512 tasks. Sometimes this job completes
in under a second. On other runs, 50% of the tasks complete in under a
second, 45% take 10 seconds, and 5% take 20 seconds.

Here is the log from an executor where the task took 20 seconds: 

15/12/16 09:44:37 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5312
15/12/16 09:44:37 INFO executor.Executor: Running task 215.0 in stage 17.0 (TID 5312)
15/12/16 09:44:37 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 10
15/12/16 09:44:37 INFO storage.MemoryStore: ensureFreeSpace(1777) called with curMem=908793307, maxMem=5927684014
15/12/16 09:44:37 INFO storage.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 1777.0 B, free 4.7 GB)
15/12/16 09:44:37 INFO broadcast.TorrentBroadcast: Reading broadcast variable 10 took 186 ms
15/12/16 09:44:37 INFO storage.MemoryStore: ensureFreeSpace(3272) called with curMem=908795084, maxMem=5927684014
15/12/16 *09:44:37* INFO storage.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 3.2 KB, free 4.7 GB)
15/12/16 *09:44:57* INFO storage.BlockManager: Found block rdd_5_215 locally
15/12/16 09:44:57 INFO executor.Executor: Finished task 215.0 in stage 17.0 (TID 5312). 2074 bytes result sent to driver

So it appears the 20 seconds are spent between storing broadcast_10 and
finding the local block rdd_5_215.
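One way to pin down where the time goes across many executor logs is to compute the gap between consecutive timestamps. Below is a minimal sketch in Python, assuming the "yy/MM/dd HH:mm:ss LEVEL logger: message" layout shown in the log above; `largest_gap` is a hypothetical helper, not part of Spark. The timestamps only have one-second resolution, so each gap is accurate to roughly one second.

```python
import re
from datetime import datetime

# Matches lines like "15/12/16 09:44:57 INFO storage.BlockManager: Found block ..."
# The optional asterisks tolerate the emphasis markers used in the post.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2}) \*?(\d{2}:\d{2}:\d{2})\*? (\w+) ([\w.$]+): (.*)$"
)


def largest_gap(lines):
    """Return (seconds, message_before, message_after) for the biggest gap
    between consecutive parsable log lines (hypothetical helper)."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and wrapped continuation lines
        date, time, _level, logger, message = m.groups()
        ts = datetime.strptime(f"{date} {time}", "%y/%m/%d %H:%M:%S")
        entries.append((ts, f"{logger}: {message}"))

    best = (0.0, None, None)
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, m0, m1)
    return best
```

Run over the executor log quoted above, it should report the 20-second gap between the broadcast_10 store and "Found block rdd_5_215 locally"; running it over the logs of the 10-second tasks would show whether they stall at the same step.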

Since the lag is always either exactly 10 seconds or exactly 20 seconds, I
suspect it's due to a 10-second timeout on some listener, or something like
that. If that's true, then I guess my options are either to find out why it's
timing out and fix it, or to make the timeout shorter so it retries more
frequently.
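The timeout hypothesis can be made concrete with a toy model: a waiter that finds the block not yet ready sleeps for a full timeout before checking again, so any delay gets rounded up to a multiple of that timeout. This is a hypothetical sketch of the reasoning only, not Spark's actual BlockManager logic; `simulated_wait` and `near_multiples` are made-up names, and the 10-second period is just the value suspected from the observed lags.

```python
def simulated_wait(ready_after, timeout, max_attempts=10):
    """Hypothetical polling model (not Spark's implementation): check whether
    the block is ready; if not, sleep a full `timeout` and check again.
    `ready_after` is when the block actually becomes available, in seconds.
    Returns the total time spent waiting."""
    waited = 0.0
    for _ in range(max_attempts):
        if waited >= ready_after:
            return waited
        waited += timeout  # a miss always costs one whole timeout
    return waited


def near_multiples(durations, period, tol=0.5):
    """True if every duration lies within `tol` seconds of a multiple of
    `period`, i.e. the delays look quantized by a fixed timeout."""
    return all(abs(d - period * round(d / period)) <= tol for d in durations)
```

Under this model, a block that becomes ready 0.5 s late costs a full 10 s wait, and one that is 12 s late costs 20 s, which matches the 10/20-second pattern; shrinking the timeout to 1 s keeps the overshoot under about 1 s, which is the trade-off behind the second option above.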

Any advice welcome. Many thanks in advance, 

Pat



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Some-tasks-take-a-long-time-to-find-local-block-tp25727.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.