[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074044#comment-14074044 ]
Guoqiang Li edited comment on SPARK-2681 at 7/25/14 4:39 AM:
-------------------------------------------------------------

OK, but may take some time.

was (Author: gq):
OK, but have some time.

> Spark can hang when fetching shuffle blocks
> -------------------------------------------
>
>                 Key: SPARK-2681
>                 URL: https://issues.apache.org/jira/browse/SPARK-2681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Guoqiang Li
>            Priority: Blocker
>
> executor log:
> {noformat}
> 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 53628
> 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 and clearing cache
> 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, computing it
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 8 ms
> 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called with curMem=920109320, maxMem=4322230272
> 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values to memory (estimated size 28.1 KB, free 3.2 GB)
> 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block rdd_51_83
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, computing it
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 28, fetching them
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 0 ms
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, computing it
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 4 ms
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@3dcc1da1
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@3dcc1da1
> java.nio.channels.CancelledKeyException
>         at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
>         at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)