[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guoqiang Li updated SPARK-2156:
-------------------------------
    Comment: was deleted

(was: [~pwend...@gmail.com])

> When the size of serialized results for one partition is slightly smaller
> than 10MB (the default akka.frameSize), the execution blocks
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2156
>                 URL: https://issues.apache.org/jira/browse/SPARK-2156
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.1, 1.0.0
>         Environment: AWS EC2, 1 master and 2 slaves, instance type r3.2xlarge
>            Reporter: Chen Jin
>            Priority: Critical
>             Fix For: 1.0.1
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I have done some experiments with the frameSize around 10MB.
> 1) spark.akka.frameSize = 10
> If one of the partition sizes is very close to 10MB, say 9.97MB, the execution
> blocks without any exception or warning. The worker finishes the task and sends
> the serialized result, then throws an exception saying the Hadoop IPC client
> connection stops (visible after changing the logging to debug level). However,
> the master never receives the result and the program just hangs.
> But if the sizes of all partitions are less than some threshold between 9.96MB
> and 9.97MB, the program works fine.
> 2) spark.akka.frameSize = 9
> When the partition size is just a little bit smaller than 9MB, it fails as well.
> This behavior is not exactly what SPARK-1112 is about.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
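[Editor's note: a minimal sketch of a job that should trigger the hang the report describes, assuming it is submitted against a cluster as in the reporter's EC2 setup. The object name, app name, and 9.97MB payload size are illustrative, not from the report.]

// Collect a single partition whose serialized result sits just under the
// 10MB Akka frame limit, mirroring experiment 1 in the report.
import org.apache.spark.{SparkConf, SparkContext}

object FrameSizeRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SPARK-2156-repro")
      .set("spark.akka.frameSize", "10") // frame size in MB, as in experiment 1

    val sc = new SparkContext(conf)

    // Roughly 9.97MB of payload in one partition; serialization overhead
    // pushes the task result close to, but still under, the 10MB frame.
    val payloadBytes = (9.97 * 1024 * 1024).toInt
    val result = sc.parallelize(Seq(Array.fill[Byte](payloadBytes)(1)), numSlices = 1)
      .collect() // per the report, this hangs instead of failing loudly

    // Never reached when the bug triggers: the worker sends the result,
    // but the driver never receives it.
    println(s"collected ${result.head.length} bytes")
    sc.stop()
  }
}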