[ https://issues.apache.org/jira/browse/SPARK-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reopened SPARK-19532:
-------------------------------

> [Core] `DataStreamer for file` threads of DFSOutputStream leak if
> `spark.speculation` is set to true
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19532
>                 URL: https://issues.apache.org/jira/browse/SPARK-19532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0
>            Reporter: StanZhai
>            Priority: Critical
>
> With `spark.speculation` set to true, the thread dump page of an Executor in
> the WebUI shows about 1,300 threads named "DataStreamer for file
> /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_000069_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet"
> in the TIMED_WAITING state.
> {code}
> java.lang.Object.wait(Native Method)
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
> {code}
> Off-heap memory usage keeps growing until the Executor exits with an OOM
> error. The problem occurs only when writing data to Hadoop (tasks may be
> killed by the Executor during the write).
> Could this be related to https://issues.apache.org/jira/browse/HDFS-9812?
> The Hadoop version is 2.6.4.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
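For context, the only non-default setting the report depends on is speculative execution. A minimal configuration sketch (spark-defaults.conf style; the comment reflects the mechanism described in this issue, not confirmed root-cause analysis):

```
# Enables speculative task attempts. Per this report, speculative attempts
# killed by the Executor mid-write can leave DFSOutputStream DataStreamer
# threads behind in TIMED_WAITING.
spark.speculation  true
```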
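The leak described above can be checked for programmatically instead of via the WebUI thread dump page. A minimal sketch using only the JDK, which counts live threads whose names carry the `DataStreamer for file` prefix reported in this issue (the prefix and the simulated waiting thread are assumptions for illustration, not Spark or HDFS API usage):

```java
import java.util.Map;

public class DataStreamerLeakCheck {
    // Thread-name prefix seen in the Executor thread dump in this report.
    static final String LEAK_PREFIX = "DataStreamer for file";

    // Count live JVM threads that look like leaked HDFS DataStreamers.
    static long countDataStreamerThreads() {
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        return dump.keySet().stream()
                .filter(t -> t.getName().startsWith(LEAK_PREFIX))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // Simulate one leaked streamer parked in Object.wait(), mirroring
        // the TIMED_WAITING stack frames quoted in the issue.
        final Object lock = new Object();
        Thread leaked = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(60_000);
                } catch (InterruptedException ignored) {
                    // exit quietly when interrupted
                }
            }
        }, LEAK_PREFIX + " /test/data/part-00000.snappy.parquet");
        leaked.setDaemon(true);
        leaked.start();
        Thread.sleep(200); // give the thread time to enter TIMED_WAITING

        System.out.println("leaked DataStreamer threads: "
                + countDataStreamerThreads());
    }
}
```

Run periodically inside an Executor JVM (or against a `jstack` dump), a counter like this makes the growth from a handful of threads toward the ~1300 reported here easy to spot before off-heap memory is exhausted.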