[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-08 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-670961239 > What code that modifies this object is also synchronizing on this? Sorry, I'll update them. @srowen, do you think I should add synchronization at every write and del

[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-08 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-670993081 It happens when the data structure is being modified; `delete` is the main cause of the problem. Take the `LevelDB` implementation for example: in its iterator's `next` function it wi

[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-09 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-671067561 Thanks @srowen, I just pushed a new commit to lock the delete operation.

[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-09 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-671069843 I have pasted two stack traces below, as they occurred in two different functions. The line numbers are slightly different from the code in `master` because I'm using `branch-3

[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-11 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-672459512 Hi @zhouyejoe, yes, you are right. I have checked the pool definition, and `checkForLogs()` and `cleanLogs()` won't run at the same time; the only possibility is the replay tasks

[GitHub] [spark] yanxiaole commented on pull request #29392: [SPARK-32574][CORE] Race condition in FsHistoryProvider listing iteration

2020-08-11 Thread GitBox
yanxiaole commented on pull request #29392: URL: https://github.com/apache/spark/pull/29392#issuecomment-672630663 The filter is called after the assignment of the variable `stale`, but the race condition happens in the assignment itself, in `asScala.toList`.
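
The point in the last comment, that the race sits in the step that drains the iterator into a list rather than in the later filter, can be illustrated with a minimal Java sketch. It is not the code from the pull request and does not use Spark's `KVStore` API; the class `GuardedListing`, its methods, and the in-memory `TreeMap` are hypothetical stand-ins. The assumption is that writes, deletes, and the materializing iteration all synchronize on the same lock, so a delete cannot interleave with the copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a listing store; not Spark's KVStore API.
public class GuardedListing {
    // path -> last-processed timestamp (illustrative fields only)
    private final TreeMap<String, Long> entries = new TreeMap<>();
    private final Object lock = new Object();

    public void write(String path, long lastProcessed) {
        synchronized (lock) {
            entries.put(path, lastProcessed);
        }
    }

    public void delete(String path) {
        // Deletes take the same lock as the iteration below,
        // so they cannot run while a snapshot is being copied.
        synchronized (lock) {
            entries.remove(path);
        }
    }

    // Copies the paths of entries older than maxTime while holding the lock.
    // The whole iteration happens inside the synchronized block; callers
    // receive an independent list they can filter or walk without the lock.
    public List<String> staleSnapshot(long maxTime) {
        synchronized (lock) {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : entries.entrySet()) {
                if (e.getValue() < maxTime) {
                    out.add(e.getKey());
                }
            }
            return out;
        }
    }
}
```

Because the returned list is a copy, later deletes do not affect it; any filtering applied to the copy afterwards is outside the window in which the race could occur.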