[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-08-04 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-668991161 @jkleckner I have never had a problem with the driver watching the executors. I think there was already a fallback mechanism there, but I never looked into the code for

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-07-21 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-662239237 @holdenk my JIRA username is sdehaes. This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-07-14 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-658205704 > > > BTW, when do we receive a version changed from K8s? > > > > > > It happens when etcd compaction kicks in, for example. On AWS EKS I never saw this happening

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-07-14 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-658121676 > BTW, when do we receive a version changed from K8s? It happens when etcd compaction kicks in, for example. On AWS EKS I never saw this happening on EKS 1.14, but it
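
For illustration only, and not code from the PR: a minimal Python sketch of the idea in the PR title, restarting a watch once the resource version it was started from is no longer available (for example after etcd compaction). `FakeApiServer`, `ResourceVersionTooOld`, and `watch_with_restart` are hypothetical names invented for this sketch. The exception stands in for the HTTP 410 Gone response the Kubernetes API returns for a too-old resource version. The actual Spark change is written in Scala against the fabric8 client and may differ in every detail.

```python
class ResourceVersionTooOld(Exception):
    """Hypothetical stand-in for an HTTP 410 Gone on a watch request."""


class FakeApiServer:
    """Hypothetical in-memory stand-in for the Kubernetes API server."""

    def __init__(self):
        self.current_version = 100
        self.oldest_retained_version = 100

    def compact(self, new_version):
        # Simulate etcd compaction: versions older than new_version
        # can no longer be used as the starting point of a watch.
        self.current_version = new_version
        self.oldest_retained_version = new_version

    def list_pods(self):
        # A fresh list returns the current resource version.
        return self.current_version

    def watch(self, resource_version):
        # Generator: the error surfaces when the caller starts iterating.
        if resource_version < self.oldest_retained_version:
            raise ResourceVersionTooOld(resource_version)
        yield ("MODIFIED", self.current_version)


def watch_with_restart(server, start_version, max_restarts=3):
    """Watch from start_version; on a too-old version, re-list and restart."""
    version = start_version
    restarts = 0
    events = []
    while True:
        try:
            for event in server.watch(version):
                events.append(event)
            return events, restarts
        except ResourceVersionTooOld:
            if restarts >= max_restarts:
                raise
            restarts += 1
            # Pick up a fresh resource version before watching again.
            version = server.list_pods()


server = FakeApiServer()
server.compact(250)  # the watcher's version 100 is now too old
events, restarts = watch_with_restart(server, start_version=100)
print(events, restarts)
```

In this sketch the watcher gives up after `max_restarts` attempts instead of looping forever. That bound is a design choice of the example, not something stated in the thread.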

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-07-14 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-658116799 @ScrapCodes the code in 2.4.x is significantly different from the code here, but we can reuse the same idea. I guess it has to be a new PR

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-05-08 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-625656337 @holdenk @dongjoon-hyun I have tested this code in production and it works. I have a couple of jobs that take roughly 4 hours to finish; these all failed without the fix

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-05-05 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-624148469 OK, reverting to the old approach. I think I found the missing piece and am testing that out. Shared informers have the problem that you have to watch every pod in the

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-05-05 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-623913420 OK, I have tested this in production and there is something wrong with the code, so I went ahead and tried the sharedinformers approach. I will try that in production today. You can

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-05-04 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-623377481 @holdenk Maybe we should refactor this behavior using the sharedinformers. See the comment made here:

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-05-04 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-623345385 > Do you think we can have a unit test case for this, @stijndehaes ? The current tests completely mock out this behavior, see

[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-04-30 Thread GitBox
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-621987328 > How do we feel about backporting this to Spark 2.4.6? I would very much like that; we ran into this using Spark 2.4.x.