stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-668991161
@jkleckner I have never had a problem with the driver watching the
executors. I think there was already a fallback mechanism there, but I never
looked into the code for
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-662239237
@holdenk my JIRA username is sdehaes
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-658205704
> > > BTW, when do we receive a version changed from K8s?
> >
> >
> > It happens when etcd compaction kicks in, for example. On AWS EKS I never
> > saw this happening
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-658121676
> BTW, when do we receive a version changed from K8s?
It happens when etcd compaction kicks in, for example. On AWS EKS I never saw
this happening on EKS 1.14, but it
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-658116799
@ScrapCodes the code in 2.4.x is significantly different from the code
here. But we can reuse the same idea as here. I guess it has to be a new PR
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-625656337
@holdenk @dongjoon-hyun I have tested this code in production and it works.
I have a couple of jobs that take roughly 4 hours to finish; these all failed
without the fix
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-624148469
OK, reverting to the old approach; I think I found the missing piece and am
testing that out.
Shared informers have the problem that you have to watch every pod in the
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-623913420
OK, I have tested this in production and there is something wrong with the code,
so I went ahead and tried the shared informers approach. I will try that in
production today. You can
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-623377481
@holdenk Maybe we should refactor this behavior using shared informers.
See the comment made here:
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-623345385
> Do you think we can have a unit test case for this, @stijndehaes ?
The current tests completely mock out this behavior, see
stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-621987328
> How do we feel about backporting this to Spark 2.4.6?
I would very much like that; we ran into this using Spark 2.4.x.