holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-630938158
Sounds good, happy to help coordinate with any reviews needed. Would like us
to be able to start using this in 3.1 :)
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-630365923
Merged to master (target 3.1). Let me know if you're interested in doing any
of the follow-ups, @ngone51 / @prakharjain09; otherwise I'll get that started
after the shuffle block
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-630363255
LGTM. Thanks for working on this, @prakharjain09. I know how frustrating
debugging test-only issues that only show up in Jenkins can be.
Thanks for taking the time to review
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-629389491
The hive test failure is probably a flaky test.
Jenkins retest this please
This is an automated message from
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-628933224
Seeing weird network issues on Jenkins.
Jenkins retest this please.
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-628815289
From the dev@ list post it seems like the maven fetch issue is known and
being worked on. You might want to cherry-pick the set -x I've got in as well,
though, so we can make sure
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-628792196
Could you add the code to avoid the infinite retry loop on error & also
checking thread interrupted in case something else swallows the thread
interruption exception in the futu
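A minimal sketch of the two safeguards requested above: cap the number of retries so an error cannot loop forever, and check the thread's interrupt flag on each pass in case an `InterruptedException` was swallowed elsewhere. The names `BoundedRetry`, `Task`, and `runWithRetries` are hypothetical and are not taken from the PR or from Spark.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not Spark code.
class BoundedRetry {
    /** A unit of work that may fail with any exception. */
    interface Task {
        void run() throws Exception;
    }

    /**
     * Runs the task until it succeeds, the attempt budget is used up,
     * or the current thread is interrupted. Returns true only on success.
     */
    static boolean runWithRetries(Task task, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Explicit flag check: covers the case where some other code
            // caught an InterruptedException and only restored the flag.
            if (Thread.currentThread().isInterrupted()) {
                return false;
            }
            try {
                task.run();
                return true;
            } catch (InterruptedException e) {
                // Restore the flag so callers can still observe the interrupt.
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // Any other failure: fall through and retry, up to maxAttempts.
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // attempt budget exhausted
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fails on the first two calls, then succeeds.
        boolean ok = runWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
        }, 5, 10L);
        System.out.println("succeeded=" + ok + " after " + calls.get() + " calls");
    }
}
```

The loop returns `false` both when the budget runs out and when it is interrupted; a real implementation might want to distinguish the two cases.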
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625921906
Ok running just those three tests alone in SBT on my local host doesn't show
any failures on this branch. I'm not seeing the memory error we had earlier
that would have explaine
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625665354
Jenkins retest this please.
I feel like I’ve seen these particular tests fail before; let’s dig into
what’s causing them to fail.
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625664892
Right so just send SIGPWR to the worker. Are you saying in standalone mode
you have one worker with multiple executors and you want to decommission a
specific executor? Regardle
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625514065
Jenkins retest this please
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625457201
Jenkins, retest this please
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-625457036
So SIGPWR doesn't require immediate shutdown, and it's trappable.
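To illustrate that SIGPWR can be trapped instead of terminating the process, here is a small sketch that installs a handler and then raises the signal against the current JVM. It assumes a Linux host and relies on `sun.misc.Signal`, an unsupported JDK-internal API that may produce compiler warnings; the class name `SigPwrTrap` and its methods are hypothetical and are not the PR's code.

```java
import sun.misc.Signal;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; assumes Linux, where the PWR signal exists.
class SigPwrTrap {
    /** Installs a SIGPWR handler that only records that the signal arrived. */
    static CountDownLatch install() {
        CountDownLatch received = new CountDownLatch(1);
        // A real handler would start a graceful shutdown here
        // rather than just counting down a latch.
        Signal.handle(new Signal("PWR"), sig -> received.countDown());
        return received;
    }

    /** Raises SIGPWR in this process and waits for the handler to run. */
    static boolean raiseAndWait(long timeoutSeconds) {
        CountDownLatch received = install();
        Signal.raise(new Signal("PWR"));
        try {
            return received.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // The process keeps running after the signal because it was trapped.
        System.out.println("SIGPWR trapped: " + raiseAndWait(5));
    }
}
```

With a handler installed, the same effect can be triggered from outside the process with `kill -PWR <pid>`.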
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-624346604
Jenkins retest this please.
Jenkins add to whitelist
Also @prakharjain09 you can always just push another commit to re-trigger
the tests (you might be able to ask Jenkins I
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-623734294
Ok so I think the CLI test is flaky (looking at it, it's awaiting a future and
timing out so that's not surprising). If you can re-enable your tests
@prakharjain09 I think the new
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-623593170
I think those three are not related. Jenkins retest this please
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-622475169
Jenkins retest this please
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-622203530
So I tried it out locally, and if instead of disabling the entire stop (we
want to keep the interrupt call), we call interrupt & inside of the catch block
with interrupt set sto
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-622197638
Oh yeah to be clear the line numbers inside of block manager are different
because I was playing with some debugging, but the rest of it should be fairly
direct.
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-62219
Looks like the tests are passing but we're still seeing the executor hang. I
did a jstack dump on a local run and I got:
> 2020-04-30 17:44:40
> Full thread dump OpenJDK
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-622111857
Ok so it isn't (currently) timing out but the OOM is a bit worrying. Jenkins
retest this please
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-621945031
@ScrapCodes: In the future (and I've filed a JIRA for this), for
non-voluntary scale downs we can try and prioritize blocks, but I think this is
a solid first step :)
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-621507928
@scrapcodes It depends on your cluster, but it could be anywhere from 1
second to several hours. Generally, though, I'd expect most situations to be in
the minutes time frame. Ev
holdenk commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620077765
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and