virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451105473
> If someone removes the processQueueMessages itself from the sendHeartbeat,
> then also this test should fail or at least some should
+1
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451104757
> It is inducing a race by nextHeartbeatTime
Absolutely, that's what I thought too. But yes, you are right: other than
adding sleeps, it's a bit tricky to reproduce. But yeah our
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451077691
> If my intent is just for processQueueMessages, I will expose and just
> shoot that directly, rather than doing the whole loop.
That would also work but as part of the test, we
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451021126
With the above patch applied on top of this PR's changes, the test passes
consistently.
Without the PR changes, the test fails consistently (failed 7 times
locally without
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451015537
Another way I am able to repro consistently:
```
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
```
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451008842
Though the failure is difficult to reproduce, I thought this utility would
help the test confirm that the namenode has definitely received the report
as part of `ReportBadBlockAction#reportTo`.
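
For illustration only, here is a minimal sketch of the kind of polling helper a test could use to block until a report has been observed, instead of relying on fixed sleeps. It is not the utility from this PR: the class `WaitForSketch`, the method `waitFor`, and its parameters are hypothetical names, and the condition being polled is left to the caller.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of the PR or of Hadoop.
final class WaitForSketch {
  private WaitForSketch() {}

  /**
   * Polls {@code check} every {@code intervalMs} milliseconds until it
   * returns true or {@code timeoutMs} milliseconds have elapsed.
   *
   * @return true if the condition was observed, false on timeout or interrupt
   */
  static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test would pass a condition that becomes true once the report has arrived, and assert on the returned boolean so that a timeout fails the test rather than passing silently.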
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450990755
I tried multiple cases, and with some sleeps I am able to repro the failure,
but only sometimes.
The only way I am able to repro the failure consistently is by applying this
patch:
```
diff
```
virajjasani commented on PR #5445:
URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450896182
@ayushtkn @tomscut could you please review this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL