[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451105473 > If someone removes the processQueueMessages itself from the sendHeartbeat, then also this test should fail or at least some should +1

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451104757 > It is inducing a race by nextHeartbeatTime Absolutely, that's what I thought too. But yes, you are right: other than adding sleeps, it's a bit tricky to reproduce. But yeah our

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451077691 > If my intent is just for processQueueMessages, I will expose and just shoot that directly, rather than doing the whole loop. That would also work but as part of the test, we

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451021126 When the above patch is applied together with this PR's changes, the test passes consistently, whereas without the PR changes the test fails consistently (failed 7 times locally without

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451015537 Another way I am able to repro consistently: ``` diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1451008842 Though the failure is difficult to reproduce, I thought this utility would help the test ensure that the namenode has definitely received the report as part of `ReportBadBlockAction#reportTo`.
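
The idea in the comment above (trigger a heartbeat, then block until the actor thread has fully processed its queue) can be illustrated with a minimal, generic sketch. This is not the PR's code and does not use any Hadoop API: the class `HeartbeatWaitSketch` and its methods `enqueue`, `triggerAndWait`, and `stop` are hypothetical names chosen for illustration, and the request/completion counters are just one assumed way to make the wait race-free.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a background actor thread; NOT the Hadoop BPServiceActor API.
class HeartbeatWaitSketch {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final Object lock = new Object();
    private long requested = 0;      // heartbeat requests issued so far (guarded by lock)
    private long completed = 0;      // highest request covered by a finished pass (guarded by lock)
    private boolean running = true;  // guarded by lock
    private final Thread thread = new Thread(this::loop, "heartbeat-sketch");

    void start() {
        thread.start();
    }

    // Queue an action to be run on the worker thread during a later pass.
    void enqueue(Runnable action) {
        queue.add(action);
    }

    private void loop() {
        while (true) {
            long serving;
            synchronized (lock) {
                // Sleep until a new request arrives or the worker is stopped.
                while (running && completed == requested) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (!running) {
                    return;
                }
                // This pass covers every request issued before it started.
                serving = requested;
            }
            // Drain the queue outside the lock so callers are not blocked on it.
            Runnable action;
            while ((action = queue.poll()) != null) {
                action.run();
            }
            synchronized (lock) {
                completed = serving;
                lock.notifyAll();
            }
        }
    }

    // Request a pass and block until a pass that STARTED after this call has finished.
    void triggerAndWait(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            long mine = ++requested;
            lock.notifyAll();
            while (completed < mine) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    throw new IllegalStateException("timed out waiting for queue to drain");
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
    }

    void stop() {
        synchronized (lock) {
            running = false;
            lock.notifyAll();
        }
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        HeartbeatWaitSketch worker = new HeartbeatWaitSketch();
        worker.start();
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            worker.enqueue(processed::incrementAndGet);
        }
        worker.triggerAndWait(5000);
        System.out.println("processed=" + processed.get());
        worker.stop();
    }
}
```

In this sketch the caller waits for a pass that begins after its own request, rather than for whichever pass happens to finish next. A pass already in flight may have stopped polling before the caller's actions were queued, so waiting on it alone would not guarantee that those actions ran.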

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450990755 I tried multiple cases, and with some sleeps I am able to repro the failure, but only sometimes. The only way I am able to consistently repro the failure is by applying this patch: ``` diff

[GitHub] [hadoop] virajjasani commented on pull request #5445: HDFS-16938. Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-03-01 Thread via GitHub
virajjasani commented on PR #5445: URL: https://github.com/apache/hadoop/pull/5445#issuecomment-1450896182 @ayushtkn @tomscut could you please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL