[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test awaits 15-25 TcpDiscoveryConnectionCheckMessage messages during the failure detection timeout (10000ms), but sometimes it fails because there are only 14 messages.
TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other messages were sent in that period. The test expects that, with no activity on the cluster, ~20 messages are sent. In fact, TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.
Within the failure detection period it can be sent 6 times, resetting the timer 6 times. In the corner case only 20 - 6 = 14 check messages are sent.

was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
Measuring the number of messages is a very unstable practice. We should make the boundaries more moderate.

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test awaits 15-25 TcpDiscoveryConnectionCheckMessage messages during the failure detection timeout (10000ms), but sometimes it fails because there are only 14 messages.
> TcpDiscoveryConnectionCheckMessage is sent on a timeout (500ms) if no other messages were sent in that period. The test expects that, with no activity on the cluster, ~20 messages are sent. In fact, TcpDiscoveryMetricsUpdateMessage is also sent every 2000ms.
> Within the failure detection period it can be sent 6 times, resetting the timer 6 times. In the corner case only 20 - 6 = 14 check messages are sent.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
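
The "20 - 6 = 14" estimate in the description can be written out as a small calculation. This is only a sketch of that reasoning, not code from the Ignite test: the class and method names are hypothetical, the 10000ms failure detection timeout is an assumption (it is what 20 check messages at 500ms implies), and the model assumes every metrics message costs exactly one check message.

```java
/** Back-of-the-envelope estimate mirroring the reasoning in the issue description. */
final class ConnCheckEstimate {
    /** Check messages expected on an idle cluster: one per check interval. */
    static long idealChecks(long timeoutMs, long checkIntervalMs) {
        return timeoutMs / checkIntervalMs;
    }

    /** Upper bound on metrics messages in the window, counting one at each edge of it. */
    static long maxTimerResets(long timeoutMs, long metricsFreqMs) {
        return timeoutMs / metricsFreqMs + 1;
    }

    /** Worst case, assuming every timer reset costs exactly one check message. */
    static long worstCaseChecks(long timeoutMs, long checkIntervalMs, long metricsFreqMs) {
        return idealChecks(timeoutMs, checkIntervalMs) - maxTimerResets(timeoutMs, metricsFreqMs);
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000;    // Assumed failure detection timeout (20 checks x 500ms).
        long checkIntervalMs = 500; // Check message timeout from the description.
        long metricsFreqMs = 2_000; // Metrics update period from the description.

        System.out.println("ideal checks    = " + idealChecks(timeoutMs, checkIntervalMs));
        System.out.println("max resets      = " + maxTimerResets(timeoutMs, metricsFreqMs));
        System.out.println("worst-case sent = " + worstCaseChecks(timeoutMs, checkIntervalMs, metricsFreqMs));
    }
}
```

With these inputs the worst case falls one below the lower bound of 15 that the test accepts.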
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
Measuring the number of messages is a very unstable practice. We should make the boundaries more moderate.

was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
Measuring amount of messages is very unstable practice.

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
> Measuring the number of messages is a very unstable practice. We should make the boundaries more moderate.
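
To illustrate why counting messages is fragile, here is a toy timeline of the timer reset described above. It is an idealized model, not Ignite code: the class and method names are hypothetical, the 10000ms window is an assumption, timers are taken to fire exactly on the millisecond, and a metrics message due at the same instant as a check message is assumed to win.

```java
/** Toy millisecond timeline of check messages being suppressed by metrics messages. */
final class ConnCheckTimeline {
    /**
     * Counts check messages sent in (0, windowMs]. A check is sent once checkIntervalMs
     * has passed since the last message of any kind. A metrics message is sent at every
     * multiple of metricsFreqMs and resets that timer. metricsFreqMs == 0 disables metrics.
     */
    static int countChecks(long windowMs, long checkIntervalMs, long metricsFreqMs) {
        int checks = 0;
        long lastSent = 0; // Some message is assumed to have been sent at t = 0.

        for (long t = 1; t <= windowMs; t++) {
            if (metricsFreqMs > 0 && t % metricsFreqMs == 0)
                lastSent = t; // Metrics message goes out and resets the check timer.
            else if (t - lastSent >= checkIntervalMs) {
                checks++;     // Nothing was sent for a whole interval: send a check message.
                lastSent = t;
            }
        }

        return checks;
    }

    public static void main(String[] args) {
        System.out.println("idle cluster: " + countChecks(10_000, 500, 0));     // prints 20
        System.out.println("with metrics: " + countChecks(10_000, 500, 2_000)); // prints 15
    }
}
```

Even with perfectly aligned timers the model yields 15, the very bottom of the accepted 15-25 range, so a single delayed timer tick could plausibly bring the count down to the observed 14.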
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
Measuring amount of messages is very unstable practice.

was:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
> Measuring amount of messages is very unstable practice.
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Description:
Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.

was:
TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage test is sometimes fail. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
> Test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage sometimes fails. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Ignite Flags: (was: Docs Required,Release Notes Required)

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
>
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage test is sometimes fail. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Fix Version/s: 2.18

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
> Fix For: 2.18
>
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage test is sometimes fail. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
[jira] [Updated] (IGNITE-25700) Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
[ https://issues.apache.org/jira/browse/IGNITE-25700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin updated IGNITE-25700:

Labels: IEP-132 ise (was: )

> Fix flaky test TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage
> ---
>
> Key: IGNITE-25700
> URL: https://issues.apache.org/jira/browse/IGNITE-25700
> Project: Ignite
> Issue Type: Bug
> Reporter: Maksim Timonin
> Assignee: Maksim Timonin
> Priority: Major
> Labels: IEP-132, ise
>
> TcpDiscoverySpiFailureTimeoutSelfTest#testConnectionCheckMessage test is sometimes fail. The test checks that during failure detection timeout there are 15-25 ping messages between two neighboring nodes. But sometimes test fails because there are only 14 messages.
> Reason is TcpDiscoveryMetricsUpdateMessage that is sent every 2000ms. Sending of the message resets a timer used for sending the TcpDiscoveryConnectionCheckMessage message.
> Then in the default failure detection period (10000ms) it can be sent 6 times. It's OK to await 20 times of the check message, but the reset can drop 6 sends, and we will get 14 messages.
