[ https://issues.apache.org/jira/browse/ARTEMIS-4794 ]


    Robbie Gemmell deleted comment on ARTEMIS-4794:
    -----------------------------------------

was (Author: jira-bot):
Commit 19d8059a4eca5b7296ec9711a439ea3a129242df in activemq-artemis's branch 
refs/heads/dependabot/maven/io.micrometer-micrometer-core-1.13.2 from Justin 
Bertram
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=19d8059a4e ]

ARTEMIS-4794 configure pending ack behavior for bridge

When a bridge is stopped it doesn't wait for pending send
acknowledgements to arrive. However, when a bridge is paused it does
wait. The behavior should be consistent and more importantly
configurable. This commit implements these improvements and generally
refactors BridgeImpl to clarify and simplify the code. In total, this
commit includes the follow changes:

 - Removes the hard-coded 60-second timeout for pending acks when
   pausing the bridge and adds a new config parameter (i.e.
   "pending-ack-timeout").
 - Applies the new pending-ack-timeout when the bridge is stopped.
 - Updates existing and adds new logging messages for clarity.
 - De-duplicates code for sending bridge-related notifications.
 - Avoids converting bridge name to/from SimpleString.
 - Removes unnecessary comments.
 - Renames variables & functions for clarity.
 - Replaces the `started`, `stopping`, & `active` booleans with a
   single `state` variable which is an enum.
 - Adds `final` to a few variables that were functionally final.
 - Synchronizes `stop` & `pause` methods to add safety when invoked
   concurrently with `handle` (since both deal with `state` and execute
   runnables on the ordered executor).
 - Reorganizes and removes a few methods for clarity.
 - Relocates `connect` method directly into `ConnectRunnable` (mirroring
   the structure of the `StopRunnable` and `PauseRunnable`).
 - Eliminates unnecessary variables in `ConnectRunnable` and
   `ScheduledConnectRunnable`.
 - Adds test to verify pending ack timeout works as expected with both
   `stop` & `pause` with both regular and large messages.


> CoreBridge: Duplicate message when bridge is stopped/Lost message when bridge 
> is paused while messages being produced to target node.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4794
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4794
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.30.0, 2.34.0, 2.35.0
>            Reporter: nmeylan
>            Assignee: Justin Bertram
>            Priority: Major
>             Fix For: 2.36.0
>
>         Attachments: BridgeARTEMIS4794Test.java, 
> message-not-deliverable.log.txt
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> +Attached test *BridgeDuplicateMessagesARTEMIS4794Test.java*+ highlights the 
> issue with _org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl_
> Place it under 
> _tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/cluster/bridge_
> {*}Summary{*}:
>      When a bridge is stopped while messages being produced to the target 
> node, it can lead to duplicate messages.
> {*}Description{*}:
>     When Using bridge and programmatically *stopping* it while messages are 
> being produced to the target node, the source node fails to get the 
> acknowledgement from target node and messages now exists on the source and 
> the target node.
> It appears that the "active" flag being set to false when 
> BridgeImpl.StopRunnable is called prevent message to be acknowledged by 
> _BridgeImpl::sendAcknowledged_ function
>  
> {*}Context{*}:
> This bug appear in my code (a custom plugin) because is start and stop Bridge 
> programmatically to move messages from one node to another when some 
> conditions are met, if they are no longer met I want to stop the moving of 
> messages.
>  
> *Notes:*
>  * Changing bridge configuration 
> {_}useDuplicateDetection{_},{_}confirmationWindowSize{_} or 
> _producerWindowSize_ parameter do not help to mitigate the issue
>  * Not related to large messages, i use large messages in my test to ease 
> reproduction 
>  * Reproduced on 2.30 and 2.34
>  * Calling pause() does not create duplicate 
> {_}server.getClusterManager().getBridges().get(bridgeName).pause(){_};
>  
>  
>  
> *UPDATE:* When using pause instead of stop in above scenari, I get message 
> not being develirable anymore
> {*}Summary{*}:
> When a bridge is paused while *large* messages being produced to the target 
> node, it can lead to message not able to be delivered to new consumers.
> {*}Description{*}:
> When Using bridge and programmatically pausing it while messages are being 
> produced to the target node, If large messages are being delivered, the 
> thread In _BridgeImpl::deliverLargeMessage_ is not awaited, and the bridge is 
> paused then the Runnable of deliverLargeMessage is being run, leading to a 
> situation were the message won't be delivered to new consumers
> {*}Notes{*}:
>  * PauseRunnable does not await for task in {{executor}} to complete, 
> deliverLargeMessage do create task in executor
>  * 
>  ** We can see that even after PauseRunnable has complete, 
> deliverLargeMessage's task is running after.
>  * If I call {{bridge1.onCreditsFlow(true, null);}} to set the flag 
> {{blockedOnFlowControl}} to true, before calling pause, it prevent putting 
> new task on executor and mitigate the issue, but It feels weird and I think 
> there might still be race condition



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@activemq.apache.org
For additional commands, e-mail: issues-h...@activemq.apache.org
For further information, visit: https://activemq.apache.org/contact


Reply via email to