[ 
https://issues.apache.org/jira/browse/QPIDJMS-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy A. Bish resolved QPIDJMS-534.
-------------------------------------
    Fix Version/s: 0.59.0
       Resolution: Fixed

> BalancedProviderFuture.sync stuck forever during connection recovery
> --------------------------------------------------------------------
>
>                 Key: QPIDJMS-534
>                 URL: https://issues.apache.org/jira/browse/QPIDJMS-534
>             Project: Qpid JMS
>          Issue Type: Bug
>          Components: qpid-jms-client
>    Affects Versions: 0.42.0
>            Reporter: Ravi Nirmal
>            Assignee: Timothy A. Bish
>            Priority: Major
>             Fix For: 0.59.0
>
>         Attachments: logs.txt, thread-dump.txt
>
>
> Recently we observed an issue in our production environment where the 
> BalancedProviderFuture.sync method gets stuck forever during connection 
> recovery and never returns. We have seen this on 2 hosts in the last week, 
> and the only remedy is to restart the server.
> I am attaching a thread dump that shows the issue and how it blocks other 
> threads; [^thread-dump.txt] has details of all the threads.
> h3. Details of Investigation
>  * This issue happens on connection recovery during failover from one 
> server to another.
>  * By debugging I can see that the BalancedProviderFuture.sync method is 
> waiting for its state to be updated, and that state is updated by the 
> AmqpProvider thread. In the thread dump I don't see any AmqpProvider thread 
> in a stuck state, which indicates that AmqpProvider has done its job but the 
> state of the given BalancedProviderFuture object was still not updated.
>  * In the successful case, I can see that the state of the 
> BalancedProviderFuture object is updated in the following sequence:
>  ** The JmsSession.onConnectionRecovery method calls provider.create after 
> creating the BalancedProviderFuture object.
>  ** provider.create (aka AmqpProvider.create) starts a thread using the 
> serializer. This create method has proper handling: it either calls 
> pumpToProtonTransport or request.onFailure (which updates the state of the 
> BalancedProviderFuture in case of an exception).
>  ** Once the above thread finishes (basically after 
> pumpToProtonTransport), the serializer calls the AmqpProvider.onData 
> method, which updates the state of the BalancedProviderFuture object.
>  * I have observed that if an exception is thrown in the AmqpProvider.onData 
> method, the state of the BalancedProviderFuture is not updated and the 
> BalancedProviderFuture.sync method gets stuck forever. The exception can 
> occur when the protonTransport tail is already closed (probably because of 
> an idle timeout or some other transport-related issue).
>  * I have also observed that in some cases (idle timeout or transport 
> errors), after the thread started by provider.create (aka 
> AmqpProvider.create) completes, the serializer does not call 
> AmqpProvider.onData but instead calls AmqpProvider.onTransportError or 
> AmqpProvider.onTransportClosed, and I cannot see any handling that updates 
> the state of the BalancedProviderFuture object in the onTransportError or 
> onTransportClosed methods.
>  * I am attaching some [^logs.txt] which show some errors; these errors 
> occurred when the state of the BalancedProviderFuture was not updated and the 
> sync method got stuck forever.
>  * Please note we are using the URL 
> failover:(amqp://localhost:5672,amqp://localhost:5682)?jms.sendTimeout=5000 
> and qpid version 0.42.0.
> I have found two old tickets, QPIDJMS-458 and QPIDJMS-464, which show a 
> similar issue, but I believe this issue is different and might need to be 
> fixed separately.
> Can someone please take a look at this? It has become a critical issue in 
> our production environment, and we have no option other than restarting our 
> services.
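
The failure mode described in the report can be illustrated with a small,
self-contained sketch. This is not the Qpid JMS code: SketchFuture,
SketchProvider, the failPendingOnTransportError flag and the recover helper
are all hypothetical names, and the wait is bounded at 500 ms only so the demo
terminates (the sync described in the report waits without a bound). The
sketch assumes a single-threaded serializer and shows that a pending request
stays incomplete when the transport-error path does not fail it.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RecoveryFutureSketch {

    /** Hypothetical stand-in for a provider future: sync() blocks until a callback completes it. */
    static final class SketchFuture {
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile Throwable failure;

        void onSuccess() {
            done.countDown();
        }

        void onFailure(Throwable cause) {
            failure = cause;
            done.countDown();
        }

        /** Returns true if completed successfully, false if still pending at the timeout. */
        boolean sync(long timeout, TimeUnit unit) throws InterruptedException, IOException {
            if (!done.await(timeout, unit)) {
                return false;
            }
            if (failure != null) {
                throw new IOException(failure);
            }
            return true;
        }
    }

    /** Hypothetical provider that runs all work on one serializer thread. */
    static final class SketchProvider {
        private final ExecutorService serializer = Executors.newSingleThreadExecutor();
        private final boolean failPendingOnTransportError;
        private SketchFuture pending; // only touched on the serializer thread

        SketchProvider(boolean failPendingOnTransportError) {
            this.failPendingOnTransportError = failPendingOnTransportError;
        }

        void create(SketchFuture request) {
            serializer.execute(() -> pending = request);
        }

        /** Success path: incoming data completes the pending request. */
        void onData() {
            serializer.execute(() -> {
                SketchFuture request = pending;
                pending = null;
                if (request != null) {
                    request.onSuccess();
                }
            });
        }

        /** Error path: the pending request is failed only when the flag is set. */
        void onTransportError(Throwable error) {
            serializer.execute(() -> {
                SketchFuture request = pending;
                if (failPendingOnTransportError && request != null) {
                    pending = null;
                    request.onFailure(error);
                }
            });
        }

        void shutdown() {
            serializer.shutdownNow();
        }
    }

    /** Runs one simulated recovery and reports "completed", "failed", "stuck" or "interrupted". */
    public static String recover(boolean transportFails, boolean failPendingOnTransportError) {
        SketchProvider provider = new SketchProvider(failPendingOnTransportError);
        try {
            SketchFuture request = new SketchFuture();
            provider.create(request);
            if (transportFails) {
                provider.onTransportError(new IOException("transport closed"));
            } else {
                provider.onData();
            }
            return request.sync(500, TimeUnit.MILLISECONDS) ? "completed" : "stuck";
        } catch (IOException e) {
            return "failed";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            provider.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(recover(false, false)); // onData completes the request
        System.out.println(recover(true, false));  // error path never completes it
        System.out.println(recover(true, true));   // error path fails it, so sync returns
    }
}
```

In this sketch the caller is released only when some callback completes the
request, so every terminal path (data, transport error, transport closed)
would need to either succeed or fail whatever is pending.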



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

