Do you have any kernel (net/disk) tuning recommendations for qpid-cpp with
linearstore?

Ram

On Thu, Nov 8, 2018 at 8:56 AM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Kim/Gordon,
>
> I was wrong about the NFS for qpid journal files, looks like they are on
> NFS, so does NFS cause this issue?
>
> Ram
>
> On Wed, Nov 7, 2018 at 12:18 PM rammohan ganapavarapu <
> rammohanga...@gmail.com> wrote:
>
>> Kim,
>>
>> Ok, i am still trying to see what part of my java application is causing
>> that issue, yes that issue is happening intermittently. Regarding
>> "JERR_WMGR_ENQDISCONT" error, may be they are chained exceptions from the
>> previous error JERR_JCNTL_AIOCMPLWAIT?
>>
>> Does message size contribute to this issue?
>>
>> Thanks,
>> Ram
>>
>> On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kvand...@redhat.com>
>> wrote:
>>
>>> No, they are not.
>>>
>>> These two defines govern the number of sleeps and the sleep time while
>>> waiting for before throwing an exception during recovery only. They do
>>> not play a role during normal operation.
>>>
>>> If you are able to compile the broker code, you can try playing with
>>> these values. But I don't think they will make much difference to the
>>> overall problem. I think some of the other errors you have been seeing
>>> prior to this one are closer to where the real problem lies - such as
>>> the JRNL_WMGR_ENQDISCONT error.
>>>
>>> Do you have a reproducer of any kind? Does this error occur predictably
>>> under some or other conditions?
>>>
>>> Thanks,
>>>
>>> Kim van der Riet
>>>
>>> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
>>> > Kim,
>>> >
>>> > I see these two settings from code, can these be configurable?
>>> >
>>> > #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>> >
>>> > #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>> >
>>> >
>>> > Ram
>>> >
>>> > On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
>>> > rammohanga...@gmail.com> wrote:
>>> >
>>> >> Thank you Kim, i will try your suggestions.
>>> >>
>>> >> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvand...@redhat.com
>>> wrote:
>>> >>
>>> >>> This error is a linearstore issue. It looks as though there is a
>>> single
>>> >>> write operation to disk that has become stuck, and is holding up all
>>> >>> further write operations. This happens because there is a fixed
>>> circular
>>> >>> pool of memory pages used for the AIO operations to disk, and when
>>> one
>>> >>> of these is "busy" (indicated by the A letter in the  page state
>>> map),
>>> >>> write operations cannot continue until it is cleared. It it does not
>>> >>> clear within a certain time, then an exception is thrown, which
>>> usually
>>> >>> results in the broker closing the connection.
>>> >>>
>>> >>> The events leading up to a "stuck" write operation are complex and
>>> >>> sometimes difficult to reproduce. If you have a reproducer, then I
>>> would
>>> >>> be interested to see it! Even so, the ability to reproduce on another
>>> >>> machine is hard as it depends on such things as disk write speed, the
>>> >>> disk controller characteristics, the number of threads in the thread
>>> >>> pool (ie CPU type), memory and other hardware-related things.
>>> >>>
>>> >>> There are two linearstore parameters that you can try playing with to
>>> >>> see if you can change the behavior of the store:
>>> >>>
>>> >>> wcache-page-size: This sets the size of each page in the write
>>> buffer.
>>> >>> Larger page size is good for large messages, a smaller size will
>>> help if
>>> >>> you have small messages.
>>> >>>
>>> >>> wchache-num-pages: The total number of pages in the write buffer.
>>> >>>
>>> >>> Use the --help on the broker with the linearstore loaded to see more
>>> >>> details on this. I hope that helps a little.
>>> >>>
>>> >>> Kim van der Riet
>>> >>>
>>> >>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>>> >>>> Any help in understand why/when broker throws those errors and stop
>>> >>>> receiving message would be appreciated.
>>> >>>>
>>> >>>> Not sure if any kernel tuning or broker tuning needs to be done to
>>> >>>> solve this issue.
>>> >>>>
>>> >>>> Thanks in advance,
>>> >>>> Ram
>>> >>>>
>>> >>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>>> >>>> rammohanga...@gmail.com> wrote:
>>> >>>>
>>> >>>>> Also from this log message (store level) it seems like waiting for
>>> AIO
>>> >>> to
>>> >>>>> complete.
>>> >>>>>
>>> >>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>>> "<journal
>>> >>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>>> >>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>>> >>>>> ps=[-------------------------A------]
>>> >>>>>
>>> >>>>> page_state ps=[-------------------------A------]  where A is
>>> >>> AIO_PENDING
>>> >>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>>> >>>>>
>>> >>>>> When there is or there are pending AIO, does broker close the
>>> >>> connection?
>>> >>>>> is there any tuning that can be done to resolve this?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Ram
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>>> >>>>> rammohanga...@gmail.com> wrote:
>>> >>>>>
>>> >>>>>> I was check the code and i see these lines for that AIO timeout.
>>> >>>>>>
>>> >>>>>>                 case
>>> >>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>> >>>>>>                   if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>> >>>>>>                       THROW_STORE_EXCEPTION("Timeout waiting for
>>> AIO in
>>> >>>>>> MessageStoreImpl::recoverMessages()");
>>> >>>>>>                   ::usleep(AIO_SLEEP_TIME_US);
>>> >>>>>>                   break;
>>> >>>>>>
>>> >>>>>> And these are the defaults
>>> >>>>>>
>>> >>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>> >>>>>>
>>> >>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next
>>> page is
>>> >>>>>> waiting for AIO.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> So does page got blocked and its waiting for page availability?
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Ram
>>> >>>>>>
>>> >>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>> >>>>>> rammohanga...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after
>>> that
>>> >>> we
>>> >>>>>>> see this message
>>> >>>>>>>
>>> >>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>> >>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>>> >>> offs=0x107680
>>> >>>>>>> (likely journal overwrite boundary); 19 filler record(s)
>>> required.
>>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> >>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>>> >>> fid=0x4605b
>>> >>>>>>> offs=0x107680
>>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> >>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover
>>> phase
>>> >>> logs
>>> >>>>>>> It worked fine for a day and started throwing this message:
>>> >>>>>>>
>>> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>>> "<name>":
>>> >>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>>> >>> pi=25 pc=8
>>> >>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>> >>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot
>>> deliver
>>> >>> to
>>> >>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>>> >>> failed:
>>> >>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>>> >>> JERR_JCNTL_AIOCMPLWAIT:
>>> >>>>>>> Timeout waiting for AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> >>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>>> >>> framing-error:
>>> >>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>>> >>> 0x0202
>>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> >>> waiting for
>>> >>>>>>> AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>>> >>> <queue-name>:
>>> >>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> >>> waiting for
>>> >>>>>>> AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>>> >>> illegal-argument:
>>> >>>>>>> Value for replyText is too large(320)
>>> >>>>>>>
>>> >>>>>>> Thanks,
>>> >>>>>>> Ram
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>> >>>>>>> rammohanga...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> No, local disk.
>>> >>>>>>>>
>>> >>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <g...@redhat.com>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>> >>>>>>>>>> Gordon,
>>> >>>>>>>>>>
>>> >>>>>>>>>> We are using java client 0.28 version and qpidd-cpp 1.35
>>> version
>>> >>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64), i dont know at what
>>> >>> scenario
>>> >>>>>>>>> its
>>> >>>>>>>>>> happening but after i restart broker and if we wait for few
>>> days
>>> >>> its
>>> >>>>>>>>>> happening again. From the above logs do you have any pointers
>>> to
>>> >>>>>>>>> check?
>>> >>>>>>>>>
>>> >>>>>>>>> Are you using NFS?
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>> ---------------------------------------------------------------------
>>> >>>>>>>>> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
>>> >>>>>>>>> For additional commands, e-mail: users-h...@qpid.apache.org
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
>>> >>> For additional commands, e-mail: users-h...@qpid.apache.org
>>> >>>
>>> >>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
>>> For additional commands, e-mail: users-h...@qpid.apache.org
>>>
>>>

Reply via email to