Not being as much of a linearstore expert as Kim, I have two ideas:

1) If you have thousands of durable queues, you can hit the kernel's limit
on AIO operations and need to increase the fs.aio-max-nr parameter. For
the calculation: I recall that on some systems (rhel6?) one durable queue
required 33 AIO handlers; on rhel7 it seems to be fewer (half?), but take
this as a rule of thumb only.
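
To make that calculation concrete, here is a rough sketch in Python. It multiplies the queue count by the per-queue figure and compares the result with the current kernel values read from /proc/sys. The 33-per-queue constant is only my recollection from above, and the function names, the 5000-queue example and the doubling for headroom are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Rule-of-thumb figure recalled above (rhel6?); not a documented constant.
AIO_PER_QUEUE_GUESS = 33


def estimate_aio_needed(durable_queues: int, per_queue: int = AIO_PER_QUEUE_GUESS) -> int:
    """Rough number of AIO slots needed for the given number of durable queues."""
    if durable_queues < 0 or per_queue < 0:
        raise ValueError("counts must be non-negative")
    return durable_queues * per_queue


def read_sysctl_int(name: str, root: Path = Path("/proc/sys")) -> Optional[int]:
    """Read an integer sysctl such as 'fs.aio-max-nr'; return None if unavailable."""
    path = root.joinpath(*name.split("."))
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    needed = estimate_aio_needed(5000)  # 5000 queues is an arbitrary example
    limit = read_sysctl_int("fs.aio-max-nr")
    in_use = read_sysctl_int("fs.aio-nr")
    print(f"estimated need: {needed}, limit: {limit}, in use: {in_use}")
    if limit is not None and needed > limit:
        # Doubling for headroom is an arbitrary choice, not a recommendation.
        print(f"consider: sysctl -w fs.aio-max-nr={needed * 2}")
```

Comparing fs.aio-nr (currently allocated) with fs.aio-max-nr while the broker is running should show how close you are to the limit.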

2) It seems the journal file handler has not been initialized, as it is a
null pointer. That could be a consequence of an improper shutdown (though a
buggy one). If you don't care about the data in the queue, you can replace
the jrnl file(s) with an empty one (I can share the file). But I expect you
would like to recover the data, in which case I would start by enabling
trace logs by adding

log-enable=trace+:linearstore
log-to-file=/path/to/file.log  # if not already logging somewhere, i.e. syslog (with trace logs not dropped)

and observing how journal recovery went for all jrnl files (or symlinks
to them) under the

/var/lib/qpidd/.qpidd/qls/jrnl2/440d04db-7fb6-3424-a83c-b70014fa32a0

directory (I deduce from your error logs that the uuid is the real queue
name).
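
To see what is actually in that directory before reading the trace logs, something like the following sketch could list the journal entries and show where any symlinks point. The "*.jrnl" glob pattern and the function name are my assumptions; adjust the pattern if your files are named differently.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def list_journal_files(queue_dir: Path, pattern: str = "*.jrnl") -> List[Tuple[str, Path, int]]:
    """Return (entry name, resolved target, size in bytes) sorted by entry name.

    Size is -1 when the entry does not resolve to a regular file
    (for example a dangling symlink).
    """
    entries = []
    for entry in queue_dir.glob(pattern):
        target = entry.resolve()  # follows symlinks to the real file
        size = target.stat().st_size if target.is_file() else -1
        entries.append((entry.name, target, size))
    return sorted(entries, key=lambda e: e[0])


if __name__ == "__main__":
    # Pass the queue's journal directory as the first argument.
    for name, target, size in list_journal_files(Path(sys.argv[1])):
        print(f"{name}\t{size}\t{target}")
```

An entry with size -1 or 0 would be the first thing I would look at.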

I expect recovery of one jrnl file (the most recent one) to fail in some
manner.
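
Trace logs get large, so a small filter may help to find the failing spot. This sketch prints only lines containing "jexception" or "JERR_", the two strings visible in the error quoted in this thread. I am not assuming anything else about the log format, and the function name is made up.

```python
import sys
from typing import Iterable, Iterator, Tuple


def find_journal_errors(
    lines: Iterable[str],
    needles: Tuple[str, ...] = ("jexception", "JERR_"),
) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every line containing any of the needles."""
    for number, line in enumerate(lines, start=1):
        if any(needle in line for needle in needles):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Pass the log file path (the log-to-file value) as the first argument.
    with open(sys.argv[1], errors="replace") as log:
        for number, line in find_journal_errors(log):
            print(f"{number}: {line}")
```

The line numbers let you jump to the surrounding trace output for the jrnl file being recovered at that moment.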


Kind regards,
Pavel


On Thu, May 30, 2019 at 11:57 PM Justin Ross <justin.r...@gmail.com> wrote:

> Kim?
>
> On Tue, May 14, 2019, 14:01 Gordon Sim <g...@redhat.com> wrote:
>
> > On 14/05/2019 10:46 am, Pål Skjager Løberg wrote:
> > > For a client, just getting "illegal-argument: Value for replyText is too
> > > large" back as an error when sending is not the most useful info and I
> > > suspect, especially after reading the mentioned thread from November, there
> > > might be a bug in how the error responses to the client is generated
> > > causing the actual error to be masked by another error.
> > >
> > > Also, there seems to be a possibility that the Qpid broker will start wth
> > > broken queues, causing it to fail only when messages are written to that
> > > queue, including some null pointer problems.
> > >
> > > Are any of these known issues or is it the expected behavior?
> >
> > No, neither of these is the correct behaviour.
> >
> > I have committed a fix for the first issue:
> > https://issues.apache.org/jira/browse/QPID-8313
> >
> > For the issue with the journal recovery, I'd need to defer to the
> > expert. Kim, can you recommend any diagnostics to figure out what would
> > cause the problems in the queues on recovery? i.e. errors such as:
> >
> > > jexception 0x010b LinearFileController::getCurrentSerial() threw
> > > JERR__NULL: Operation on null pointer
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> > For additional commands, e-mail: users-h...@qpid.apache.org
> >
> >
>
