Mike, also: if the backpressure Joe asked about really is "not being applied",
and if you're comfortable with a profiler, I think Joe and I both gravitated to
0x00000006c533b770 being locked at
org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757).
It would be interesting to see whether that section takes longer over time.
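
If it helps, a rough sketch along these lines (untested; <nifi-pid> is a
placeholder for the NiFi java process, and it assumes jstack is on the PATH)
would show whether more threads pile up at that frame as time goes on:

  while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    jstack <nifi-pid> > /tmp/nifi-dump-$ts.txt
    # count threads sitting in persistRecord at this instant
    echo "$ts $(grep -c 'PersistentProvenanceRepository.persistRecord' /tmp/nifi-dump-$ts.txt)"
    sleep 300
  done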

On Thu, Feb 16, 2017 at 11:56 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Mike
>
> One more thing...can you please grab a couple more thread dumps for us
> with 5 to 10 mins between?
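>
> In case it's handy, either of these should do it (a sketch; adjust the
> install path or pid for your setup):
>
>   ./bin/nifi.sh dump thread-dump-$(date +%H%M%S).txt
>   # or, with the JDK tools:
>   jstack <nifi-pid> > thread-dump-$(date +%H%M%S).txt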
>
> I don't see a deadlock but do suspect either just crazy slow IO going
> on or a possible livelock.  The thread dump will help narrow that down
> a bit.
>
> Can you run 'iostat -xmh 20' for a bit (or its equivalent) on the
> system too please.
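>
> If the repositories live on a particular device, something like this
> (sdX being a placeholder) narrows it down; await and %util are the
> columns of interest:
>
>   iostat -xmh sdX 20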
>
> Thanks
> Joe
>
> On Thu, Feb 16, 2017 at 11:52 PM, Joe Witt <joe.w...@gmail.com> wrote:
> > Mike,
> >
> > No need for more info.  Heap/GC looks beautiful.
> >
> > The thread dump, however, shows some problems.  The provenance
> > repository is locked up.  Numerous threads are sitting here:
> >
> > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> > at org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:757)
> >
> > This means these are processors committing their sessions and updating
> > provenance, but they're waiting on a read lock for the provenance
> > repository.  That lock cannot be obtained because a provenance
> > maintenance thread is attempting to purge old events and cannot do so.
> >
> > I recall us having addressed this, so I am looking to see when that
> > fix went in.  If provenance is not critical for you right now, you can
> > swap out the persistent implementation for the volatile provenance
> > repository.  In nifi.properties, change this line
> >
> > nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
> >
> > to
> >
> > nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
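> >
> > (That needs a restart to take effect, and keep in mind the volatile repo
> > holds events in memory only, so provenance history is lost on restart;
> > if I recall correctly its size is bounded by
> > nifi.provenance.repository.buffer.size.)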
> >
> > The behavior reminds me of this issue which was fixed in 1.x
> > https://issues.apache.org/jira/browse/NIFI-2395
> >
> > Need to dig into this more...
> >
> > Thanks
> > Joe
> >
> > On Thu, Feb 16, 2017 at 11:36 PM, Mikhail Sosonkin <mikh...@synack.com> wrote:
> >> Hi Joe,
> >>
> >> Thank you for your quick response. The system is currently in the
> >> deadlock state with 10 worker threads spinning. So, I'll gather the
> >> info you requested.
> >>
> >> - The available space on the partition is 223G free of 500G (same as was
> >> available for 0.6.1)
> >> - java.arg.3=-Xmx4096m in bootstrap.conf
> >> - thread dump and jstats are here
> >> https://gist.github.com/nologic/1ac064cb42cc16ca45d6ccd1239ce085
> >>
> >> Unfortunately, it's hard to predict when the decay starts, and it takes
> >> too long to monitor the system manually. However, if you still need
> >> thread dumps taken while it decays after seeing the attached dumps, I
> >> can set up a timer script.
> >>
> >> Let me know if you need any more info.
> >>
> >> Thanks,
> >> Mike.
> >>
> >>
> >> On Thu, Feb 16, 2017 at 9:54 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>>
> >>> Mike,
> >>>
> >>> Can you capture a series of thread dumps as the gradual decay occurs
> >>> and note when each was generated, specifically calling out the "now
> >>> the system is doing nothing" point?  Can you also check for space
> >>> available on the system during these times?  Also, please advise on
> >>> the behavior of the heap/garbage collection.  Often (not always) a
> >>> gradual decay in performance can suggest an issue with GC, as you
> >>> know.  Can you run something like
> >>>
> >>> jstat -gcutil -h5 <pid> 1000
> >>>
> >>> and capture those results in chunks as well.
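> >>>
> >>> (In that output, the O, FGC, and FGCT columns are the ones to watch;
> >>> an old gen that stays pegged while the full GC count/time keep
> >>> climbing would point at GC.)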
> >>>
> >>> This would give us a pretty good picture of the health of the system
> >>> and JVM around these times.  It is probably too much info for the
> >>> mailing list, so feel free to create a JIRA for this and put the
> >>> attachments there, or link to gists on GitHub/etc.
> >>>
> >>> Pretty confident we can get to the bottom of what you're seeing
> >>> quickly.
> >>>
> >>> Thanks
> >>> Joe
> >>>
> >>> On Thu, Feb 16, 2017 at 9:43 PM, Mikhail Sosonkin <mikh...@synack.com>
> >>> wrote:
> >>> > Hello,
> >>> >
> >>> > Recently, we upgraded from 0.6.1 to 1.1.1 and at first everything was
> >>> > working well. However, a few hours later none of the processors were
> >>> > showing any activity. I then tried restarting nifi, which caused some
> >>> > flowfiles to get corrupted (evidenced by exceptions thrown in
> >>> > nifi-app.log), but the processors still produced no activity. Next, I
> >>> > stopped the service and deleted all state (content_repository,
> >>> > database_repository, flowfile_repository, provenance_repository,
> >>> > work). The processors then start working for a few hours (maybe a
> >>> > day) until the deadlock occurs again.
> >>> >
> >>> > So this cycle continues, where I have to periodically reset the
> >>> > service and delete the state to get things moving. Obviously, that's
> >>> > not great. I'll note that the flow.xml file has been changed by the
> >>> > new version of nifi as I added/removed processors, but 95% of the
> >>> > flow configuration is the same as before the upgrade. So I'm
> >>> > wondering if there is a configuration setting that causes these
> >>> > deadlocks.
> >>> >
> >>> > What I've been able to observe is that the deadlock is "gradual":
> >>> > my flow usually takes about 4-5 threads to execute, but the deadlock
> >>> > causes the worker threads to max out at the limit, and I'm not even
> >>> > able to stop any processors or list queues. I also have not seen
> >>> > this behavior in a fresh install of Nifi where the flow.xml starts
> >>> > out empty.
> >>> >
> >>> > Can you give me some advice on what to do about this? Would the
> >>> > problem be resolved if I manually rebuilt the flow with the new
> >>> > version of Nifi (not looking forward to that)?
> >>> >
> >>> > Much appreciated.
> >>> >
> >>> > Mike.
> >>> >
>
