I replaced the Kafka processor with PublishKafka_0_10. It didn't start consuming from the stalled queue. I cleared all the queues again and it ran overnight without stalling, longer than it has before. I stopped and started the geoEnrichIp processor just now to see if it would stall, and it did. I should be able to restart a processor like that, right, and have it start consuming from its queue again? As soon as I clear the stalled queue, whether or not it's full, everything starts flowing again.
Thanks,
Nick

On Wed, Dec 21, 2016 at 11:34 AM, Bryan Bende <bbe...@gmail.com> wrote:

> Thanks for the info.
>
> Since your Kafka broker is 0.10.1, I would be curious whether you
> experience the same behavior after switching to PublishKafka_0_10.
>
> The Kafka processors line up like this:
>
> GetKafka/PutKafka use the 0.8.x Kafka client
> ConsumeKafka/PublishKafka use the 0.9.x Kafka client
> ConsumeKafka_0_10/PublishKafka_0_10 use the 0.10.x Kafka client
>
> In some cases it is possible to use a version of the client with a
> different version of the broker, but it usually works best to use the
> client that matches the broker.
>
> I'm wondering if your PutKafka processor is getting stuck somehow, which
> then causes back-pressure to build up all the way back to your TCP
> processor, since it looked like all of your queues were filled up.
>
> It is entirely possible that something else is going on, but maybe we
> can eliminate the Kafka processor from the list of possible problems by
> testing with PublishKafka_0_10.
>
> -Bryan
>
> On Wed, Dec 21, 2016 at 2:25 PM, Nick Carenza <
> nick.care...@thecontrolgroup.com> wrote:
>
>> Hey Bryan,
>>
>> Thanks for taking the time!
>>
>> - This is NiFi 1.1.0. I had the same trouble on 1.0.0 and upgraded
>> recently in the hope that there was a fix for the issue.
>> - Kafka is version 2.11-0.10.1.0.
>> - I am using the PutKafka processor.
>>
>> - Nick
>>
>> On Wed, Dec 21, 2016 at 11:19 AM, Bryan Bende <bbe...@gmail.com> wrote:
>>
>>> Hey Nick,
>>>
>>> Sorry to hear about these troubles. A couple of questions...
>>>
>>> - What version of NiFi is this?
>>> - What version of Kafka are you using?
>>> - Which Kafka processor in NiFi are you using? It looks like PutKafka,
>>> but just confirming.
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>> On Wed, Dec 21, 2016 at 2:00 PM, Nick Carenza <
>>> nick.care...@thecontrolgroup.com> wrote:
>>>
>>>> I am running into an issue where a processor will stop receiving
>>>> flow files from its queue.
>>>>
>>>> flow: tcp --(100,000)--> evaljsonpath --(100,000)--> geoip
>>>> --(100,000)--> putkafka
>>>>
>>>> This time, putkafka is the processor that has stopped receiving
>>>> flowfiles.
>>>>
>>>> When I try to list the queue, I get a message saying the queue has no
>>>> flow files in it. I checked the HTTP request, and the response says
>>>> there are 100,000 flow files in the queue, but the flowFileSummaries
>>>> array is empty:
>>>>
>>>>> GET /nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132 HTTP/1.1
>>>>>
>>>>> {
>>>>>   "listingRequest": {
>>>>>     "id": "22754339-0159-1000-2dc9-07db09366132",
>>>>>     "uri": "http://ipaddress:8080/nifi-api/flowfile-queues/1d72b81f-0159-1000-d09b-dc33e81b35c2/listing-requests/22754339-0159-1000-2dc9-07db09366132",
>>>>>     "submissionTime": "12/21/2016 17:37:07.385 UTC",
>>>>>     "lastUpdated": "17:37:07 UTC",
>>>>>     "percentCompleted": 100,
>>>>>     "finished": true,
>>>>>     "maxResults": 100,
>>>>>     "state": "Completed successfully",
>>>>>     "queueSize": {
>>>>>       "byteCount": 288609476,
>>>>>       "objectCount": 100000
>>>>>     },
>>>>>     "flowFileSummaries": [],
>>>>>     "sourceRunning": true,
>>>>>     "destinationRunning": true
>>>>>   }
>>>>> }
>>>>
>>>> I tried stopping and starting all of the processors, replacing the
>>>> putkafka processor with a new duplicate and moving the queue over to
>>>> it, and restarting Kafka itself. I also ran a thread dump with all of
>>>> the processors "running".
>>>>
>>>> Since this is not running in a production environment, as a last
>>>> resort I cleared the queue, and then everything started flowing again.
>>>>
>>>> I have experienced this issue many times since I began evaluating
>>>> NiFi. I have heard of others having great success with it, so I am
>>>> convinced I have misconfigured something.
>>>> I have tried to provide the relevant configuration information here:
>>>>
>>>> # nifi.properties
>>>> nifi.version=1.1.0
>>>> nifi.flowcontroller.autoResumeState=true
>>>> nifi.flowcontroller.graceful.shutdown.period=10 sec
>>>> nifi.flowservice.writedelay.interval=500 ms
>>>> nifi.administrative.yield.duration=30 sec
>>>> nifi.bored.yield.duration=10 millis
>>>> nifi.state.management.provider.local=local-provider
>>>> nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
>>>> nifi.queue.swap.threshold=1000
>>>> nifi.swap.in.period=5 sec
>>>> nifi.swap.in.threads=1
>>>> nifi.swap.out.period=5 sec
>>>> nifi.swap.out.threads=4
>>>> nifi.cluster.is.node=false
>>>> nifi.build.tag=nifi-1.1.0-RC2
>>>> nifi.build.branch=NIFI-3100-rc2
>>>> nifi.build.revision=f61e42c
>>>> nifi.build.timestamp=2016-11-26T04:39:37Z
>>>>
>>>> # JVM memory settings
>>>> java.arg.2=-Xms28g
>>>> java.arg.3=-Xmx28g
>>>> java.arg.13=-XX:+UseG1GC
>>>>
>>>> controller settings:
>>>> timer driven thread count: 10-20 (I have tried values from 10 to 20
>>>> and still experience the issue)
>>>> event driven thread count: 5 (haven't touched)
>>>>
>>>> processors:
>>>> concurrency: 1-20 (I have tried values from 1 to 20 and still
>>>> experience the issue)
>>>> scheduling: timer driven (run schedule: 0, run duration: 0)
>>>>
>>>> queues:
>>>> backpressure flowfile count: 100,000
>>>> backpressure flowfile size: 1G
>>>>
>>>> machine:
>>>> 128G RAM
>>>> 20 CPUs
>>>> disk: 3T
>>>>
>>>> ---
>>>>
>>>> Really, I have two questions:
>>>>
>>>> 1. Why is this happening?
>>>> 2. Once the flow is in this state, how can I get it flowing again
>>>> without losing flowfiles?
>>>>
>>>> Thanks,
>>>> Nick
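[Editor's note] The stuck-queue symptom quoted above has a mechanical signature: the listing request completes with queueSize.objectCount non-zero while flowFileSummaries is empty. A minimal sketch of a check for that signature, fed the response body from the thread (the function name looks_stalled is mine, not part of the NiFi API; in practice you would obtain the JSON by POSTing to /nifi-api/flowfile-queues/{queue-id}/listing-requests and polling the returned request URI, as shown in the GET above):

```python
import json

def looks_stalled(listing_response: dict) -> bool:
    """Return True when a completed NiFi listing request reports queued
    flowfiles but an empty summaries array -- the symptom in this thread."""
    req = listing_response["listingRequest"]
    if not req.get("finished"):
        return False  # listing still in progress; can't judge yet
    queued = req.get("queueSize", {}).get("objectCount", 0)
    summaries = req.get("flowFileSummaries", [])
    return queued > 0 and len(summaries) == 0

# The response body from the thread, abridged to the relevant fields.
sample = json.loads("""
{
  "listingRequest": {
    "finished": true,
    "queueSize": {"byteCount": 288609476, "objectCount": 100000},
    "flowFileSummaries": []
  }
}
""")

print(looks_stalled(sample))  # prints True for the stalled queue above
```

A healthy queue would return summaries for up to maxResults flowfiles, so the same check would report False.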