Thanks guys, here is the new Jira as requested: https://issues.apache.org/jira/browse/NIFI-5853
On Thu, Nov 29, 2018 at 2:06 PM Otto Fowler <ottobackwa...@gmail.com> wrote:

> Maybe you can open a jira for a ZK client like Bryan mentions?
>
> On November 29, 2018 at 13:59:36, Boris Tyukin (bo...@boristyukin.com) wrote:
>
> Thanks, I already looked at the state manager, but unfortunately I need to share some values between processors in my case.
>
> I am also researching another option, which is to use our internal MySQL database. I was thinking to create an indexed table and a few simple Groovy processors around it to put/get/remove values. That database is already set up for online replication to another MySQL instance, and we can set it up for HA easily. I know it sounds like more work than just using the NiFi distributed cache, and I am not sure MySQL will handle 1000 requests per second (even though they will be against a tiny table). But an HA setup would be nice for us, and since the "distributed" cache is not really distributed, I am not sure I like it.
>
> ZK is an option as well, I think, since we already have it (for NiFi, Kafka and HDFS). It looks like I can create some simple Groovy processors to use the ZK API. I do not expect a lot of put/get operations - maybe about 1000 per second max - and based on benchmarks I've seen, ZK should be able to handle this.
>
> I've looked at Redis as well and it is awesome, but we are not excited to add another system to maintain - we already have quite a few to keep our admins busy :)
>
> At least I have choices... :)
>
> Thanks again for your help!
>
> On Thu, Nov 29, 2018 at 1:33 PM Bryan Bende <bbe...@gmail.com> wrote:
>
>> I also meant to add that NiFi does provide a "state manager" API to processors, which when clustered will use ZooKeeper.
>>
>> The difference between this and DMC is that the state for a processor is only accessible to the given processor (or all the instances of that processor across the cluster). It is stored by the processor's UUID.
>> So if the state doesn't need to be shared across different parts of the flow, then you can use this instead. You can look at ProcessContext.getStateManager().
>>
>> On Thu, Nov 29, 2018 at 1:08 PM Boris Tyukin <bo...@boristyukin.com> wrote:
>> >
>> > Thanks for the explanation, Bryan! It helps!
>> >
>> > Boris
>> >
>> > On Thu, Nov 29, 2018 at 12:26 PM Bryan Bende <bbe...@gmail.com> wrote:
>> >>
>> >> Boris,
>> >>
>> >> Yes, the "distributed" name is confusing... it is referring to the fact that it is a cache that can be accessed across the cluster, rather than a local cache on each node, but you are correct that the DMC server is a single point of failure.
>> >>
>> >> It is important to separate the DMC client and server: there are multiple implementations of the DMC client that can interact with different caches (Redis, HBase, etc.), the trade-off being that you then have to run/maintain these external systems, instead of the DMC server, which is fully managed by NiFi.
>> >>
>> >> Regarding ZK... I don't think there is a good answer other than the fact that DMC existed when NiFi was open sourced, and NiFi didn't start using ZK for clustering until the 1.0.0 release, so originally ZK wasn't in the picture. I assume we could implement a DMC client that talked to ZK, just like we have done for Redis, HBase, and others.
>> >>
>> >> I'm not aware of any issues with the DMC server persisting to the file system or handling concurrent connections; it should be stable.
>> >>
>> >> Thanks,
>> >>
>> >> Bryan
>> >>
>> >> On Thu, Nov 29, 2018 at 11:52 AM Boris Tyukin <bo...@boristyukin.com> wrote:
>> >> >
>> >> > Hi guys,
>> >> >
>> >> > I have a few questions about DistributedMapCacheServer.
>> >> >
>> >> > First question: I am confused by the "Distributed" part. If I get it right, the server actually runs on a single node, and if it fails, it is game over. Is that right?
>> >> > Why is NiFi not using ZK for that, since ZK is already used by the NiFi cluster? I see that most of the use cases/examples are about using DistributedMapCacheServer as a lookup or state store, and this is exactly what ZK was designed for - it provides redundancy, scalability, and 5-10k ops per second on a 3-node ZK cluster.
>> >> >
>> >> > Second question: I did not find any tools to interact with it other than Matt's Groovy tool.
>> >> >
>> >> > Third question: how does a DistributedMapCacheServer that persists to the file system handle concurrency and locking? Is it reliable and can it be trusted?
>> >> >
>> >> > And lastly, is there additional overhead to supporting DistributedMapCacheServer as another system, or is it pretty much hands-off once the controller is set up?
>> >> >
>> >> > Thanks!
>> >> > Boris
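The ZooKeeper option raised in the thread could work the same way, with each cache entry stored as a znode. Below is a rough sketch using the third-party kazoo client library (an assumption - the thread mentions Groovy processors against the ZK API, not kazoo); the connection string and base path are placeholders, and it needs a running ZooKeeper ensemble, so it is not runnable standalone. Note that znodes have a size limit (about 1 MB by default), so this only suits small values.

```python
from kazoo.client import KazooClient  # third-party: pip install kazoo

# Hypothetical connection string - point this at the ensemble already
# used by NiFi, Kafka, and HDFS.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

BASE = "/nifi-kv"  # made-up base path for cache entries
zk.ensure_path(BASE)

def put(key, value):
    # Not atomic: a concurrent creator could slip in between exists()
    # and create(); a real implementation should catch NodeExistsError.
    path = f"{BASE}/{key}"
    if zk.exists(path):
        zk.set(path, value.encode("utf-8"))
    else:
        zk.create(path, value.encode("utf-8"))

def get(key):
    path = f"{BASE}/{key}"
    if not zk.exists(path):
        return None
    data, _stat = zk.get(path)
    return data.decode("utf-8")

def remove(key):
    path = f"{BASE}/{key}"
    if zk.exists(path):
        zk.delete(path)

zk.stop()
```

Unlike the single-node DMC server, this inherits ZooKeeper's replication and failover for free, at the cost of ZK's write-throughput ceiling and small-value restriction.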
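To make the MySQL-backed cache idea from the thread concrete, here is a minimal sketch of the indexed key/value table with put/get/remove operations. It uses Python's sqlite3 module purely as a self-contained stand-in for the internal MySQL instance; the table and column names are made up for illustration, and MySQL would spell the upsert as `INSERT ... ON DUPLICATE KEY UPDATE` rather than SQLite's `INSERT OR REPLACE`.

```python
import sqlite3

# In-memory SQLite stands in for the internal MySQL database mentioned
# in the thread; a real implementation would use a MySQL driver instead.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS kv_cache (
           cache_key   TEXT PRIMARY KEY,  -- PRIMARY KEY provides the index
           cache_value TEXT NOT NULL
       )"""
)

def put(key, value):
    # Upsert: replace the row if the key already exists.
    # (MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE.)
    conn.execute(
        "INSERT OR REPLACE INTO kv_cache (cache_key, cache_value) VALUES (?, ?)",
        (key, value),
    )
    conn.commit()

def get(key):
    row = conn.execute(
        "SELECT cache_value FROM kv_cache WHERE cache_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

def remove(key):
    conn.execute("DELETE FROM kv_cache WHERE cache_key = ?", (key,))
    conn.commit()

put("flow/last-run", "2018-11-29")
print(get("flow/last-run"))  # -> 2018-11-29
remove("flow/last-run")
print(get("flow/last-run"))  # -> None
```

Since every operation is a single indexed statement against a tiny table, the ~1000 requests/second mentioned in the thread is mostly a question of connection pooling and network round-trips rather than query cost.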