What are you using for your partitionPersist to ES? Is it something you implemented yourself, or an open source library?

With Kafka -> Storm -> Elasticsearch, ES is likely going to be your bottleneck, since indexing is comparatively expensive, so you will likely have to spend a fair amount of effort tuning both ES and Storm/Trident. I have accomplished this with solid throughput and reliability, but it took a lot of work to get ES tuned. Chances are your ES cluster will have to be larger than your Storm cluster.

Any additional information you could add about your environment and use case would help.

-Taylor

On Nov 19, 2014, at 2:40 PM, Elliott Bradshaw <[email protected]> wrote:

> I feel like there have to be people out there doing State updates with a
> Trident-Kafka topology. Has anyone successfully accomplished this with solid
> throughput and reliability?
>
> On Tue, Nov 18, 2014 at 2:30 PM, Elliott Bradshaw <[email protected]> wrote:
> My apologies if I wasn't clear.
>
> PartitionPersist is a Trident stream operation that persists a batch of
> Trident tuples to a stateful destination, in this case, Elasticsearch.
> UpdateState is a function in the BaseStateUpdater class that should be called
> when a batch of tuples arrives.
>
> On Tue, Nov 18, 2014 at 1:26 PM, Itai Frenkel <[email protected]> wrote:
> Could you please elaborate on the relation between "updateState" and
> "partitionPersist"? Are those two consecutive topology bolts?
>
> From: Elliott Bradshaw <[email protected]>
> Sent: Tuesday, November 18, 2014 5:25 PM
> To: [email protected]
> Subject: Fwd: Issues with State updates in Kafka-Trident-Elasticsearch topology
>
> Hi All,
>
> I'm currently attempting to get a topology running that loads data into
> Elasticsearch. Tuples go through some minimal marshalling and preprocessing
> before being sent to partitionPersist, where they are transformed into JSON
> and indexed in Elasticsearch.
>
> The topology appears to work properly in local mode, but when deployed to my
> 4-node cluster, state updates do not seem to fire correctly (sometimes they
> don't fire at all). Tuple counter filters show data flowing through the
> topology at a healthy rate (approx. 80,000 records/second); however, the
> updateState function only rarely appears to be called. After a brief period
> of time, no further calls to updateState are seen.
>
> As a test, I wrote a filter that queues up tuples and batch-sends them to
> Elasticsearch once a certain threshold is reached. This works perfectly fine
> and is capable of handling the processing load.
>
> I've seen discussion of this behavior before, but have not managed to find an
> explanation or solution. Has anybody else had similar issues or found a
> solution?
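
The threshold-batching workaround described in the quoted message can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Java, not Elliott's actual filter: the class name ThresholdBatcher and the sink callback (standing in for an Elasticsearch bulk request) are assumptions, and no Storm or Elasticsearch API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers records and hands them to a sink in batches
 * once a size threshold is reached. The sink is an assumed stand-in for an
 * Elasticsearch bulk request; nothing here is Storm or Elasticsearch API.
 */
class ThresholdBatcher<T> {
    private final int threshold;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    ThresholdBatcher(int threshold, Consumer<List<T>> sink) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
        this.sink = sink;
    }

    /** Buffers one record and flushes automatically once the threshold is hit. */
    synchronized void add(T record) {
        buffer.add(record);
        if (buffer.size() >= threshold) {
            flush();
        }
    }

    /** Sends whatever is buffered (if anything) to the sink and clears the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand the sink an unmodifiable copy so later additions can't leak into it.
        List<T> batch = Collections.unmodifiableList(new ArrayList<>(buffer));
        buffer.clear();
        sink.accept(batch);
    }

    /** Number of records buffered but not yet sent. */
    synchronized int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        ThresholdBatcher<String> batcher = new ThresholdBatcher<>(3, sent::add);
        for (int i = 0; i < 7; i++) {
            batcher.add("doc-" + i);
        }
        // Two full batches of 3 have been sent; one record is still buffered.
        System.out.println(sent.size() + " batches sent, " + batcher.pending() + " pending");
        batcher.flush(); // e.g. called from a cleanup hook
        System.out.println(sent.size() + " batches sent, " + batcher.pending() + " pending");
    }
}
```

One caveat with this approach, as far as I understand Trident's batching: a filter returns immediately, so a Trident batch can be acknowledged while some of its tuples are still sitting in the buffer. If the worker dies before flush() runs, those tuples would likely not be replayed. Writing from a State via partitionPersist is meant to avoid this, because the batch is only considered complete once the state update has run.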
