Re: Flink ignoring -yn

2018-07-24 Thread Garrett Barton
I missed your reply, I'm using 1.5.0. Will be upgrading to 1.5.1 soon. On Tue, Jul 10, 2018, 10:15 PM Jeff Zhang wrote: > Which flink version do you use ? > > > Garrett Barton 于2018年7月11日周三 上午1:09写道: > > > Hey all, > > I am running flink in batch mode on yarn wit

Flink ignoring -yn

2018-07-10 Thread Garrett Barton
Hey all, I am running flink in batch mode on yarn with independent jobs creating their own clusters. I have a flow defined that scales parallelism based on input size (to keep overall processing time somewhat constant). Right now the flow initializes with around 22k tasks for a flow that sets

Flink on Yarn - Connection unexpectedly closed by remote task manager..

2018-06-15 Thread Garrett Barton
Hey all, My jobs that I am trying to write in Flink 1.5 are failing after a few minutes. I reckon it's because the idle task managers are shutting down, but it seems to kill the client and the running job, which was still going on one of the other task managers. Either way I get:

Re: performance test using real data - comparing throughput & latency

2017-09-15 Thread Garrett Barton
When building these kinds of tests I always just orchestrated my producers and consumers to spit metrics out somewhere easy to collect. Never looked for a ui/tool to do it before. Assuming good NTP configs (sub ms accuracy), I would typically put timing data into the key portion of the messages
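The timing-in-key approach described above can be sketched as follows. This is a hypothetical Python illustration (the function names and key format are invented, and a real setup would use an actual Kafka producer/consumer); it assumes, as the post does, NTP-synced clocks on both ends.

```python
import time

def make_key(msg_id: str) -> str:
    # Embed the producer's send timestamp (ms) in the message key so the
    # consumer can compute end-to-end latency without any side channel.
    return f"{msg_id}|{int(time.time() * 1000)}"

def latency_ms(key: str, received_at_ms: int) -> int:
    # Recover the send timestamp from the key and subtract it from the
    # receive time; accuracy depends entirely on clock synchronization.
    _, sent_ms = key.split("|")
    return received_at_ms - int(sent_ms)
```

The consumer side then just emits `latency_ms(record.key, now)` to whatever metrics sink is easy to collect from.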

Re: Streams: Fetch Offset 0 is out of range for partition foo-0, resetting offset

2017-08-14 Thread Garrett Barton
ol) with what offset > ranges. > From the code path the broker should at least maintain one empty segment > even if all data gets truncated, in trunk, but I'm not sure if you are > running on an older version that may have some bug on the broker logs. > > Guozhang > > > > >

Re: Streams: Fetch Offset 0 is out of range for partition foo-0, resetting offset

2017-08-14 Thread Garrett Barton
position to 0, did you run > some tools periodically to reset the offset to 0? > > > Guozhang > > > > > > On Wed, Aug 9, 2017 at 7:16 AM, Garrett Barton <garrett.bar...@gmail.com> > wrote: > > > I have a small test setup with a local zk/kafka server and a streams

Streams: Fetch Offset 0 is out of range for partition foo-0, resetting offset

2017-08-09 Thread Garrett Barton
I have a small test setup with a local zk/kafka server and a streams app that loads sample data. The test setup is usually up for a day or two before a new build goes out and it's blown away and loaded from scratch. Lately I've seen that after a few hours the stream app will stop processing and

Re: Kafka Streams: why aren't offsets being committed?

2017-08-07 Thread Garrett Barton
; > This one > https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-5510 > > Best, > Dmitry > > пн, 7 авг. 2017 г. в 14:22, Garrett Barton <garrett.bar...@gmail.com>: > > > Dmitry, which KIP are you referring to? I see this behavior too > sometim

Re: Kafka Streams: why aren't offsets being committed?

2017-08-07 Thread Garrett Barton
Dmitry, which KIP are you referring to? I see this behavior too sometimes. On Fri, Aug 4, 2017 at 10:25 AM, Dmitry Minkovsky wrote: > Thank you Matthias and Bill, > > Just want to confirm that was my offsets *were *being committed but I was > being affected by

Re: Monitor all stream consumers for lag

2017-08-01 Thread Garrett Barton
consumer groups and doesn't commit > offsets. The offsets are checkpointed to local disk, so they won't show up > with the ConsumerGroupCommand. > > That said it would be useful to see the lag, so maybe raise a JIRA for it? > > Thanks, > Damian > > On Tue, 1 Aug 2017 at 15:0

Re: Event sourcing with Kafka and Kafka Streams. How to deal with atomicity

2017-07-21 Thread Garrett Barton
Could you take in both topics via the same stream? Meaning don't do a kafka streams join, literally just read both streams. If KStream cant do this, dunno haven't tried, then simple upstream merge job to throw them into 1 topic with same partitioning scheme. I'd assume you would have the products
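The upstream-merge idea above can be sketched as follows. This is a simplified Python sketch, not Kafka code: `partition_for` stands in for Kafka's default (murmur2-based) partitioner, and the point is only that both source topics must share the same partitioning scheme and partition count so records with the same key stay co-partitioned after the merge.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a deterministic key partitioner (hypothetical): what
    # matters is that BOTH source topics use the same scheme and count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def merge(topic_a, topic_b, num_partitions=4):
    # Naive upstream merge: write records from both topics into one
    # output stream; co-partitioning is preserved because partition_for()
    # is shared, so a downstream consumer sees both sides for each key.
    for rec in topic_a + topic_b:
        yield rec["key"], rec["value"], partition_for(rec["key"], num_partitions)
```

With both topics merged into one, a single consumer per partition sees every record for a given key without needing a streams join.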

Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Garrett Barton
Getting good concurrency in a webapp is more than doable. Check out these benchmarks: https://www.techempower.com/benchmarks/#section=data-r14=ph=db I linked to the single-query one because that's closest to a single operation like you will be doing. I'd also note if the data delivery does not

Re: Dropped messages in kstreams?

2017-06-15 Thread Garrett Barton
Is your time usage correct? It sounds like you want event time, not load/process time, which is the default unless you have a TimestampExtractor defined somewhere upstream? Otherwise I could see far fewer events coming out, as streams is just aggregating whatever showed up in that 10 second window. On
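The event-time-vs-processing-time distinction above can be sketched like this. This is a Python sketch of the role a Kafka Streams TimestampExtractor plays, not the actual Java API; the `event_ts` field name is an assumption for illustration.

```python
import time

def extract_timestamp(record: dict, use_event_time: bool) -> int:
    # With event time, windows are keyed by when the event actually
    # happened; otherwise records fall into windows based on when they
    # were processed/loaded, which skews results when replaying old data.
    if use_event_time and "event_ts" in record:
        return record["event_ts"]
    return int(time.time() * 1000)

def window_start(ts_ms: int, window_ms: int) -> int:
    # Tumbling-window assignment: all records with timestamps in
    # [start, start + window_ms) aggregate into the same window.
    return ts_ms - (ts_ms % window_ms)
```

Under processing time, every record replayed "now" lands in the current window regardless of when it occurred, which is exactly the fewer-windows symptom described above.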

Re: Debugging Kafka Streams Windowing

2017-06-07 Thread Garrett Barton
Mahendra, Did increasing those two properties do the trick? I am running into this exact issue testing streams out on a single Kafka instance. Yet I can manually start a consumer and read the topics fine while it's busy doing this dead stuff. On Tue, May 23, 2017 at 12:30 AM, Mahendra Kariya

Re: How to chain increasing window operations one after another

2017-05-12 Thread Garrett Barton
t; data 1-to-1, but additionally registers a punctuation schedule. When >>> punctuation is called, it would be required to send tombstone messages >>> downstream (or a simliar) that deletes windows that are older than the >>> retention time. Sound tricky to implement though... `tra

Re: How to chain increasing window operations one after another

2017-05-08 Thread Garrett Barton
as you said, the downstream is aware of the > >> statefulness of the upstream and correctly treats each record as > >> an update) > >> * If you want to reduce message volume further, you can break these > >> into separate KafkaStreams instances

Re: Verify time semantics through topology

2017-05-05 Thread Garrett Barton
ay. > > Does this answer all your questions? > > (We don't document those details on purpose, because it's an internal > design and we want the flexibility to change this if required -- thus, > you should also not rely on "stream time" advance assumptions in your > code.) > >

Re: Verify time semantics through topology

2017-05-05 Thread Garrett Barton
mit <k,(cnt,sum)> > records (ie, a custom data data for value) and in mapValue() you compute > <k,avg>. > > Hope this helps. > > -Matthias > > On 5/4/17 7:36 PM, Garrett Barton wrote: > > I think I have an understanding of how Kafka Streams is handling time

Verify time semantics through topology

2017-05-04 Thread Garrett Barton
I think I have an understanding of how Kafka Streams is handling time behind the scenes and would like someone to verify it for me. The actual reason is I am running into behavior where I only can join two streams for a little, then it stops working. Assuming a topology like this: FEED ->

Re: Windowed aggregations memory requirements

2017-05-03 Thread Garrett Barton
That depends on whether you're using event, processing, or ingestion time. My understanding is that if you play a record through that is T-6, the only way that TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5)) would actually process that record in your window is if your
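The retention check implied by `TimeWindows.of(1 min).until(5 min)` above can be sketched as follows. This is a simplified Python sketch of the logic, not the exact Kafka Streams implementation: the precise point from which retention is measured is an assumption here.

```python
def window_is_retained(window_start_ms: int, stream_time_ms: int,
                       retention_ms: int, window_size_ms: int) -> bool:
    # A late record only updates its window if that window's end is still
    # within the retention horizon relative to the current stream time;
    # older windows have been dropped from the state store.
    window_end = window_start_ms + window_size_ms
    return stream_time_ms - window_end <= retention_ms
```

So a T-6-minute record targeting a 1-minute window with 5 minutes of retention sits right at the boundary: whether it is processed depends on how far stream time has already advanced past that window.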

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-03 Thread Garrett Barton
're talking about Rocksdb > >> On Tue, May 2, 2017 at 10:08 AM Damian Guy <damian@gmail.com> > wrote: > >> > >>> Hi Garret, > >>> > >>> No, log.retention.hours doesn't impact compacted topics. > >>> > >>> Thanks, >

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Thanks Damian, Does setting log.retention.hours have anything to do with compacted topics? Meaning would a topic not compact now for 90 days? I am thinking all the internal topics that streams creates in the flow. Having recovery through 90 days of logs would take a good while I'd imagine.

How to chain increasing window operations one after another

2017-05-02 Thread Garrett Barton
Let's say I want to sum values over increasing window sizes of 1,5,15,60 minutes. Right now I have them running in parallel, meaning if I am producing 1k/sec records I am consuming 4k/sec to feed each calculation. In reality I am calculating far more than sum, and in this pattern I'm looking at
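The chaining idea above (each larger window feeding off the previous one instead of re-reading the raw stream) can be sketched like this. A minimal Python sketch, assuming sums as the aggregation; the Kafka Streams version would chain windowed aggregations across topics.

```python
from collections import defaultdict

def bucket_sums(records, window_ms):
    # Sum (timestamp, value) pairs into tumbling windows of window_ms.
    out = defaultdict(int)
    for ts, val in records:
        out[ts - ts % window_ms] += val
    return dict(out)

def cascade(records):
    # Chained aggregation: compute 1-minute sums once from the raw
    # stream, then roll those partial sums up into 5-minute windows.
    # The 5-minute stage consumes ~1 record/min/key instead of the full
    # input rate, which is the volume reduction the thread is after.
    minute = 60_000
    one_min = bucket_sums(records, minute)
    five_min = bucket_sums(one_min.items(), 5 * minute)
    return one_min, five_min
```

This works for decomposable aggregates like sum, count, min, and max; non-decomposable ones (e.g. exact percentiles) cannot simply be rolled up this way.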

Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread Garrett Barton
Greetings all, I have a use case where I want to calculate some metrics against sensor data using event time semantics (record time is event time) that I already have. I have years of it, but for this POC I'd like to just load the last few months so that we can start deriving trend lines now vs

Re: Debugging Kafka Streams Windowing

2017-05-02 Thread Garrett Barton
Mahendra, One possible thing I have seen that exhibits the same behavior of missing windows of data is the configuration of the topics (internal and your own) retention policies. I was loading data that was fairly old (weeks) and using event time semantics as the record timestamp (custom

high level consumer usage question with auto topic creation

2013-07-30 Thread Garrett Barton
I am trying to write a junit test for working with an embedded Kafka 0.8 server where I send and receive messages just to verify my embedded server, producer and consumer wrappers work. Order of operation in the junit looks something like: -Start zk. [own thread] (wait for init) -Start Kafka