Re: [Help] About ConcurrentModificationException

2017-12-11 Thread Matthias J. Sax
The ticket is still open, thus, it's not fixed. If a ticket is resolved, you can check the "Fix Version" field to see for which Kafka version it got fixed. -Matthias On 12/10/17 10:46 PM, wrote: > Hello ! > We?0?2found?0?2this?0?2error?0?2when?0?2we?0?2read?0?2Kafka?0?2data?0?2using?0?2

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-09 Thread Matthias J. Sax
ecords.per.partition > > On Sat, Dec 9, 2017 at 2:48 PM, Dmitry Minkovsky > wrote: > >> Hi Matthias, yes that definitely helps. A few thoughts inline below. >> >> Thank you! >> >> On Fri, Dec 8, 2017 at 4:21 PM, Matthias J. Sax >> wrote: >> >&

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-08 Thread Matthias J. Sax
Hard to give a generic answer. 1. We recommend to over-partitions your input topics to start with (to avoid that you need to add new partitions later on); problem avoidance is the best strategy. There will be some overhead for this obviously on the broker side, but it's not too big. 2. Not sure w

Re: Configuration: Retention and compaction

2017-12-08 Thread Matthias J. Sax
ondering about my second question though: does deletion/compaction > affect the currently opened log segment? Seems like it cannot. > > > > > On Mon, Dec 4, 2017 at 2:54 PM, Matthias J. Sax > wrote: > >> Topic can be configured in "dual" mode too

Re: Kafka Streams app error while rebalancing

2017-12-06 Thread Matthias J. Sax
. Please find my response inline. > > On Wed, Dec 6, 2017 at 12:34 AM, Matthias J. Sax > wrote: > >> Hard to say. >> >> However, deleting state directories will not have any negative impact as >> you don't use stores. Thus, why do you not want to do this?

Re: Kafka Streams app error while rebalancing

2017-12-05 Thread Matthias J. Sax
Hard to say. However, deleting state directories will not have any negative impact as you don't use stores. Thus, why do you not want to do this? Another workaround you can do, it to start four applications with 1 thread each -- this would isolate the instances further and avoid the lock issue (y

Re: Configuration: Retention and compaction

2017-12-04 Thread Matthias J. Sax
Topic can be configured in "dual" mode too via >> cleanup.policy="delete,compact" For this case, `retention.ms` is basically a TTL for a key that is not updated for this amount of time. -Matthias On 12/3/17 11:54 AM, Jan Filipiak wrote: > Hi > > the only retention time that applies for comp

Re: Plans to extend streams?

2017-11-29 Thread Matthias J. Sax
We had some discussion if we can/should replace re-partitioning topic via a direct network connection between instances. It's a tricky problem though with many string attached... Thus, it comes with pros and cons and it's still unclear what the exact trade-off is. Thus, it might happen, but it's u

Re: Upgrading producers to 1.0

2017-11-28 Thread Matthias J. Sax
Upgrading brokers without client was always supported :) Since 0.10.2, it also works the other way round. -Matthias On 11/28/17 7:33 AM, Brian Cottingham wrote: > On 11/27/17, 8:36 PM, "Matthias J. Sax" wrote: > > Not sure were you exactly copied this. However, se

Re: Upgrading producers to 1.0

2017-11-27 Thread Matthias J. Sax
Ah. I see. You copied from the KIP. The "Motivation" sections describes the state _before_ the change :) -Matthias On 11/27/17 5:36 PM, Matthias J. Sax wrote: > Not sure were you exactly copied this. However, second paragraph here > https://kafka.apache.org/documentation

Re: Upgrading producers to 1.0

2017-11-27 Thread Matthias J. Sax
n Cottingham wrote: > On 11/27/17, 5:53 PM, "Matthias J. Sax" wrote: > > Since 0.10.2, you can upgrade your clients without upgrading your brokers. > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-97%3A+Improved+Kafka+Client+RPC+Compatibility+Polic

Re: Upgrading producers to 1.0

2017-11-27 Thread Matthias J. Sax
Since 0.10.2, you can upgrade your clients without upgrading your brokers. https://cwiki.apache.org/confluence/display/KAFKA/KIP-97%3A+Improved+Kafka+Client+RPC+Compatibility+Policy -Matthias On 11/27/17 11:44 AM, Brian Cottingham wrote: > When upgrading from 0.11 to 1.0, do I have to upgrade t

Re: Time-Based Index for Consuming Messages Up to Certain Timestamp

2017-11-24 Thread Matthias J. Sax
h is > why I initially was looking at the time-based index functionality. Based on > what you’ve said so far, I take it using the time-based index to find an > offset corresponding to a timestamp and then consuming all messages with a > smaller offset is not a viable solution? > &g

Re: parallel processing of records in a Kafka consumer

2017-11-24 Thread Matthias J. Sax
#x27;m genuinely interested in understanding if I correctly get > how Kafka consumers works. > Comments and suggestions are welcome :) > > Best regards, > Vincenzo > > On Wed, Nov 22, 2017 at 11:15 PM, Matthias J. Sax > wrote: > >> I KafkaConsumer itself shou

Re: parallel processing of records in a Kafka consumer

2017-11-23 Thread Matthias J. Sax
ork if you need to scale over multiple machines. -Matthias On 11/23/17 11:02 AM, cours.syst...@gmail.com wrote: > > > On 2017-11-22 23:15, "Matthias J. Sax" wrote: >> I KafkaConsumer itself should be use single threaded. If you want to >> parallelize pro

Re: auto.offset.reset in 0.11.0.2 Kafka Streams does not take effect.

2017-11-23 Thread Matthias J. Sax
You might want to consider using the reset tool instead of just changing the application.id... https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/ -Matthias On 11/23/17 3:34 AM, Artur Mrozowski wrote: > Oh, I've got it. Need to reset the applicati

Re: Specfic offset processing in Kafka Streams

2017-11-22 Thread Matthias J. Sax
I don't think that a "from-to" pattern would a common scenario -- Kafka is about stream processing, not batch processing. I guess you can to a hand crafted solution though. 1) use bin/kafka-consumer-groups.sh to seek to the corresponding start offset for the group.id/application.id of your Stream

Re: parallel processing of records in a Kafka consumer

2017-11-22 Thread Matthias J. Sax
I KafkaConsumer itself should be use single threaded. If you want to parallelize processing, each thread should have it's own KafkaConsumer instance and all consumers should use the same `group.id` in their configuration. Load will be shared over all running consumer automatically for this case.

Re: Time-Based Index for Consuming Messages Up to Certain Timestamp

2017-11-21 Thread Matthias J. Sax
of > RAM. > > Ray > > On 2017-11-21, 5:57 PM, "Matthias J. Sax" wrote: > > This is possible, but I think you don't need the time-based index for it > :) > > You will just buffer up all messages for a 5 minute sliding-window and >

Re: Streams and Windows

2017-11-21 Thread Matthias J. Sax
You need to disable KTable cache to get every update by setting caches size to zero: https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html -Matthias On 11/21/17 2:14 PM, Puneet Lakhina wrote: > Hello, > > Im new to Kafka ecosystem so I apologize if this is all a naive que

Re: Time-Based Index for Consuming Messages Up to Certain Timestamp

2017-11-21 Thread Matthias J. Sax
This is possible, but I think you don't need the time-based index for it :) You will just buffer up all messages for a 5 minute sliding-window and maintain all message sorted by timestamp in this window. Each time the window "moves" you write the oldest records that "drop out" of the window to the

Re: Kafka Stream Exception: partition issue

2017-11-20 Thread Matthias J. Sax
Sound like Streams can't fetch the metadata completely. You can increase Consumer config `REQUEST_TIMEOUT_MS_CONFIG` to give more time to the cluster to broadcast the information to all brokers. https://docs.confluent.io/current/streams/developer-guide/config-streams.html#kafka-consumers-and-prod

Re: Standby Replicas with In Memory State Stores

2017-11-15 Thread Matthias J. Sax
Thanks! On 11/15/17 7:57 AM, Matt Farmer wrote: > Yes, in memory stores are backed by a changelog topic as far as I'm aware. > I have filed https://issues.apache.org/jira/browse/KAFKA-6214 > > On Tue, Nov 14, 2017 at 10:53 PM Matthias J. Sax > wrote: > >> Thank

Re: Standby Replicas with In Memory State Stores

2017-11-14 Thread Matthias J. Sax
Thanks for reporting. Sounds like a bug to me. Please file a Jira. Question: even if you use an In-Memory store, it's still backed by a changelog topic, right? -Matthias On 11/14/17 3:07 PM, Matt Farmer wrote: > Hey everyone, > > We ran across a little bit of a landmine in Kafka Streams 0.11.

Re: How do I gracefully handle stream joins where the other side never appears?

2017-11-14 Thread Matthias J. Sax
gt; works, but I'm just confirming. > > --Michael > > > > From: Matthias J. Sax > Sent: Friday, November 10, 2017 2:52 PM > To: users@kafka.apache.org > Subject: EXTERNAL: Re: How do I gracefully handle stream joins where the > other side never appears? >

Re: Kafka Streams question

2017-11-14 Thread Matthias J. Sax
htbend.com > https://www.lightbend.com/ > >> On Nov 14, 2017, at 11:42 AM, Matthias J. Sax wrote: >> >> Boris, >> >> I just realized, that you want to update the state from your processor >> -- this is actually not supported by a global state (at least not directly).

Re: Kafka Streams question

2017-11-14 Thread Matthias J. Sax
Boris, I just realized, that you want to update the state from your processor -- this is actually not supported by a global state (at least not directly). Global state is populated from a topic at startup, and the global thread should be the only thread that updates the state: even if it is techn

Re: Custom state store

2017-11-13 Thread Matthias J. Sax
hitect > boris.lublin...@lightbend.com > https://www.lightbend.com/ > >> On Nov 13, 2017, at 12:45 PM, Matthias J. Sax wrote: >> >> You can plug in a custom store via `Materialized` parameter that allows >> to specify a custom `KeyValueBytesStoreSupplier` (and others

Re: Custom state store

2017-11-13 Thread Matthias J. Sax
You can plug in a custom store via `Materialized` parameter that allows to specify a custom `KeyValueBytesStoreSupplier` (and others) -Matthias On 11/13/17 10:26 AM, Boris Lublinsky wrote: > >> On Nov 13, 2017, at 12:24 PM, Boris Lublinsky >> wrote: >> >> It looks like for the custom state st

Re: How do I gracefully handle stream joins where the other side never appears?

2017-11-10 Thread Matthias J. Sax
Messages that don't find a join partner are dropped. For each incoming message, we do the following: 1. insert it into it's window store 2. lookup other window store for matching record a) if matching records are found, compute join and emit Note, that we maintain all records in the window

Re: GlobalKTable never finishes restoring

2017-11-08 Thread Matthias J. Sax
the end of > the loop. That fixed the issue so may serve as further verification of your > hypothesis? > > In the meantime I suppose the workaround is to not produce transactional > messages to topics backing a GlobalKTable? > > Thanks > Alex > > > On Tue, N

Re: GlobalKTable never finishes restoring

2017-11-07 Thread Matthias J. Sax
gt; >> When people find this thread in mailing list archive, the attachment >> wouldn't be there. >> >> Thanks >> >> On Tue, Nov 7, 2017 at 8:32 AM, Matthias J. Sax >> wrote: >> >>> Alex, >>> >>> I am not sure, but mayb

Re: GlobalKTable never finishes restoring

2017-11-07 Thread Matthias J. Sax
Alex, I am not sure, but maybe it's a bug. I noticed that you read transaction data. Can you try to write to the topic without using transactions and/or set the consumer into READ_UNCOMMITTED mode to verify? It only a guess that it might be related to transactions and it would be great to verify o

Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Matthias J. Sax
Congrats!!! On 11/6/17 7:56 PM, Vahid S Hashemian wrote: > Congrats Onur! > > --Vahid > > > > From: Ismael Juma > To: d...@kafka.apache.org > Cc: "users@kafka.apache.org" > Date: 11/06/2017 10:13 AM > Subject:Re: [ANNOUNCE] New committer: Onur Karaman > Sent by:i

Re: Subsccribe

2017-11-06 Thread Matthias J. Sax
It's self-service: https://kafka.apache.org/contact -Matthias On 11/5/17 6:01 PM, Amit Malhotra wrote: > > signature.asc Description: OpenPGP digital signature

Re: Reg. Kafka transactional producer and consumer

2017-11-04 Thread Matthias J. Sax
Hi, this consumer log line indicates that there is an open/pending transaction (ie, neither committed nor aborted) and thus, the broker does not deliver the data to the consumer. -> highWaterMark = 5, but lastStableOffset = 0 On 11/2/17 5:25 AM, Abhishek Verma wrote: > 1871 [main] DEBUG org.apa

Re: Kafka Streams 0.11.0.1 - Task Assignment Stickiness Behavior

2017-11-04 Thread Matthias J. Sax
Matthias J. Sax wrote: >> >> Might be worth a try with 1.0.0 RC3 -- even if I doubt that much changes. >> >> Can you provide debug logs for your Kafka streams applications as well >> as brokers? This would help to dig into this. >> > > I searched, bu

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-10-25 Thread Matthias J. Sax
Thanks a lot for the KIP. Can we please move the discussion to the dev list? Thus, after fixing the KIP collision, just start a new DISCUSS thread. Thx. -Matthias On 10/25/17 4:20 PM, Ted Yu wrote: > Have you seen the email a moment ago from Onur which uses the same KIP > number ? > > Looks l

Re: Kafka Streams 0.11.0.1 - Task Assignment Stickiness Behavior

2017-10-24 Thread Matthias J. Sax
Might be worth a try with 1.0.0 RC3 -- even if I doubt that much changes. Can you provide debug logs for your Kafka streams applications as well as brokers? This would help to dig into this. -Matthias On 10/24/17 5:53 PM, Ted Yu wrote: > Eric: > I wonder if it is possible to load up 1.0.0 RC3 o

Re: Kafka Streams 0.11.0.1 - Task Assignment Stickiness Behavior

2017-10-24 Thread Matthias J. Sax
We had multiple Jira. I guess this one is the fix you are looking for: https://issues.apache.org/jira/browse/KAFKA-5152 -Matthias On 10/24/17 3:21 PM, Eric Lalonde wrote: >> >> Could it be, that the first KafkaStreams instance was still in status >> "rebalancing" when you started the second/thir

Re: Kafka Streams Hang While Committing

2017-10-24 Thread Matthias J. Sax
Well. When a commit a triggered, Streams need to flush all caches and flush all pending write of the producers. And as this happens on the same thread that does processing, there won't be any processing of new data until the commit is finished. So I guess, it is expected. -Matthias On 10/24/17

Re: Kafka Streams 0.11.0.1 - Task Assignment Stickiness Behavior

2017-10-24 Thread Matthias J. Sax
Hi, the issue you describe, that on a "fresh" restart, all tasks are assigned to the first thread is known, and the solution for it was to introduce the new broker config you mentioned. Thus, there is no config for 0.10.2.x brokers or Streams API to handle this case (that's why we introduced the n

Re: Process and punctuate contract

2017-10-23 Thread Matthias J. Sax
ered > in process. > If the process or punctuate step fails / stream instance restarts we would > like to reprocess the batch again > Is this a use case that it is intended for or should we use a normal consumer > instance? > > Toby > >> On 24 Oct 2017, at 2:27 AM,

Re: Process and punctuate contract

2017-10-23 Thread Matthias J. Sax
Committing is independent of process and/or punctuate. You can configure your Kafka Streams application commit interval to any value you like via `commit.interval.ms` parameter (default is 30 seconds). Thus, there is no guarantee when a commit exactly happens with regard to calling process and pu

Re: Micro-batching in Kafka streams - redux

2017-10-20 Thread Matthias J. Sax
Hi, as Kafka Streams focuses on stream processing, micro-batching is something we don't consider. Thus, nothing has changed/improved. About the store question: If you buffer up your writes in a store, you need to delete those value from the store later on to avoid that the store grown unbounded.

Re: Streams equivalent of Storm's Fields Grouping

2017-10-19 Thread Matthias J. Sax
Kafka Streams shared state base on key out-of-the-box and exploit horizontal scaling. This record redistribution happens automatically based on the key if required. You don't need to explicitly declare it. 1) if you change the key, AND 2) apply a key-based operation (like groupBy or join) afte

Re: Fwd: Can you please subscribe me in this project

2017-10-19 Thread Matthias J. Sax
It's self-service: https://kafka.apache.org/contact -Matthias On 10/17/17 11:51 AM, Nikhil Deore wrote: > Hi, > > I want to learn and contribute to this project, > Please subscribe me in. > > Thanks, > Nikhil > signature.asc Description: OpenPGP digital signature

Re: Getting started with stream processing

2017-10-11 Thread Matthias J. Sax
ling window, but a tumbling window is defined by its length. So you > get information for the last hour that has passed, but that last hour is > a window of NOW - 1 hour. How do I get a window to align to hours of the > clock? > > > > On 10/10/2017 19:41, Matthias

Re: Kafka 0.10: Two Kafka Consumers, two different topics, same group ID - commit exception occurs

2017-10-10 Thread Matthias J. Sax
Why do you use the same groupId? This sound not correct. You would use a consumer group to share load of a single topic based on partitions. Ie. if a topic has multiple partitions, different partitions are processed by different consumer within the same group. But in your case, the second process

Re: kafka-streams dying if can't create internal topics

2017-10-10 Thread Matthias J. Sax
where i can file an > issue? > > Thanks again. > > On Tue, Oct 10, 2017 at 8:38 PM, Matthias J. Sax > wrote: > >> Yes, please file a Jira. We need to fix this. Thanks a lot! >> >> -Matthias >> >> On 10/10/17 5:24 AM, Dmitriy Vsekhvalnov wrote: >

Re: Getting started with stream processing

2017-10-10 Thread Matthias J. Sax
Hi, if the aggregation returns a different type, you can use .aggregate(...) instead of .reduce(...) Also, for you time based computation, did you consider to use windowing? -Matthias On 10/10/17 6:27 AM, RedShift wrote: > Hi all > > Complete noob with regards to stream processing, this is my

Re: kafka-streams dying if can't create internal topics

2017-10-10 Thread Matthias J. Sax
Yes, please file a Jira. We need to fix this. Thanks a lot! -Matthias On 10/10/17 5:24 AM, Dmitriy Vsekhvalnov wrote: > Hi all, > > still doing disaster testing with Kafka cluster, when crashing several > brokers at once sometimes we observe exception in kafka-stream app about > inability to cre

Re: Serve interactive queries from standby replicas

2017-10-10 Thread Matthias J. Sax
stent reads might be a reasonable trade off if >> you >>> can get ability to serve reads without downtime in some cases. >>> >>> By the way standby replicas are just extra consumers/processors of input >>> topics? Or is there some custom protocol for si

Re: Add Kafka user list

2017-10-10 Thread Matthias J. Sax
If you want to subscribe follow instructions here: http://kafka.apache.org/contact On 10/10/17 2:07 AM, shawnding(丁晓坤) wrote: > Add Kafka user list > signature.asc Description: OpenPGP digital signature

Re: subscription

2017-10-09 Thread Matthias J. Sax
See http://kafka.apache.org/contact On 10/9/17 8:27 AM, Emanuele Ianni wrote: > subscription > signature.asc Description: OpenPGP digital signature

Re: Serve interactive queries from standby replicas

2017-10-06 Thread Matthias J. Sax
if you > can get ability to serve reads without downtime in some cases. > > By the way standby replicas are just extra consumers/processors of input > topics? Or is there some custom protocol for sinking the state? > > > > fre 6 okt. 2017 kl. 20:03 skrev Matthias J.

Re: Compacted topic, entry deletion, kafka 0.11.0.0

2017-10-06 Thread Matthias J. Sax
Setting topic policy to "compact,delete" should be sufficient. Cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist Note: retention time is not based on wall-clock time, but embedded record timestamps. Thus, old messages get only deleted if

Re: Serve interactive queries from standby replicas

2017-10-06 Thread Matthias J. Sax
No, that is not possible. Note: standby replicas might "lag" behind the active store, and thus, you would get different results if querying standby replicas would be supported. We might add this functionality at some point though -- but there are no concrete plans atm. Contributions are always we

Re: Scala API

2017-10-06 Thread Matthias J. Sax
The new clients (producer/consumer/admin) as well as Connect and Streams API are only available in Java. You can use Streams API with Scala though. There is one thing you need to consider: https://docs.confluent.io/current/streams/faq.html#scala-compile-error-no-type-parameter-java-defined-trait-i

Re: Correct way to increase replication factor for kafka-streams internal topics

2017-10-04 Thread Matthias J. Sax
That is hard to do... Just deleting the topic might result in data loss, if not all data was processed by the application yet (note, that repartitioning topics are also kind of a buffer between subtopologies). Just manually changing the number of partitions via kafka-topics.sh will break partitio

Re: out of order sequence number in exactly once streams

2017-09-29 Thread Matthias J. Sax
> managed internally and not exposed through streamconfig. >> >> https://kafka.apache.org/0110/documentation/#streamsconfigs >> >> -Sameer. >> >> On Thu, Sep 28, 2017 at 12:12 AM, Matthias J. Sax >> wrote: >> >>> An OutOfOrderSeque

Re: windowed store excessive memory consumption

2017-09-28 Thread Matthias J. Sax
Thanks! On 9/28/17 2:19 AM, Stas Chizhov wrote: > Sure. Here we go: https://issues.apache.org/jira/browse/KAFKA-5985 > > 2017-09-28 0:23 GMT+02:00 Matthias J. Sax : > >>>> I have a feeling that it would be helpful to add this to documentation >>>> examples

Re: windowed store excessive memory consumption

2017-09-27 Thread Matthias J. Sax
>> I have a feeling that it would be helpful to add this to documentation >> examples as well as javadocs for all methods that do return iterators. That makes sense. Can you create a JIRA for this? Thanks. -Matthias On 9/27/17 2:54 PM, Stas Chizhov wrote: > Thanks, that comment actually mad its

Re: how to use Confluent connector with Apache Kafka

2017-09-27 Thread Matthias J. Sax
All connectors are compatible with vanilla AK, as Confluent Open Source ships with "plain" Apache Kafka under the hood. So you can just download the connector, plug it in, and configure it as any other connector, too. https://www.confluent.io/product/connectors/ -Matthias On 9/26/17 1:15 PM, M

Re: out of order sequence number in exactly once streams

2017-09-27 Thread Matthias J. Sax
An OutOfOrderSequenceException should only occur if a idempotent producer gets out of sync with the broker. If you set `enable.idempotence = true` on your producer, you might want to set `retries = Integer.MAX_VALUE`. -Matthias On 9/26/17 11:30 PM, Sameer Kumar wrote: > Hi,  > > I again received

Re: why punctuate fun run more times than schedule

2017-09-21 Thread Matthias J. Sax
punctuations are event-time based, not wall-clock time base. We add wall-clock based punctuations to next release thought. Cf. https://docs.confluent.io/current/streams/developer-guide.html#defining-a-stream-processor -Matthias On 9/21/17 8:01 PM, 805930...@qq.com wrote: > this is a kafka stre

Re: Hi

2017-09-21 Thread Matthias J. Sax
cc'ed Daniele :) On 9/21/17 1:59 PM, Ted Yu wrote: > Please follow instructions on http://kafka.apache.org/contact > > On Thu, Sep 21, 2017 at 1:30 PM, Daniele Ascione > wrote: > >> hi, I would like to subscribe >> > signature.asc Description: OpenPGP digital signature

Re: Kafka Internals Video/Blog

2017-09-20 Thread Matthias J. Sax
Check out the Kafka wiki: https://cwiki.apache.org/confluence/display/KAFKA/Index It contains many design and discussion pages -- it's sometime a little hidden, but I am sure you can find them. -Matthias On 9/20/17 10:18 AM, M. Manna wrote: > Raghav, > > I would say Kafka documentation on Kafk

Re: Kafka InvalidStateStoreException

2017-09-15 Thread Matthias J. Sax
Hi, in case of a rebalance, partitions are reassigned and thus (shards) of a store might move from one instance/thread to another. This could potentially happen anytime, and you need to rediscover the shard/store afterwards. Thus, your code must catch this exception and you can retry the query aft

Re: State store and input topics with different number of partition

2017-09-01 Thread Matthias J. Sax
Your observation is correct. Kafka Streams creates a task per partition. As you have a shared state store over two operator, the tasks of both input streams need to be merged to ensure co-partitioning. Thus, task0 reads topic1 partition0 and topic2 partion0, and all other task[123] only topic1 par

Re: Writing streams to kafka topic

2017-09-01 Thread Matthias J. Sax
Hi, this is not supported by the DSL layer. What you would need to do, is to add a custom stateful transform() operator after there window (`stream.groupByKey().aggregate().toStream().transform().to()`), that buffers the output and remembers the latest result. Second, you would schedule a punctuat

Re: Kafka Streams: Pseudo Wallclock Punctuate

2017-08-25 Thread Matthias J. Sax
Eli, One think you could do, is to send "tick tuples" through your topology and use WallclockTimestampExtractor. It's not a nice solution, but I don't have any better idea atm. -Matthias On 8/24/17 9:37 PM, Eli Jordan wrote: > Update on this. Modifying the state store on another thread actually

Re: kafka straming progress had be done a few minutes later

2017-08-23 Thread Matthias J. Sax
Not sure what your question is... Maybe you refer to commit interval that is 30 seconds by default. It could be, that you don't see any writes to the output topic before that. But it's a wild guess. You can try to set a shorter commit interval via StreamsConfig. -Matthias On 8/22/17 8:09 PM, 杰

Re: Can we use builder.stream twice in single application

2017-07-28 Thread Matthias J. Sax
You can do both in a single application via KStream input = builder.stream("topic"); input.to("output-1"); input.to("output-2"); In general, if you reuse a KStream or KTable and apply multiple operators (in the example about, two `to()` operators), the input will be duplicated and sent to each op

Re: How do I setup Kafka Streams application to have one task per partition?

2017-07-27 Thread Matthias J. Sax
Using `PartitionGrouper` is correct. As you mentioned correctly, Stream scales via "max number of partitions" and thus, be default only create one task for this case. Another way would be, to deploy multiple streams applications each processing a different topic. Of course, for this you will need

Re: Kafka Streams internal repartitioning topics, retention time, and reprocessing old data

2017-07-27 Thread Matthias J. Sax
Hi, the behavior you describe is by design. You should increase the retention time of the re-partitioning topics manually to process old data. -Matthias On 7/25/17 7:17 AM, Gerd Behrmann wrote: > Hi, > > While adding a new Streams based micro service to an existing Kafka > infrastructure, I h

Re: Merging windowed KStreams with regular KStreams

2017-07-26 Thread Matthias J. Sax
I guess it depend what you want as an output... But what you suggest would work. You can also apply a .map() to the windowed stream and extract the actual record key from the window (ie, strip away the window) -Matthias On 7/26/17 6:15 PM, Sameer Kumar wrote: > I wanted to merge two KStreams on

Re: Merging Two KTables

2017-07-26 Thread Matthias J. Sax
Merging two tables does not make too much sense because each table might contain an entry for the same key. So it's unclear, which of both values the merged table should contain. KTable.toStream() is just a semantic change and has no runtime overhead. -Matthias On 7/26/17 1:34 PM, Sameer Kumar

Re: Joining two topics and emitting each key only once within a sliding window.

2017-07-22 Thread Matthias J. Sax
I am not sure exactly what semantics you want to have. Note, that Kafka Streams provides a sliding window join between two stream. Thus, I am not sure what you mean by >> Track when matches are found so subsequent matches (which within the join >> window would be considered duplicates) aren't reem

Re: Kafka Streams: why aren't offsets being committed?

2017-07-21 Thread Matthias J. Sax
My guess is that offsets are committed only when all tasks in the >>> topology have received input. Is this what's happening? No. Task offsets are committed independently from each other. You can you double check the logs in DEBUG mode. It indicates when offsets get committed. Also chec

Re: Get data from old offsets

2017-07-20 Thread Matthias J. Sax
Did you try setting `auto.offset.reset` to "earliest" ? -Matthias On 7/18/17 8:44 PM, Yuri da Costa Gouveia wrote: > Hello, > I am having trouble to get the data from old offsets. I'm using the version > 0.10.2.1, and I need any assistance to recover this data. > This is my consumer class: > >

Re: DAG processing in Kafka Streams

2017-07-20 Thread Matthias J. Sax
Sameer, the optimization you describe applies to batch processing but not to stream processing. As you mentioned: "will traverse the data only once". This property is interesting in batch processing only, as it means that the data is only read from disk once and both map operations are applies d

Re: IllegalStateException with custom state store ..

2017-07-20 Thread Matthias J. Sax
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-WhydoIgetanIllegalStateExceptionwhenaccessingrecordmetadata? -Matthias On 7/1/17 8:13 PM, Debasish Ghosh wrote: > Just to give some more information, the ProcessorContext that gets passed > to the init method of the custom store has a null

Re: How can we re-key and re-partition records based on that key

2017-07-16 Thread Matthias J. Sax
If you only want to change the key, you can use #selectKey() -- if you want to change key and value, you can use #map(). Stream will automatically repartition the data afterwards if required (ie, if you do a group-by or join). If you want to force repartitioning, you can just call #through() after

Re: State management & restore functionality

2017-07-15 Thread Matthias J. Sax
Streams does use one changelog topic per store (not just a single global changelog topic per application). Thus, the number of partitions can be different for different stores/changelog topics within one application. About partitions assignment: It depends a little bit on the structure of your pro

Re: Two consumer groups for same topic . One with subscribe and other with manual assignment

2017-07-15 Thread Matthias J. Sax
It will not interfere. And this is independent of manual partition assignment or topic subscription. If you have different consumer group-ids it's independent of each other. -Matthias On 7/12/17 11:21 PM, venkata sastry akella wrote: > Hi > Can I have a one consumer group with automatic subcrip

Re: "Failed to update metadata" in integration test for exactly once stream and custom store

2017-07-12 Thread Matthias J. Sax
tTopic, so I would expect it to verify that it actually made it > to outputTopic. > > > 2017-07-11 16:25 GMT-06:00 Matthias J. Sax : >> Seems Streams cannot connect (or looses connection) to the brokers. Not >> sure why. >> >> You can also have a look here for

Re: "Failed to update metadata" in integration test for exactly once stream and custom store

2017-07-11 Thread Matthias J. Sax
Seems Streams cannot connect (or looses connection) to the brokers. Not sure why. You can also have a look here for our own EOS Streams integration test: https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/EosIntegrationTest.java -Matthias On 7

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-07-09 Thread Matthias J. Sax
the code then would also save a ton. (We have the defaults one in conf > why not override the specific ones?) > > Does this makes sense to people? what pieces should i outline with code > (time is currently sparse :( but I can pull of some smaller examples i > guess) > > Best Ja

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-07-07 Thread Matthias J. Sax
we should rule out config based decisions say configs >>>> like >>>>> streams.$applicationID.joins.$joinname.conf = value >>>>> >>>> Is this just for config? Or are you suggesting that we could somehow >>>> "code" >>>> the join in a conf

Re: enable.idempotence=true and Retriable exceptions

2017-07-02 Thread Matthias J. Sax
I you set `enable.idempotence=true`, the producer will retry internally. Thus, it should never throw an retriable exception in the first place (so there is nothing to ignore :)). -Matthias On 7/2/17 6:28 PM, Gary Struthers wrote: > I currently catch and retry Retriable exceptions. The v11 docs s

Re: kafka-streams repeatedly rebalances on start up

2017-06-26 Thread Matthias J. Sax
Great. Thanks a lot for confirming! :) -Matthias On 6/26/17 4:58 AM, Tom Dearman wrote: > Hi Matthias, > > This problem seems to be fixed in 0.11.0.0 client. > > Thanks, > > Tom >> On 17 Jun 2017, at 01:11, Matthias J. Sax wrote: >> >> Hi Tom, >&g

Re: Single Key Aggregation

2017-06-23 Thread Matthias J. Sax
g this further as well, if you could tell me the > classes that I should be looking into. > > -Sameer. > > On Tue, Jun 20, 2017 at 3:51 AM, Matthias J. Sax > wrote: > >> Hi Sameer, >> >> With regard to >> >>>>> What I saw was th

Re: Aggregation operations and Joins not working as I would expect.

2017-06-23 Thread Matthias J. Sax
this something similar that can be done in KStreams. > > Thanks. > > Regards, > Daniel > -- > DdC > > > > > > On 6/22/17, 10:03 PM, "Matthias J. Sax" wrote: > >> Hi, >> >> there are two things: >> >> 1) ag

Re: Aggregation operations and Joins not working as I would expect.

2017-06-22 Thread Matthias J. Sax
Hi, there are two things: 1) aggregation operator produce an output record each time the aggregate is is updates. Thus, you would get 6 record in you example. At the same time, we deduplicate consecutive outputs with an internal cache. And the cache is flushed non-mechanistically (either partly f

Re: Kafka Streams: Problems implementing rate limiting processing step

2017-06-20 Thread Matthias J. Sax
to say it inspects a state store, sends the messages > that should be sent and removes them from the store. I might have read > too much out of it though. > > Cheers, > > Michał > > > On 20/06/17 17:59, Matthias J. Sax wrote: >>>> I didn't know you could

Re: Kafka Streams: Problems implementing rate limiting processing step

2017-06-20 Thread Matthias J. Sax
>> I didn't know you could write to state stores from outside a >> processor/transformer. You can't. And as far as I understand this thread, nobody said you can. Did I miss something? -Matthias On 6/20/17 1:02 AM, Michal Borowiecki wrote: > I didn't know you could write to state stores from ou

Re: IllegalStateException when putting to state store in Transformer implementation

2017-06-19 Thread Matthias J. Sax
Thanks for sharing your thoughts. I am not sure though, what section you mean. IIRC, we don't cover the supplier pattern in the docs at all. So where do you think, we should add this one liner (happy to add it if I know where :)). -Matthias On 6/16/17 2:48 PM, Adrian McCague wrote: > Guozhang,

Re: Single Key Aggregation

2017-06-19 Thread Matthias J. Sax
ng. >> >> What I saw was that while on Machine1, the counter was 100 , another >> machine it was at 1. I saw it as inconsistent. >> >> >> -Sameer. >> >> On Fri, Jun 16, 2017 at 10:47 PM, Matthias J. Sax >> wrote: >> >>

Re: kafka-streams repeatedly rebalances on start up

2017-06-16 Thread Matthias J. Sax
Hi Tom, Thanks a lot for reporting this. We dug into it. It's easy to reproduce (thank a lot to describe a simple way to do that) and it seems to be a bug in Streams... I did open a JIRA: https://issues.apache.org/jira/browse/KAFKA-5464 For using Streams 0.10.2.1, there is nothing we can advice a

<    3   4   5   6   7   8   9   10   11   12   >