Re: Is an in-memory stream-stream join possible?

2019-09-05 Thread Dmitry Minkovsky
Whoops! Just after sending this email I saw https://issues.apache.org/jira/browse/KAFKA-4729. On Thu, Sep 5, 2019 at 11:04 AM Dmitry Minkovsky wrote: > Hello! > > When I saw KAFKA-4730/KIP-428 ("Streams does not have an in-memory > windowed store") were included in 2.3

Is an in-memory stream-stream join possible?

2019-09-05 Thread Dmitry Minkovsky
Hello! When I saw KAFKA-4730/KIP-428 ("Streams does not have an in-memory windowed store") were included in 2.3.0, I thought maybe it would be possible to configure an in-memory stream-stream join using KStream#join. But I am not seeing this in the API. My use case is joining streams of short-live
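The follow-up notes this is tracked in KAFKA-4729; in releases after this thread (2.4+), `StreamJoined` lets you back a stream-stream join with in-memory window stores. A rough sketch under that assumption (topic names, serdes, and window sizes are illustrative):

```java
// Sketch assuming Kafka Streams 2.4+ (KAFKA-4729); all names are illustrative.
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> left = builder.stream("left-topic");
KStream<String, String> right = builder.stream("right-topic");

Duration window = Duration.ofSeconds(10);
KStream<String, String> joined = left.join(
    right,
    (l, r) -> l + "/" + r,
    JoinWindows.of(window),
    StreamJoined.with(
        // retainDuplicates must be true for stream-stream join stores
        Stores.inMemoryWindowStore("left-join-store", Duration.ofSeconds(30), window, true),
        Stores.inMemoryWindowStore("right-join-store", Duration.ofSeconds(30), window, true)));
```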

Warning when adding GlobalKTable to topology


2019-01-19 Thread Dmitry Minkovsky
When I add a GlobalKTable for topic "message-write-service-user-ids-by-email" to my topology, I get this warning: [2019-01-19 12:18:14,008] WARN (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:421) [Consumer clientId=message-write-service-55f2ca4d-0efc-4344-90d3-955f9f5a65fd-Strea

Re: High end-to-end latency with processing.guarantee=exactly_once

2019-01-03 Thread Dmitry Minkovsky
s.Sender.run(Sender.java:309) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233) at java.lang.Thread.run(Thread.java:748) On Thu, Jan 3, 2019 at 7:34 AM Dmitry Minkovsky wrote: > Hi Matthias, > > I get these errors even on reprocessing, when data is flowing full > throttle throug

Re: High end-to-end latency with processing.guarantee=exactly_once

2019-01-03 Thread Dmitry Minkovsky
issues.apache.org/jira/browse/KAFKA-6150) > resulting in lost producer state for those topics :( > > > -Matthias > > On 12/20/18 3:18 AM, Dmitry Minkovsky wrote: > > Also, I have read through that issue and KIP-360 to the extent my > knowledge > > allows and I don

Re: High end-to-end latency with processing.guarantee=exactly_once

2018-12-19 Thread Dmitry Minkovsky
with the calls seconds apart. On Wed, Dec 19, 2018 at 9:12 PM Dmitry Minkovsky wrote: > Hello 王美功, > > I am using 2.1.0. And, I think you nailed it on the head, because my > application is low throughput and I am seeing UNKNOWN_PRODUCER_ID all the > time with exactly once enabled. I

Re: High end-to-end latency with processing.guarantee=exactly_once

2018-12-19 Thread Dmitry Minkovsky
Hello 王美功, I am using 2.1.0. And, I think you nailed it on the head, because my application is low throughput and I am seeing UNKNOWN_PRODUCER_ID all the time with exactly once enabled. I've googled this before but couldn't identify the cause. Thank you! Setting retry.backoff.ms to 5 brought the

High end-to-end latency with processing.guarantee=exactly_once

2018-12-19 Thread Dmitry Minkovsky
I have a process that spans several Kafka Streams applications. With the streams commit interval and producer linger both set to 5ms, when exactly once delivery is disabled, this process takes ~250ms. With exactly once enabled, the same process takes anywhere from 800-1200ms. In Enabling Exactly-O
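For reference, the settings described above as a properties sketch (values are the ones mentioned in the thread; note that turning on exactly_once also changes the Streams commit-interval default from 30000 ms to 100 ms):

```properties
# Kafka Streams (values from the thread)
processing.guarantee=exactly_once
commit.interval.ms=5
# Producer
linger.ms=5
```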

MAX_TASK_IDLE_MS_CONFIG not working?

2018-12-13 Thread Dmitry Minkovsky
I just discovered that 2.1.0 included MAX_TASK_IDLE_MS_CONFIG and dropped everything to play with it! Exciting! I built the following topology, attempting to synchronize two topic-partitions: https://gist.github.com/dminkovsky/45aa29aefefad564f9663cd36ad21ce1. However, I keep failing at https://g
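A minimal sketch of enabling the new config (the constant resolves to `max.task.idle.ms`; the value is illustrative):

```java
Properties props = new Properties();
// Wait up to 100 ms for data to arrive on all input partitions before
// processing, so records can be picked in timestamp order across topics.
props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 100L);
```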

Re: Streams: Why is ForwardingCacheFlushListener internal? Can I replicate its functionality in my code?

2018-02-05 Thread Dmitry Minkovsky
I was somehow not aware of this: https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams ... :/ On Thu, Feb 1, 2018 at 11:57 PM, Dmitry Minkovsky wrote: > Thank you Guozhang. > > > related to your consistency requirement of the s

Re: Shouldn't KStream#through let you specify a Consumed?

2018-02-02 Thread Dmitry Minkovsky
Streams API sets the metadata record timestamp on write, and > thus, using the default timestamp extractor ensures that this timestamp > is correctly reused when reading data back. > > > > -Matthias > > On 2/2/18 3:54 PM, Dmitry Minkovsky wrote: > > `KStream#through()`

Shouldn't KStream#through let you specify a Consumed?

2018-02-02 Thread Dmitry Minkovsky
`KStream#through()` currently lets you specify a `Produced`. Shouldn't it also let you specify a `Consumed`? This would let you specify a timestamp extractor.
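Until `through()` accepts a `Consumed`, the workaround is to split it into its two halves and supply the extractor on the read side. A sketch (serdes and the extractor choice are illustrative):

```java
// Equivalent to through(), but with control over the Consumed:
stream.to("intermediate-topic", Produced.with(keySerde, valueSerde));
KStream<String, String> resumed = builder.stream("intermediate-topic",
    Consumed.with(keySerde, valueSerde)
            .withTimestampExtractor(new WallclockTimestampExtractor()));
```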

Re: Streams: Why is ForwardingCacheFlushListener internal? Can I replicate its functionality in my code?

2018-02-01 Thread Dmitry Minkovsky
ding the > `flush()` function of the state store, that after the flush, send the whole > key-value map entries to downstream. > > > Guozhang > > > > On Thu, Feb 1, 2018 at 2:10 AM, Dmitry Minkovsky > wrote: > > > Right, but I want to forward messages to down

Re: Streams: Why is ForwardingCacheFlushListener internal? Can I replicate its functionality in my code?

2018-02-01 Thread Dmitry Minkovsky
ble caching, you get the KTable behavior out > of the box. No need to write any custom code in the processor itself. > > > StoreBuilder builder = > Stores.keyValueStoreBuilder(...).withCachingEnabled(); > > topology.addStateStore(builder, ...) > > > > -Matthias >
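Filling out the reply's sketch with illustrative store and serde choices:

```java
// Sketch: a caching-enabled store gives KTable-style flush/forward behavior.
StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
    Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("counts-store"),   // name is illustrative
            Serdes.String(),
            Serdes.Long())
        .withCachingEnabled();

topology.addStateStore(storeBuilder, "my-processor");          // processor name is illustrative
```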

Streams: Why is ForwardingCacheFlushListener internal? Can I replicate its functionality in my code?

2018-01-31 Thread Dmitry Minkovsky
I am writing a processor and I want its stores to behave like KTables: For consistency, I don't want to forward values until the stores have been flushed. I am looking at `ForwardingCacheFlushListener` and see that it is using `InternalProcessorContext` to change the current node, perform a forwar

Group consumer cannot consume messages if kafka service on specific node in test cluster is down

2018-01-31 Thread Dmitry Minkovsky
This is not my question, but I saw it on Stack Overflow yesterday and have been wondering about it: https://stackoverflow.com/questions/48523972/group-consumer-cannot-consume-messages-if-kafka-service-on-specific-node-in-test. Anyone else seen behavior like this?

Re: deduplication strategy for Kafka Streams DSL

2018-01-25 Thread Dmitry Minkovsky
on in the flatMap do? Would it > possibly generates multiple records for the follow-up aggregation for each > input? > > > Guozhang > > > On Thu, Jan 25, 2018 at 6:54 AM, Dmitry Minkovsky > wrote: > > > You may not be surprised that after further investigation it t

Re: Best practices Partition Key

2018-01-25 Thread Dmitry Minkovsky
can use Kafka Streams Interactive Queries to get data. On Thu, Jan 25, 2018 at 10:02 AM, Dmitry Minkovsky wrote: > > one entity - one topic, because I need to ensure the proper ordering > of the events. > > This is a great insight. I discovered that keeping entity-related >

Re: Best practices Partition Key

2018-01-25 Thread Dmitry Minkovsky
> one entity - one topic, because I need to ensure the proper ordering of the events. This is a great insight. I discovered that keeping entity-related things on one topic is much easier than splitting entity-related things onto multiple topics. If you have one topic, replaying that topic is

Re: deduplication strategy for Kafka Streams DSL

2018-01-25 Thread Dmitry Minkovsky
You may not be surprised that after further investigation it turns out this was related to some logic in my topology. On Wed, Jan 24, 2018 at 5:43 PM, Dmitry Minkovsky wrote: > Hi Gouzhang, > > Here it is: > > topology.stream(MAILBOX_OPERATION_REQUESTS, > Consumed.wi

Re: deduplication strategy for Kafka Streams DSL

2018-01-24 Thread Dmitry Minkovsky
ood way to > re-produce it? > > > Guozhang > > > On Wed, Jan 24, 2018 at 11:50 AM, Dmitry Minkovsky > wrote: > > > Oh I'm sorry—my situation is even simpler. I have a KStream -> group by > -> > > reduce. It emits duplicate key/value/timestamps (i.e

Re: deduplication strategy for Kafka Streams DSL

2018-01-24 Thread Dmitry Minkovsky
Oh I'm sorry—my situation is even simpler. I have a KStream -> group by -> reduce. It emits duplicate key/value/timestamps (i.e. total duplicates). On Wed, Jan 24, 2018 at 2:42 PM, Dmitry Minkovsky wrote: > Can someone explain what is causing this? I am experiencin

Re: deduplication strategy for Kafka Streams DSL

2018-01-24 Thread Dmitry Minkovsky
Can someone explain what is causing this? I am experiencing this too. My `buffered.records.per.partition` and `cache.max.bytes.buffering` are at their default values, so quite substantial. I tried raising them but it had no effect. On Wed, Dec 13, 2017 at 7:00 AM, Artur Mrozowski wrote: > Hi > I

What is the best way to re-key a KTable?

2018-01-23 Thread Dmitry Minkovsky
KStream has a simple `#selectKey()` method, but it appears the only way to re-key a KTable is by doing `.toStream(mapper).groupByKey().reduce()`. Is this correct? I'm guessing this is because an attempt to re-key a table might result in multiple values at the new key.
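A sketch of that rewrite (the mapper and serdes are illustrative; on 2.1+ the grouping config is `Grouped`, earlier it was `Serialized`):

```java
// Re-key a KTable: go through a stream, group by the new key, and let
// reduce() resolve collisions when two old keys map to one new key.
KTable<String, Value> rekeyed = table
    .toStream((oldKey, value) -> newKeyOf(value))     // newKeyOf is hypothetical
    .groupByKey(Grouped.with(newKeySerde, valueSerde))
    .reduce((currentValue, newValue) -> newValue);    // last-writer-wins
```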

Re: Merging Two KTables

2018-01-23 Thread Dmitry Minkovsky
> Merging two tables does not make too much sense because each table might contain an entry for the same key. So it's unclear, which of both values the merged table should contain. Which of both values should the table contain? Seems straightforward: it should contain the value with the highest ti

Re: Kafka Streams topology does not replay correctly

2018-01-18 Thread Dmitry Minkovsky
ut for how long > to block/delay? -- Blocking could have large impact on overall behavior). > > The detail are still under discussion... Hope this helps. > > > -Matthias > > On 1/17/18 9:10 AM, Dmitry Minkovsky wrote: > > I have read through the source, following StreamThre

Re: Kafka Streams topology does not replay correctly

2018-01-17 Thread Dmitry Minkovsky
certainly I will find ways to make adjustments to make this re-processable. But it would be cool if this was a first-class thing in the framework. It seems like it would require the user to specify the runtime dependencies between streams. On Wed, Jan 17, 2018 at 8:25 AM, Dmitry Minkovsky wr

Re: Kafka Streams topology does not replay correctly

2018-01-17 Thread Dmitry Minkovsky
r KStream-KTable join, too. It might > happen that we get 100 KTable records that we all process before we > receive 100 KStream records. For the correct result it might be required > to get 50 KTable and 50 KStream in the first poll call and the rest in > the second. But we don't know

Re: Kafka Streams topology does not replay correctly

2018-01-16 Thread Dmitry Minkovsky
I meant “Thanks, yes I will try replacing...” On Tue, Jan 16, 2018 at 22:12, Dmitry Minkovsky: > Thanks, yes try replacing the KStream-KTable joins with > KStream#transform()s and a store. Not sure why you mean I’d need to buffer > multiple records. The KStream has incoming events, and #

Re: Kafka Streams topology does not replay correctly

2018-01-16 Thread Dmitry Minkovsky
o perform the > join correctly. It's not trivial but also not rocket science. > > If we need stronger guarantees, it's the best way to follow though atm, > until we have addressed those issues. Planned for 1.2.0 release. > > -Matthias > > > On 1/16/18 5:34

Re: Kafka Streams topology does not replay correctly

2018-01-16 Thread Dmitry Minkovsky
Right now I am thinking of re-writing anything that has these problematic KStream/KTable joins as KStream#transform() wherein the state store is manually used. Does that make sense as an option for me? -Dmitry On Tue, Jan 16, 2018 at 6:08 PM, Dmitry Minkovsky wrote: > Earlier today I pos

Kafka Streams topology does not replay correctly

2018-01-16 Thread Dmitry Minkovsky
Earlier today I posted this question to SO: > I have a topology that looks like this: KTable users = topology.table(USERS, Consumed.with(byteStringSerde, userSerde), Materialized.as(USERS));

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2018-01-16 Thread Dmitry Minkovsky
correctly (minus some know issues as mentioned above that we > are going to fix in future releases). > > > Stateless: I mean, if you write a program that only uses stateless > operators like filter/map but not aggregation/joins. > > > > -Matthia

Re: Mistakes in documentation?

2017-12-18 Thread Dmitry Minkovsky
"Table" was pasted without edit from the one previously > > that pertained to "KStream". > > > > On Sun, Dec 17, 2017 at 5:31 PM, Dmitry Minkovsky > > wrote: > > > >> On https://docs.confluent.io/current/streams/developer-guide/dsl-a

Re: Mistakes in documentation?

2017-12-17 Thread Dmitry Minkovsky
JIRA, but I couldn't find where. Maybe I don't have privileges. Dmitry On Sun, Dec 17, 2017 at 5:31 PM, Dmitry Minkovsky wrote: > On https://docs.confluent.io/current/streams/developer-guide/dsl-api.html > for version 4.0.0: > > Under "Table", currently: > > &g

Mistakes in documentation?

2017-12-17 Thread Dmitry Minkovsky
On https://docs.confluent.io/current/streams/developer-guide/dsl-api.html for version 4.0.0: Under "Table", currently: > In the case of a KStream, the local KStream instance of every application instance will be populated with data from only a subset of the partitions of the input topic. Collecti

Re: Why is compression disabled by default?

2017-12-12 Thread Dmitry Minkovsky
gle one. > > Ismael > > On Tue, Dec 12, 2017 at 6:45 PM, Dmitry Minkovsky > wrote: > > > I would like to follow up with this more concrete question: > > > > For the purpose of achieving decent compression ratios, I am having > > difficulty finding informat

Re: Why is compression disabled by default?

2017-12-12 Thread Dmitry Minkovsky
received from the producer? Or can it do its own batching to improve compression? Thank you, Dmitry On Sun, Dec 10, 2017 at 7:44 PM, Dmitry Minkovsky wrote: > This is hopefully my final question for a while. > > I noticed that compression is disabled by default. Why is this? My best > g

Why is compression disabled by default?

2017-12-10 Thread Dmitry Minkovsky
This is hopefully my final question for a while. I noticed that compression is disabled by default. Why is this? My best guess is that compression doesn't work well for short messages, which was maybe identified as the maj
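The short-message guess can be checked outside Kafka: gzip carries a fixed header/trailer overhead, so a tiny record can come out larger than it went in, while a batch of similar records compresses well. A self-contained demo (plain JDK, not Kafka code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {
    // Compress a byte array with gzip and return the compressed bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] tiny = "hello".getBytes();                              // 5 bytes
        byte[] batch = "hello world, hello world. ".repeat(400).getBytes();

        System.out.println("tiny:  " + tiny.length + " -> " + gzip(tiny).length);
        System.out.println("batch: " + batch.length + " -> " + gzip(batch).length);
        // gzip's fixed ~20-byte overhead makes the tiny message *larger*,
        // while the repetitive batch compresses dramatically.
    }
}
```

This is one reason batching (e.g. via producer linger) matters for compression ratios.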

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-10 Thread Dmitry Minkovsky
> are going to fix in future releases). > > > Stateless: I mean, if you write a program that only uses stateless > operators like filter/map but not aggregation/joins. > > > > -Matthias > > > On 12/9/17 11:59 AM, Dmitry Minkovsky wrote: > >> How lar

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-09 Thread Dmitry Minkovsky
> How large is the record buffer? Is it configurable? I seem to have just discovered the answer to this: buffered.records.per.partition On Sat, Dec 9, 2017 at 2:48 PM, Dmitry Minkovsky wrote: > Hi Matthias, yes that definitely helps. A few thoughts inline below. > > Thank you! >

Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-09 Thread Dmitry Minkovsky
> > Stateless in what sense? Kafka Streams seems to be all about aligning and manipulating state to create more state. Are you referring to internal state, specifically? > > Hope this helps. > > > -Matthias > > > > On 12/8/17 11:02 AM, Dmitry Minkovsky wrote: > &g

How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-08 Thread Dmitry Minkovsky
I am about to put a topology into production and I am concerned that I don't know how to repartition/rebalance the topics in the event that I need to add more partitions. My inclination is that I should spin up a new cluster and run some kind of consumer/producer combination that takes data from t

Re: Configuration: Retention and compaction

2017-12-08 Thread Dmitry Minkovsky
opics is the > > delete.retention.ms > > The duration that tombstones for deletes will be kept in the topic > > during compaction. > > > > A very detail explaination on what is going on can be found here: > > > > https://kafka.apache.org/documentation/#compac

Configuration: Retention and compaction

2017-12-03 Thread Dmitry Minkovsky
This is a pretty stupid question. Most likely I should verify these by observation, but really I want to verify that my understanding of the documentation is correct: Suppose I have topic configurations like: retention.ms=$time cleanup.policy=compact My questions are: 1. After $time, any
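Worth noting: the two policies can also be combined on a single topic via `cleanup.policy=compact,delete` (KIP-71, since 0.10.1), so old segments are deleted even on a compacted topic. An illustrative sketch (topic name and values are made up):

```shell
# Compact the topic, but also delete segments older than 7 days
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config 'cleanup.policy=[compact,delete],retention.ms=604800000'
```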

Re: Cannot write to key value store provided by ProcessorTopologyTestDriver

2017-08-21 Thread Dmitry Minkovsky
, 2017 at 12:59 PM, Dmitry Minkovsky wrote: > I am trying to `put()` to a KeyValueStore that I got from > ProcessorTopologyTestDriver#getKeyValueStore() as part of setup for a > test. The JavaDoc endorses this use-case: > > * This is often useful in test cases to pre-populate t

Cannot write to key value store provided by ProcessorTopologyTestDriver

2017-08-21 Thread Dmitry Minkovsky
I am trying to `put()` to a KeyValueStore that I got from ProcessorTopologyTestDriver#getKeyValueStore() as part of setup for a test. The JavaDoc endorses this use-case: * This is often useful in test cases to pre-populate the store before the test case instructs the topology to * {@link

Re: Kafka Streams: why aren't offsets being committed?

2017-08-07 Thread Dmitry Minkovsky
Hi Garrett, This one https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-5510 Best, Dmitry On Mon, Aug 7, 2017 at 14:22, Garrett Barton: > Dmitry, which KIP are you referring to? I see this behavior too sometimes. > > On Fri, Aug 4, 2017 at 10:25 AM, Dmitry Minkovsky

Kafka Streams: retention and stream replay

2017-08-07 Thread Dmitry Minkovsky
One of the most appealing features of the streams-based architecture is the ability to replay history. This concept was highlighted in a blog post [0] just the other day. Practically, though, I am stuck on the mechanics of replaying data when that data is also periodically expiring. If your logs e

Re: Kafka Streams: why aren't offsets being committed?

2017-08-04 Thread Dmitry Minkovsky
are committed independently from each other. > > You can you double check the logs in DEBUG mode. It indicates when > offsets get committed. Also check via `bin/kafka-consumer-groups.sh` > what offsets are committed (application.id == group.id) > > Hope this helps. >

Re: Kafka Streams: why aren't offsets being committed?

2017-07-20 Thread Dmitry Minkovsky
you have a rough idea of how long? What is the value of your > "auto.offset.reset" configuration? > > Thanks, > Bill > > On Thu, Jul 20, 2017 at 6:03 PM, Dmitry Minkovsky > wrote: > > > My Streams application is configured to commit offsets

Kafka Streams: why aren't offsets being committed?

2017-07-20 Thread Dmitry Minkovsky
My Streams application is configured to commit offsets every 250ms: Properties streamsConfig = new Properties(); streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 250); However, every time I restart my application, records that have already been processed are re-processe
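As the reply elsewhere in this thread suggests, it's worth confirming what actually got committed, since for a Streams app the consumer group id equals the `application.id`. An illustrative check:

```shell
# group name = the Streams app's application.id (illustrative here)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-streams-app
```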

Re: Kafka Streams: "subscribed topics are not assigned to any members in the group"

2017-06-01 Thread Dmitry Minkovsky
bug to me. Can you share some more details. What is > your program structure? How many partitions so you have per topic? How > many threads/instances to you run? > > When does the issue occur exactly? > > > -Matthias > > On 5/23/17 12:26 PM, Dmitry Minkovsky wrote: > &

Kafka Streams: "subscribed topics are not assigned to any members in the group"

2017-05-23 Thread Dmitry Minkovsky
Certain elements of my streams app stopped working and I noticed that my logs contain: [2017-05-23 15:23:09,274] WARN (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:361) The following subscribed topics are not assigned to any members in the group user-service: [user-service-KSTR

Re: Producer does not fetch metadata when attempting send to topic

2017-04-30 Thread Dmitry Minkovsky
bably should have been doing this anyway, and probably with an executor service, but that's another story :). Hooray! On Sun, Apr 30, 2017 at 7:37 PM, Dmitry Minkovsky wrote: > I am attempting to send messages to two topics with a newly created > producer. > > The first messa

Re: Kafka Producer fails to get metadata on first send attempt

2017-04-30 Thread Dmitry Minkovsky
I just posted a new message with all of this distilled into a simple test and log output. On Sat, Apr 29, 2017 at 6:53 PM, Dmitry Minkovsky wrote: > It appears—at least according to debug logs—that the metadata request is > sent after the metadata update times out: > > > [

Producer does not fetch metadata when attempting send to topic

2017-04-30 Thread Dmitry Minkovsky
I am attempting to send messages to two topics with a newly created producer. The first message sends fine, but for some reason, the producer does not fetch metadata for the second topic before attempting to send. So sending to the second topic fails. The producer fetches metadata for the second t

Re: Kafka Producer fails to get metadata on first send attempt

2017-04-29 Thread Dmitry Minkovsky
, replicas = [0], isr = [0]), Partition(topic = join-requests, partition = 0, leader = 0, replicas = [0], isr = [0])]) After this, sending with the producer to `join-requests` works. On Sat, Apr 29, 2017 at 6:26 PM, Dmitry Minkovsky wrote: > I have a producer that fails to get metadata when

Kafka Producer fails to get metadata on first send attempt

2017-04-29 Thread Dmitry Minkovsky
I have a producer that fails to get metadata when it first attempts to send a record to a certain topic. It fails on ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs); [0] and yields: org.apache.kafka.common.errors.TimeoutExceptio

Re: Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-14 Thread Dmitry Minkovsky
o manually create all > >> input/output topics before you start your Streams application. > >> > >> For more details, see > >> > >> http://docs.confluent.io/current/streams/developer-guide.html#managing-topics-of-a-kafka-streams-application > >

Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-11 Thread Dmitry Minkovsky
I updated from 10.1 to 10.2. I updated both the broker and the Maven dependency. I am using topic auto-create. With 10.1, starting the application with a broker would sometimes result in an error like: > Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.TopologyBuilderException: I

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-17 Thread Dmitry Minkovsky
really love to be able to do ordered prefixed range scans with interactive queries. But if you don't think the lack of this facility is a defect then I can't spend more time on this. Thank you! On Fri, Mar 17, 2017 at 1:18 PM, Dmitry Minkovsky wrote: > Ah! Yes. Thank you! Th

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-17 Thread Dmitry Minkovsky
; > > About the range: I did double check this, and I guess my last answer > > was > > > > not correct, and range() should return ordered data, but I got a > follow > > > > up question: what the key type and serializer you use? Internally, > data > >

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-17 Thread Dmitry Minkovsky
arator` -- thus, if the serialized bytes > > don't reflect the order of the deserialized data, it returned range > > shows up unordered to you. > > > > > > -Matthias > > > > > > > > > > On 3/16/17 10:14 AM, Dmitry Minkovsky wrote
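The comparator point is easy to demonstrate without Kafka: RocksDB ranges over the raw key bytes in unsigned lexicographic order, so a fixed-width big-endian encoding preserves numeric order while, say, decimal strings of varying length do not. A self-contained sketch:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class KeyOrderDemo {
    // Fixed-width big-endian encoding: byte order matches numeric order
    // for non-negative values.
    static byte[] bigEndian(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Unsigned lexicographic comparison, as a byte-wise store would do.
    static int byteCompare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // 2 < 10 numerically, and their big-endian bytes agree:
        System.out.println(byteCompare(bigEndian(2), bigEndian(10)) < 0);     // true
        // ...but as decimal strings, "10" sorts before "2" -- the wrong order:
        System.out.println(byteCompare("10".getBytes(), "2".getBytes()) < 0); // true
    }
}
```

So ordered prefixed range scans work if (and only if) the key serializer is order-preserving over the bytes.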

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-16 Thread Dmitry Minkovsky
ed list/iterator back. You could replace RocksDB with a > custom store though. > > > -Matthias > > > On 3/13/17 3:56 PM, Dmitry Minkovsky wrote: > > I am using interactive streams to query tables: > > > > ReadOnlyKeyValueStore > Messages.UserLette

Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-13 Thread Dmitry Minkovsky
I am using interactive streams to query tables: ReadOnlyKeyValueStore store = streams.store("view-user-drafts", QueryableStoreTypes.keyValueStore()); Documentation says that #range() should not return null values. However, for keys that have been tombstoned, it does retu

Re: KTable send old values API

2017-02-22 Thread Dmitry Minkovsky
at 3:05 AM, Michael Noll wrote: > Dmitry, > > I think your use case is similar to the one I described in the link below > (discussion in the kafka-dev mailing list): > http://search-hadoop.com/m/uyzND1rVOQ12OJ84U&subj=Re+Streams+TTLCacheStore > > Could you take a quick l

Re: KTable send old values API

2017-02-21 Thread Dmitry Minkovsky
ct API you'd like? Perhaps if others find > it useful too we/you can do a KIP. > > Thanks > Eno > > > On 21 Feb 2017, at 22:01, Dmitry Minkovsky wrote: > > > > At KAFKA-2984: ktable sends old values when required > > <https://github.com/apache/kafk

KTable send old values API

2017-02-21 Thread Dmitry Minkovsky
At KAFKA-2984: ktable sends old values when required, @ymatsuda writes: > NOTE: This is meant to be used by aggregation. But, if there is a use case like a SQL database trigger, we can add a new KTable method to expose this. Looking throu

Re: Shutting down a Streams job

2017-02-09 Thread Dmitry Minkovsky
partitions > get reassigned and state moved about. In fact, it is likely to fail at > some point, as local state that can be stored in a multitude of nodes may > not be able to be stored locally as the number of nodes becomes smaller. > > On Wed, Feb 8, 2017 at 12:34 PM, Dmitry Minkovs

Re: Kafka Streams: How to best maintain changelog indices using the DSL?

2017-02-08 Thread Dmitry Minkovsky
lized views, from flat changelog feeds. As the changelog entities change or tombstone, so do the views. The 1-to-many case I still have to play more with. I will update here if I discover anything good. Thank you. On Wed, Feb 8, 2017 at 4:14 PM, Dmitry Minkovsky wrote: > > And before we

Re: Kafka Streams: How to best maintain changelog indices using the DSL?

2017-02-08 Thread Dmitry Minkovsky
w_key would still be a > primary key) > > > Hope this makes sense. It's a little hard to explain in an email. > > > -Matthias > > On 2/8/17 10:22 AM, Dmitry Minkovsky wrote: > > I have a changelog that I'd like to index by

Re: Shutting down a Streams job

2017-02-08 Thread Dmitry Minkovsky
Can you take them down sequentially? Like, say, with a Kubernetes StatefulSet. On Wed, Feb 8, 2017 at 2:15 PM, Elias Levy wrote: > What are folks doing to cleanly shutdown a Streams job compr

Kafka Streams: How to best maintain changelog indices using the DSL?

2017-02-08 Thread Dmitry Minkovsky
I have a changelog that I'd like to index by some other key. So, something like this: class Item { byte[] id; String name; } KStreamBuilder topology = new KStreamBuilder(); KTable items = topology.table("items-changelog", "items"

Re: Kafka Streams: Is automatic repartitioning before joins public/stable API?

2017-02-08 Thread Dmitry Minkovsky
leased the next weeks), that explains this, too. > > See https://issues.apache.org/jira/browse/KAFKA-3561 > > > -Matthias > > On 2/7/17 7:06 PM, Dmitry Minkovsky wrote: > > I accidentally stumbled upon `repartitionRequired` and > `repartitionForJoin` > > in `KSt

Kafka Streams: Is automatic repartitioning before joins public/stable API?

2017-02-07 Thread Dmitry Minkovsky
I accidentally stumbled upon `repartitionRequired` and `repartitionForJoin` in `KStreamImpl`, which are examined/called before KStream join operations to determine whether a repartition is needed. The javadoc to `repartitionForJoin` explains the functionality: > Repartition a stream. This is

Re: What are wall clock-driven alternatives to punctuate()?

2017-01-21 Thread Dmitry Minkovsky
I am thinking the answer will be along the lines of, say, http://stackoverflow.com/a/17563065/741970, but maybe I'm missing something :). On Sat, Jan 21, 2017 at 10:30 PM, Dmitry Minkovsky wrote: > I understand that punctuate() is data driven. That is really cool. > > However, I

What are wall clock-driven alternatives to punctuate()?

2017-01-21 Thread Dmitry Minkovsky
I understand that punctuate() is data driven. That is really cool. However, I have a process in my application that needs to be wall clock driven. It is basically a delayed queue on local state accumulated during stream processing, and needs to run periodically (or, even better, continuously). Bef
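(In releases after this post, KIP-138/Kafka 1.0 added `PunctuationType.WALL_CLOCK_TIME` to `punctuate` scheduling.) Independently of Kafka, the delayed-queue part can be wall-clock driven with a `java.util.concurrent.DelayQueue` drained by a timer thread. A self-contained sketch (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// A task that becomes visible only once its wall-clock deadline has passed.
class TimedTask implements Delayed {
    final String payload;
    final long deadlineMs;

    TimedTask(String payload, long deadlineMs) {
        this.payload = payload;
        this.deadlineMs = deadlineMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                            other.getDelay(TimeUnit.MILLISECONDS));
    }
}

public class WallClockDrain {
    // Drain everything whose deadline has passed; poll() never returns
    // unexpired tasks, so this is purely wall-clock driven.
    static List<String> drainDue(DelayQueue<TimedTask> queue) {
        List<String> due = new ArrayList<>();
        TimedTask t;
        while ((t = queue.poll()) != null) {
            due.add(t.payload);
        }
        return due;
    }

    public static void main(String[] args) {
        DelayQueue<TimedTask> queue = new DelayQueue<>();
        long now = System.currentTimeMillis();
        queue.add(new TimedTask("overdue", now - 1000));   // already due
        queue.add(new TimedTask("later", now + 60_000));   // due in a minute
        System.out.println(drainDue(queue));               // [overdue]
    }
}
```

A `ScheduledExecutorService` could invoke `drainDue` periodically and forward whatever it returns.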