Re: compaction not happening for change log topic

2017-03-31 Thread Matthias J. Sax
That's a topic config you need to set at the broker side: See config parameter `log.cleaner.*` in http://kafka.apache.org/documentation/#brokerconfigs -Matthias On 3/31/17 11:49 AM, Sachin Mittal wrote: > Hi, > I have noticed that many times change log topics don't get compacted. The > segment

Re: Difference between with and without state store cleanup streams startup

2017-03-31 Thread Matthias J. Sax
1. The whole log will be read. 2. It will read all the key-value pairs. However, the store will contain only the latest record for each key, after state recovery finished. For both (1) and (2): note that changelog topics are compacted; thus, it will not read everything since you started your ap
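
A minimal sketch of why the restored store ends up with only the latest record per key, assuming records are replayed in offset order and a null value acts as a tombstone. The class and method names are hypothetical and this is plain Java, not the actual Kafka Streams restore code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Kafka Streams' restore implementation.
class ChangelogReplay {
    // Replays {key, value} records in offset order. Later records overwrite
    // earlier ones, and a null value (tombstone) removes the key.
    static Map<String, String> restore(List<String[]> changelog) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String[] record : changelog) {
            String key = record[0];
            String value = record[1];
            if (value == null) {
                store.remove(key);
            } else {
                store.put(key, value);
            }
        }
        return store;
    }
}
```

Replaying (a,1), (b,2), (a,3), (b,null) leaves a single entry, a=3, no matter how many intermediate records were read.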

Re: Getting java.lang.UnsatisfiedLinkError: Cannot load module /tmp/librocksdbjni858257496864179953.so

2017-03-31 Thread Matthias J. Sax
Cross-posted twice (including an answer): https://github.com/facebook/rocksdb/issues/2071 http://stackoverflow.com/questions/43140522/exception-in-thread-streamthread-1-java-lang-unsatisfiedlinkerror-cannot-load > I don't understand why i need to re-build this. I downloaded the binaries > which

Re: how to listen each partition with separate threads ?

2017-03-31 Thread Matthias J. Sax
: > Hi, > > Is there any performance downside of creating so many consumers ? > > I mean literally I am gonna create atleast 7k connections in that case , I > have nearly 7k partitions with a given topic. > > > Keep learning keep moving . > > On Fri, Mar 31,

Re: how to listen each partition with separate threads ?

2017-03-31 Thread Matthias J. Sax
You need to create a KafkaConsumer per thread. -Matthias On 3/30/17 10:51 PM, Laxmi Narayan wrote: > Hi , > > I was thinking to listen each partition with separate thread in Kafka. > But i get error saying : > > > > > *org.apache.kafka.clients.consumer.KafkaConsumer@383ad023KafkaConsumer is

Re: Can multiple Kafka consumers read from the same partition of same topic by default?

2017-03-30 Thread Matthias J. Sax
ition right? > > Thanks! > > On Thu, Mar 30, 2017 at 9:00 PM, Matthias J. Sax > wrote: > >> Yes, you can do that. >> >> -Matthias >> >> >> >> On 3/30/17 6:09 PM, kant kodali wrote: >>> Hi All, >>> >>> Can mul

Re: Can multiple Kafka consumers read from the same partition of same topic by default?

2017-03-30 Thread Matthias J. Sax
Yes, you can do that. -Matthias On 3/30/17 6:09 PM, kant kodali wrote: > Hi All, > > Can multiple Kafka consumers read from the same partition of same topic by > default? By default, I mean since group.id is not mandatory I am wondering > if I spawn multiple kafka consumers without specifying

Re: Kafka streams - Large time windows ?

2017-03-30 Thread Matthias J. Sax
I don't see any problem with this. You might want to increase window retention time though. It's configured for each window individually (default is 1 day IIRC). You set this via `.until()` when you define a window in your code. -Matthias On 3/30/17 2:27 PM, Walid Lezzar wrote: > Hi, > I have

Re: Custom stream processor not triggering #punctuate()

2017-03-30 Thread Matthias J. Sax
We plan to do a KIP for this. Should come up soon. Please follow dev list for details and participate in the discussion! -Matthias On 3/30/17 11:02 AM, Thomas Becker wrote: > Does this fix the problem though? The docs indicate that new data is > required for each *partition*, not topic. Overall

Re: Understanding ReadOnlyWindowStore.fetch

2017-03-29 Thread Matthias J. Sax
It's based on "stream time", ie, the internally tracked progress based on the timestamps returned by TimestampExtractor. -Matthias On 3/29/17 12:52 PM, Jon Yeargers wrote: > So '.until()' is based on clock time / elapsed time (IE record age) / > something else? > > The fact that Im seeing lots of

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-28 Thread Matthias J. Sax
ide the id should be > sufficient. We can simply document in the JavaDocs how Subtopology and > TaskMetadata can be linked to each other. I updated KIP-120 to include one for field for this. -Matthias On 3/27/17 4:27 PM, Matthias J. Sax wrote: > Hi, > > I would like to trigger t

Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-28 Thread Matthias J. Sax
gestion about function name of > `assignedPartitions`, to `topicPartitions` to be consistent with > `StreamsMetadata`? > > > Guozhang > > On Thu, Mar 23, 2017 at 4:30 PM, Matthias J. Sax wrote: >

Re: more uniform task assignment across kafka stream nodes

2017-03-28 Thread Matthias J. Sax
won’t be the case due to task assignment unfortunately. I may > end up with say 5-6 nodes with aggregation assigned to them and 4-5 nodes > sitting there doing nothing. So it is a problem. > > Ara. > > On Mar 27, 2017, at 4:15 PM, Matthias J. Sax

Re: WindowStore and retention

2017-03-28 Thread Matthias J. Sax
Note, it's not based on system/wall-clock time, but based on "stream time", ie, the internal time progress of your app, that depends on the timestamps returned by TimestampExtractor. -Matthias On 3/28/17 10:55 AM, Matthias J. Sax wrote: > Yes. :) > > On 3/28/17 10:4
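
A simplified sketch of the "stream time" idea, assuming stream time is the largest record timestamp seen so far and that a window expires once stream time passes its start plus the retention period. Both rules are simplifying assumptions and the class is hypothetical; the real Streams internals are more involved:

```java
// Hypothetical, simplified model of "stream time"; not the Kafka Streams implementation.
class StreamTimeTracker {
    private long streamTime = Long.MIN_VALUE;

    // Called with the timestamp a timestamp extractor returned for a record.
    // Assumption: stream time only moves forward (max of observed timestamps).
    long observe(long recordTimestampMs) {
        streamTime = Math.max(streamTime, recordTimestampMs);
        return streamTime;
    }

    // Assumption: a window is dropped once stream time exceeds start + retention.
    boolean isExpired(long windowStartMs, long retentionMs) {
        return streamTime > windowStartMs + retentionMs;
    }
}
```

If no new records arrive, `observe` is never called, so nothing expires regardless of how much wall-clock time passes.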

Re: WindowStore and retention

2017-03-28 Thread Matthias J. Sax
Yes. :) On 3/28/17 10:40 AM, Jon Yeargers wrote: > How long does a given value persist in a WindowStore? Does it obey the > '.until()' param of a windowed aggregation/ reduction? > > Please say yes. >

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-28 Thread Matthias J. Sax
should not change while restoring > or Expiring 1 record(s) for changelog > or org.rocksdb.RocksDBException: ~ > > Lets hope with the PR https://github.com/apache/kafka/pull/2719 much of > such errors are resolved. > > Thanks > Sachin > > > > On Tue, Mar

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-27 Thread Matthias J. Sax
er` in the KIP. I also want to point out that the VOTE thread was already started. So if you like the current KIP, please cast your vote there. Thanks a lot! -Matthias On 3/23/17 3:38 PM, Matthias J. Sax wrote: > Jay, > > about the naming schema: > >>>1. "kstr

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
ve clusters of 10s or 100s of > nodes. We do need to make sure this processing is as efficient as possible. > The session window bug was killing us. Much better with the fix Damian > provided! > > Ara. > > On Mar 27, 2017, a

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
on realized it’s too much work :) I looked > at default PartitionAssigner code too, but that ain’t trivial either. > > So I’m a bit hopeless :( > > Ara. > > On Mar 27, 2017, at 1:35 PM, Matthias J. Sax wrote: > > >

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Matthias J. Sax
s: > StreamsTask taskId: 0_5 > ProcessorTopology: > KSTREAM-SOURCE-00: > topics: [activities-avro-or] > children: [KSTREAM-FILTER-01] > KSTREAM-FILTER-01: > children: [KSTREAM-MAP-02] > KSTREAM-MAP-02: > children: [KSTREAM-BRANCH-

Re: 3 machines 12 threads all fail within 2 hours of starting the streams application

2017-03-27 Thread Matthias J. Sax
Sachin, about this statement: >> Also note that when an identical streams application with single thread on >> a single instance is pulling data from some other non partitioned identical >> topic, the application never fails. What about some "combinations": - single threaded multiple instances

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-27 Thread Matthias J. Sax
Store.iterator()? That preserves the ability to call > remove() when it's appropriate and moves the refused bequest to when > you shouldn't. > > On Thu, 2017-03-23 at 11:05 -0700, Matthias J. Sax wrote: >> There is a difference between .delete() and it.remove(). >&

Re: more uniform task assignment across kafka stream nodes

2017-03-25 Thread Matthias J. Sax
"topic3”); > > These are different kinds of topics consuming different avro objects. > > Ara. > > On Mar 25, 2017, at 6:04 PM, Matthias J. Sax wrote: > > > > > > > This mes

Re: more uniform task assignment across kafka stream nodes

2017-03-25 Thread Matthias J. Sax
pplication > instance. Because we have some code dependencies between these 3 source > topics we can’t separate them into 3 applications at this time. Hence the > reason I want to get the task assignment algorithm basically do a uniform and > simple task assignment PER source topic. >

Re: more uniform task assignment across kafka stream nodes

2017-03-25 Thread Matthias J. Sax
Hi, I am wondering why this happens in the first place. Streams load-balances over all running instances, and each instance should get the same number of tasks (and thus partitions) assigned. What is the overall assignment? Do you have standby tasks configured? What version do you use? -Matthi

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-23 Thread Matthias J. Sax
>>>> opinion whether or not having "topology" in the names would help to >>> communicate this separation as well as combination of (1) and (2) to make >>> your app work as expected. >>> >>> If we stick with `KafkaStreams` for (2) *and* don't l

Re: Kafka Streams and reliable state stores

2017-03-23 Thread Matthias J. Sax
> There's also race conditions here -- what if node B owns partition 1, > node A redirects a query from a key in that partition, then B fails over to A > concurrently? You will get an exception, and you need to refresh your metadata. Afterward, you need to query again. This blog post gives more

Re: APPLICATION_SERVER_CONFIG ?

2017-03-23 Thread Matthias J. Sax
The config does not "do" anything. It's metadata that gets broadcast to other Streams instances for the IQ (Interactive Queries) feature. See this blog post for more details: https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ Happy to answer any follow-up questions. -Matt

Re: Error in running PageViewTypedDemo

2017-03-23 Thread Matthias J. Sax
I guess the console producer inserts the data as String -- and not as "binary JSON". Try to use a different serializer to insert data in the format Streams expects. -Matthias On 3/22/17 2:50 PM, Shanthi Nellaiappan wrote: > Thanks for the info. > With "page2",{"user":"2", "page":"22", "timest

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-23 Thread Matthias J. Sax
> >> Just mentioning this because, when reading the thread quickly, I missed the >> "iterator" part and thought removal/deletion on the store wasn't working. >> ;-) >> >> Best, >> Michael >> >> >> >> >> On Wed, Mar

Re: kafka streams in-memory Keyvalue store iterator remove broken on upgrade to 0.10.2.0 from 0.10.1.1

2017-03-22 Thread Matthias J. Sax
Hi, remove() should not be supported -- thus, it's actually a bug in 0.10.1 that got fixed in 0.10.2. Stores should only be altered by Streams, and iterators over the stores should be read-only -- otherwise, you might mess up Streams' internal state. I would highly recommend to reconsider the call

Re: Question about kafka-streams task load balancing

2017-03-21 Thread Matthias J. Sax
Hi, I guess it's currently not possible to load balance between different machines. It might be a nice optimization to add into Streams though. Right now, you should reduce the number of threads. Load balancing is based on threads, and thus, if Streams places tasks on all threads of one machine,

Re: Processing multiple topics

2017-03-20 Thread Matthias J. Sax
I would recommend to try out Kafka's Streams API instead of Spark Streaming. http://docs.confluent.io/current/streams/index.html -Matthias On 3/20/17 11:32 AM, Ali Akhtar wrote: > Are you saying, that it should process all messages from topic 1, then > topic 2, then topic 3, then 4? > > Or tha

Fwd: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Matthias J. Sax
\cc users list Forwarded Message Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API Date: Mon, 20 Mar 2017 11:51:01 -0700 From: Matthias J. Sax Organization: Confluent Inc To: d...@kafka.apache.org I want to push this discussion further. Guozhang's arg

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-16 Thread Matthias J. Sax
he point > of scanning a range if the data comes in some random order? That being the > case, the number of possible use-case scenarios seem to become > significantly limited. > > > Thank you! > Dmitry > > On Tue, Mar 14, 2017 at 1:12 PM, Matthias J. Sax > wrote: >

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Matthias J. Sax
he following names: > > - KafkaStreams as the new name for the builder that creates the logical > plan, with e.g. `KafkaStreams.stream("intput-topic")` and > `KafkaStreams.table("input-topic")`. > - KafkaStreamsRunner as the new name for the executioner of the plan,

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-14 Thread Matthias J. Sax
> However, >> for keys that have been tombstoned, it does return null for me. Sounds like a bug. Can you reliably reproduce this? Would you mind opening a JIRA? Can you check if this happens for both cases: caching enabled and disabled? Or only for one case? > "No ordering guarantees are provid

Re: Trying to use Kafka Stream

2017-03-14 Thread Matthias J. Sax
ts.to(Serdes.String(), Serdes.Long(), "wordCount-output"); > > KafkaStreams streams = new KafkaStreams(builder, props); > streams.start(); > > // usually the stream application would be running forever, > // in this example we just let it run for some time and stop

Re: WordCount example does not output to OUTPUT topic

2017-03-14 Thread Matthias J. Sax
This seems to be the same question as "Trying to use Kafka Stream" ? On 3/14/17 9:05 AM, Mina Aslani wrote: > Hi, > I am using below code to read from a topic and count words and write to > another topic. The example is the one in github. > My kafka container is in the VM. I do not get any error

Re: Trying to use Kafka Stream

2017-03-13 Thread Matthias J. Sax
Maybe you need to reset your application using the reset tool: http://docs.confluent.io/current/streams/developer-guide.html#application-reset-tool Also keep in mind, that KTables buffer internally, and thus, you might only see data on commit. Try to reduce commit interval or disable caching by s

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Matthias J. Sax
that this does not build a DSL :) to contract against KafkaStreamsBuilder. -Matthias On 3/13/17 12:46 PM, Steven Schlansker wrote: > >> On Mar 13, 2017, at 12:30 PM, Matthias J. Sax wrote: >> >> Jay, >> >> thanks for your feedback >> >>> What if i

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Matthias J. Sax
e? I think currently we have pretty in-depth docs on our apis but I >suspect a person trying to figure out how to implement a simple callback >might get a bit lost trying to figure out how to wire it up. A simple five >line example in the docs would probably help a lot. Not s

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-11 Thread Matthias J. Sax
>> >>> +1 to the points 1,2,3,4 you mentioned. >>> >>> Naming is always a tricky subject, but renaming KStreamBuilder >>> to StreamsTopologyBuilder looks ok to me (I would have had a slight >>> preference towards DslTopologyBuilder, but hey.) The m

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-08 Thread Matthias J. Sax
e share them so we can come up with a holistic, sound design (instead of uncoordinated local improvements that might diverge). Looking forward to your feedback on this KIP and the other API issues. -Matthias On 2/15/17 7:36 PM, Mathieu Fenniak wrote: > On Wed, Feb 15, 2017 at 5:04 PM, Matth

Re: Fixing two critical bugs in kafka streams

2017-03-07 Thread Matthias J. Sax
eline may go into deadlock state if some other thread has > already got the handle of that partition. So as per me we may need some > upper bound check for backoffTimeMs . > > Thanks > Sachin > > > > On Tue, Mar 7, 2017 at 3:24 AM, Matthias J. Sax > wrote: >

Re: Can I create user defined stream processing function/api?

2017-03-06 Thread Matthias J. Sax
Hi, you can implement custom operators via process(), transform(), and transformValues(). Also, if you want to have even more control over the topology, you can use the low-level Processor API directly instead of the DSL. http://docs.confluent.io/current/streams/developer-guide.html#processor-api -Ma

Re: Fixing two critical bugs in kafka streams

2017-03-06 Thread Matthias J. Sax
>> Now when it tries to process the partition two it tries to get the >>> lock >>>>> to >>>>>> rocks db. It won't get the lock since that partition is now moved >> to >>>> some >>>>>> other thread. So

Re: Kafka streams questions

2017-03-06 Thread Matthias J. Sax
.age.ms", 1) > > .. > > KafkaStreams streams = new KafkaStreams(blah, props) > > > Thanks, > > > Neil > > > From: Matthias J. Sax > Sent: 28 February 2017 22:26:39 > To: users@kafka.apache.org > Subjec

Re: Fixing two critical bugs in kafka streams

2017-03-05 Thread Matthias J. Sax
Sachin, thanks a lot for contributing! Right now, I am not sure if I understand the change. On CommitFailedException, why can we just resume the thread? To me, it seems that the thread will be in an invalid state and thus it's not safe to just swallow the exception and keep going. Can you shed so

Re: when will the messsages be sent to broker

2017-03-01 Thread Matthias J. Sax
> Thanks. > > Yuanjia Li > > From: Matthias J. Sax > Date: 2017-03-02 01:42 > To: users > Subject: Re: when will the messsages be sent to broker > There is also linger.ms parameter that is an upper bound how long a (not > yet filled) buffer is hold before sending it

Re: Subscribe to user mailing list

2017-03-01 Thread Matthias J. Sax
It's self service. See: http://kafka.apache.org/contact -Matthias On 3/1/17 8:48 AM, Mina Aslani wrote: > Hi, > > I would like to subscribe to user mailing list. > > Best regards, > Mina >

Re: Kafka Streams - ordering grouped messages

2017-03-01 Thread Matthias J. Sax
Just wanted to add that there is always the potential for late-arriving records, and thus, ordering by timestamp will never be perfect... You should rather try to design your application in a way such that it can handle out-of-order data gracefully and try to avoid the necessity of ordering reco

Re: when will the messsages be sent to broker

2017-03-01 Thread Matthias J. Sax
There is also the linger.ms parameter, which is an upper bound on how long a (not yet filled) buffer is held before it is sent even if it's not full. Furthermore, you can do sync writes and block until the producer has received all acks. But it might have a performance penalty. http://docs.confluent.io/current/cl
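
A small sketch of the producer settings mentioned here, built as plain java.util.Properties so it has no Kafka dependency. The keys `linger.ms`, `batch.size` and `acks` are standard producer configs; the helper class and the example values are assumptions for illustration only:

```java
import java.util.Properties;

// Hypothetical helper; only the config keys are real Kafka producer settings.
class ProducerBatchingConfig {
    static Properties build(long lingerMs, int batchSizeBytes) {
        Properties props = new Properties();
        // Upper bound on how long a not-yet-full batch waits before it is sent.
        props.put("linger.ms", Long.toString(lingerMs));
        // A batch is sent earlier once it reaches this size.
        props.put("batch.size", Integer.toString(batchSizeBytes));
        // Wait for all in-sync replicas to acknowledge each write.
        props.put("acks", "all");
        return props;
    }
}
```

These properties would be passed to the producer constructor together with the bootstrap servers and serializers. A sync write is then a matter of blocking on the Future returned by `send()`, which is where the performance penalty comes from.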

Re: Kafka Streams vs Spark Streaming

2017-03-01 Thread Matthias J. Sax
Steven, I guess my last answer was not completely correct. You might start with a new store, if the task gets moved to a different machine. Otherwise, we don't explicitly wipe out the store, but just reuse it in whatever state it is on restart. -Matthias On 2/28/17 2:19 PM, Matthias J

Re: Kafka streams questions

2017-02-28 Thread Matthias J. Sax
Adding partitions: You should not add partitions at runtime -- it might break the semantics of your application because it might "mess up" your hash partitioning. Cf. https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoscaleaStreamsapp,i.e.,increasenumberofinputpartitions? If you are s
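
A minimal sketch of how adding partitions can "mess up" hash partitioning. It uses String.hashCode() as a stand-in hash to stay dependency-free (Kafka's default partitioner hashes the serialized key bytes differently), and the class name is hypothetical:

```java
// Hypothetical stand-in for a hash partitioner; String.hashCode() is an
// assumption made only to keep the sketch self-contained.
class HashPartitioning {
    // Maps a key to a partition as hash(key) mod numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

With this stand-in, the key "a" (hash 97) goes to partition 1 when there are 4 partitions but to partition 2 when there are 5. Records for the same key written before and after the change end up in different partitions, so per-key state and ordering no longer line up.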

Re: Kafka Streams vs Spark Streaming

2017-02-28 Thread Matthias J. Sax
r am I > misunderstanding? > >> On Feb 28, 2017, at 9:12 AM, Matthias J. Sax wrote: >> >> If a store is backed by a changelog topic, the changelog topic is >> responsible to hold the latest state of the store. Thus, the topic must >> store the latest value per key

Re: Wait a few seconds before initializing state stores, so others don't have to wait before joining.

2017-02-28 Thread Matthias J. Sax
> > > 2017-02-28 18:15 GMT+01:00 Matthias J. Sax: > >> Hi Nicolas, >> >> an optimization like this would make a lot of sense. We did have some >> discussions around this already. However, it's more tricky to do than it >> seems at first glance. We

Re: Kafka Streams vs Spark Streaming

2017-02-28 Thread Matthias J. Sax
Tainji, Streams provides at-least-once processing guarantees. Thus, all flush/commits must be aligned -- otherwise, this guarantee might break. -Matthias On 2/28/17 6:40 AM, Damian Guy wrote: > Hi Tainji, > > The changelogs are flushed on the commit interval. It isn't currently > possible to

Re: Wait a few seconds before initializing state stores, so others don't have to wait before joining.

2017-02-28 Thread Matthias J. Sax
Hi Nicolas, an optimization like this would make a lot of sense. We did have some discussions around this already. However, it's more tricky to do than it seems at first glance. We hope to introduce something like this for the next release. -Matthias On 2/28/17 9:10 AM, Nicolas Fouché wrote:

Re: Kafka Streams vs Spark Streaming

2017-02-28 Thread Matthias J. Sax
If a store is backed by a changelog topic, the changelog topic is responsible for holding the latest state of the store. Thus, the topic must store the latest value per key. For this, we use a compacted topic. In case of restore, the local RocksDB store is cleared so it is empty, and we read the compl

Re: Lock Exception - Failed to lock the state directory

2017-02-26 Thread Matthias J. Sax
In case you use 0.10.0.2 please have a look into this FAQ https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Igetalockingexceptionsimilarto"Causedby:java.io.IOException:Failedtolockthestatedirectory:/tmp/kafka-streams//0_0".HowcanIresolvethis? However, if possible I would recommend to upgr

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Matthias J. Sax
First, I want to mention that you do not see "duplicates" -- you see late updates. Kafka Streams embraces "change" and there is no such thing as a final aggregate, but each agg output record is an update/refinement of the result. Strict filtering of "late updates" is hard in Kafka Streams. If you wa
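
A minimal sketch of the "each output record is an update" idea, using a plain running count per key. The class is hypothetical and ignores windows and caching, which in a real Streams app influence how many intermediate updates are actually emitted:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an aggregation that emits one update per input record.
class RunningCount {
    // Returns the sequence of "key=count" updates a downstream consumer would see.
    static List<String> updates(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            long count = counts.merge(key, 1L, Long::sum);
            out.add(key + "=" + count);
        }
        return out;
    }
}
```

For the input a, b, a the output is a=1, b=1, a=2. The second record for "a" is not a duplicate: it replaces the earlier result, so a consumer should keep only the latest value per key.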

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-23 Thread Matthias J. Sax
There is a voting thread on dev list. Please put your vote there. Thx. -Matthias On 2/23/17 8:15 PM, Mahendra Kariya wrote: > +1 for such a tool. It would be of great help in a lot of use cases. > > On Thu, Feb 23, 2017 at 11:44 PM, Matthias J. Sax > wrote: >

Fwd: Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-23 Thread Matthias J. Sax
\cc from dev Forwarded Message Subject: Re: KIP-122: Add a tool to Reset Consumer Group Offsets Date: Thu, 23 Feb 2017 10:13:39 -0800 From: Matthias J. Sax Organization: Confluent Inc To: d...@kafka.apache.org So you suggest to merge "scope options" --topics, --

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-20 Thread Matthias J. Sax
Hi, thanks for updating the KIP. Couple of follow up comments: * Nit: Why is "Reset to Earliest" and "Reset to Latest" a "reset by time" option -- IMHO it belongs to "reset by position"? * Nit: Description of "Reset to Earliest" > using Kafka Consumer's `auto.offset.reset` to `earliest` I thi

Re: Storm kafka integration

2017-02-19 Thread Matthias J. Sax
You should ask Storm people. Kafka Spout is not provided by Kafka community. Or maybe try out Kafka's Streams API (couldn't resist... ;) ) -Matthias On 2/19/17 11:49 AM, pradeep s wrote: > Hi, > I am using Storm 1.0.2 and Kafka 0.10.1.1 and have query on Spout code to > integrate with Kafka. As

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-15 Thread Matthias J. Sax
is, please let us know. -Matthias On 2/14/17 9:59 AM, Matthias J. Sax wrote: > You can already output any number of record within .transform() using > the provided Context object from init()... > > > -Matthias > > On 2/14/17 9:16 AM, Guozhang Wang wrote: >>>

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Matthias J. Sax
You can already output any number of records within .transform() using the provided Context object from init()... -Matthias On 2/14/17 9:16 AM, Guozhang Wang wrote: >> and you can't output multiple records or branching logic from a > transform(); > > For output multiple records in transform, we

Re: Kafka streams: Getting a state store associated to a processor

2017-02-13 Thread Matthias J. Sax
Can you try this out with the 0.10.2 branch or current trunk? We already put in some fixes like you suggested. Would be nice to get feedback on whether those fixes resolve the issue for you. Some more comments inline. -Matthias On 2/13/17 12:27 PM, Adam Warski wrote: > Following this answer, I checked that th

Re: KTable and cleanup.policy=compact

2017-02-13 Thread Matthias J. Sax
ppl are brought closer together.. there is >> peace in the valley.. for me... ) >> ... >> >> KafkaStreams = new KafkaStream(KStreamBuilder, >> config_with_cleanup_policy_or_not?) >> KafkaStream.start >> >> On Wed, Feb 8, 2017 at 12:30 PM, Eno Th

Re: [VOTE] 0.10.2.0 RC1

2017-02-10 Thread Matthias J. Sax
Hi Ian, thanks for reporting this. I had a look at the stack trace and code and the whole situation is quite confusing. The exception itself is expected but we have a try-catch-block that should swallow the exception and it should never bubble up: In AbstractTaskCreator.retryWithBackoff a call

Re: Table a KStream

2017-02-10 Thread Matthias J. Sax
anuary 2017 at 07:46, Nick DeCoursin > wrote: > >> Thank you very much, both suggestions are wonderful, and I will try them. >> Have a great day! >> >> Kind regards, >> Nick >> >> On 24 January 2017 at 19:46, Matthias J. Sax >>

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Matthias J. Sax
verride >>>>public Integer partition(String key, ChatMessage value, int >>>> numPartitions) { >>>>return partition0(null, value, numPartitions); >>>>} >>>> >>>>@VisibleForTesting >>>>int partition0

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-09 Thread Matthias J. Sax
It's by design. The reason is that Streams uses a single producer to write to different output topics. As different output topics might have different key and/or value types, the producer is instantiated with byte[] as key and value type, and Streams serializes the data before handing it to the pr

Re: Kafka Streams: How to best maintain changelog indices using the DSL?

2017-02-08 Thread Matthias J. Sax
whereas for non-unique fields each index key may > have a list of entities it maps to. For non-unique fields where an index > key may map to thousands of entities, it is not practical maintaining them > in a single aggregation. > > Any further guidance would be greatly appreciate

Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-08 Thread Matthias J. Sax
+1 On 2/8/17 4:51 PM, Gwen Shapira wrote: > +1 (binding) > > On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker > wrote: >> Hi everyone, >> >> Thank you for constructive feedback on KIP-121, >> KStream.peek(ForeachAction) ; >> it seems like it is time to call a vote which I hope will pass easily

Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-08 Thread Matthias J. Sax
I am not sure about --reset-plus and --reset-minus Would this skip n messages forward/backward for each partitions? -Matthias On 2/8/17 2:23 PM, Jorge Esteban Quilcate Otoya wrote: > Great. I think I got the idea. What about this options: > > Scenarios: > > 1. Current status > > ´kafka-consu

Re: Kafka Streams: How to best maintain changelog indices using the DSL?

2017-02-08 Thread Matthias J. Sax
It's a difficult problem. And before we discuss deeper, a follow-up question: if you map from a k/v-pair to new_key, is this mapping "unique", or could it be that two different k/v-pairs map to the same new_key? If there are overlaps, you end up with a different problem than if there are no overlaps, because y

Re: KTable and cleanup.policy=compact

2017-02-08 Thread Matthias J. Sax
pics will grow larger than >> necessary. >> >> Eno >> >>> On 8 Feb 2017, at 18:56, Jon Yeargers wrote: >>> >>> What are the ramifications of failing to do this? >>> >>> On Tue, Feb 7, 2017 at 9:16 PM, Matthias J. Sax >>> w

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-08 Thread Matthias J. Sax
> Damian >>>>> >>>>> On Tue, 7 Feb 2017 at 09:30 Eno Thereska >> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I like the proposal, thank you. I have found it frustrating myself >> not to >>>>>>

Re: Kafka Streams: Is automatic repartitioning before joins public/stable API?

2017-02-07 Thread Matthias J. Sax
Yes, you can rely on this. The feature was introduced in Kafka 0.10.1 and will stay like this. We already updated the JavaDocs (for the upcoming 0.10.2, which is going to be released in the next weeks) to explain this, too. See https://issues.apache.org/jira/browse/KAFKA-3561 -Matthias On 2/7/17 7

Re: KTable and cleanup.policy=compact

2017-02-07 Thread Matthias J. Sax
Yes, that is correct. -Matthias On 2/7/17 6:39 PM, Mathieu Fenniak wrote: > Hey kafka users, > > Is it correct that a Kafka topic that is used for a KTable should be set to > cleanup.policy=compact? > > I've never noticed until today that the KStreamBuilder#table() > documentation says: "Howe

Re: KIP-121 [Discuss]: Add KStream peek method

2017-02-06 Thread Matthias J. Sax
Steven, Thanks for your KIP. I move this discussion to dev mailing list -- KIPs need to be discussed there (and can be cc'ed to user list). Can you also add the KIP to the table "KIPs under discussion": https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovemen

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Matthias J. Sax
erge" belong in two different levels of the hierarchy. They both > transform two (or more) streams into one. > > Gwen > > On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax wrote: >> Hi All, >> >> I did prepare a KIP to do some cleanup some of Kafka's

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Matthias J. Sax
in >> my KS app to run once & output a graphviz document with my entire topology >> for debugging and analysis purposes; I use these methods to >> create ProcessorTopology instances to inspect the topology and create this >> output. I don't really see any alternativ

Re: Kafka Streams delivery semantics and state store

2017-02-06 Thread Matthias J. Sax
me for messages (a) and (b). It adds however extra > complexity - we need to maintain the map over time by deleting entries > older than committed offset. > > What do you think Matthias? > > Kind Regards > Krzysztof Lesniewski > > On 03.02.2017 20:02, Matthias J. S

Fwd: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-04 Thread Matthias J. Sax
cc'ed from dev Forwarded Message Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API Date: Sat, 4 Feb 2017 11:30:46 -0800 From: Matthias J. Sax Organization: Confluent Inc To: d...@kafka.apache.org I think the right pattern should be to use TopologyBuilder

[DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-03 Thread Matthias J. Sax
Hi All, I prepared a KIP to clean up some of Kafka's Streaming API. Please have a look here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API Looking forward to your feedback! -Matthias

Re: Kafka Streams delivery semantics and state store

2017-02-03 Thread Matthias J. Sax
-least-once") > > Nevertheless, in my use case such loss in rare circumstances is > acceptable and therefore extra complexity required to avoid it is > unnecessary. I will then go for the solution you have proposed with > storing >. I would appreciate though if you could verify

Re: Kafka Streams delivery semantics and state store

2017-02-02 Thread Matthias J. Sax
rds would be to access committed offset and delete all >> entries before it, but I did not find an easy way to access the committed >> offset. >> >> Is my thinking correct here? How could I maintain such state store and are >> there other gotchas I should pay attention to

Re: Kafka Streams delivery semantics and state store

2017-02-02 Thread Matthias J. Sax
Hi, About message acks: writes will be acked, but asynchronously (not synchronously as you describe it). Only before an actual commit is KafkaProducer#flush called, and all not-yet-received acks are collected (i.e., blocking/sync) before the commit is done. About state guarantees: there are none -- state might b

Re: "End of Batch" event

2017-02-01 Thread Matthias J. Sax
t has to be so complex... Kafka can be configured > to delete items older than 24h in a topic. So if you want to get rid > of records that did not arrive in the last 24h, just configure the > topic accordingly? > > On Wed, Feb 1, 2017 at 2:37 PM, Matthias J. Sax wrote: >>

Re: Kafka Streams punctuate with slow-changing KTables

2017-02-01 Thread Matthias J. Sax
sumably my first pass processor can still output new > dimension entries to the topic that the table is backed by? Again for > "find or create". > > On 1 February 2017 at 19:21, Matthias J. Sax wrote: > >> Thanks! >> >> About your example: in upcoming 0.10.

Re: "End of Batch" event

2017-02-01 Thread Matthias J. Sax
SourceTask. > > Currently, I'm representing all CSVs records in one KStream (adding source > to each record). But I can represent them as separate KStreams if needed. > Are you suggesting windowing these KStreams with 24 hours window and then > merging them? > > > >

Re: kafka-streams ktable recovery after rebalance crash

2017-02-01 Thread Matthias J. Sax
Not sure why the locks on the state directory were not released (maybe because of the crash) -- what version do you use? We fixed a couple of bugs in this regard lately -- maybe it's fixed in the upcoming 0.10.2. For now, you might want to delete the whole state directory (either manually or by calling

Re: Kafka Streams punctuate with slow-changing KTables

2017-02-01 Thread Matthias J. Sax
> Until the decision is made regarding the timing would it be best to ignore > `punctuate` entirely and trigger everything message by message via > `process`? > > On 1 February 2017 at 17:43, Matthias J. Sax wrote: > >> One thing to add: >> >> There are pl

Re: Kafka Streams punctuate with slow-changing KTables

2017-02-01 Thread Matthias J. Sax
One thing to add: There are plans/ideas to change punctuate() semantics to "system time" instead of "stream time". Would this be helpful for your use case? -Matthias On 2/1/17 9:41 AM, Matthias J. Sax wrote: > Yes and no. > > It does not depend on the number of tu

Re: Kafka Streams punctuate with slow-changing KTables

2017-02-01 Thread Matthias J. Sax
Yes and no. It does not depend on the number of tuples but on the timestamps of the tuples. I would assume that records in the high-volume stream have timestamps that are only a few milliseconds from each other, while for the low-volume KTable, records have timestamp differences that are much big

Kafka docs for current trunk

2017-01-31 Thread Matthias J. Sax
Hi, I want to collect feedback about the idea of publishing docs for the current trunk version of Apache Kafka. Currently, docs are only published for official releases. Other projects also have docs for the current SNAPSHOT version. So the question arises whether this would be helpful for the Kafka community, too.

Re: "End of Batch" event

2017-01-31 Thread Matthias J. Sax
inside the SourceTask to get a > snapshot of what is currently in K for a specific source S, then I can send > delete messages for the missing items by subtracting the latest items in the CSV > from the items of that source in K. > > Thanks, > > On Tue, Jan 31, 2017 at 1:54 PM, Matthi
