That's a topic config you need to set on the broker side:
See config parameter `log.cleaner.*` in
http://kafka.apache.org/documentation/#brokerconfigs
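As a sketch (topic name is made up; 0.10.x-era ZooKeeper-based tooling shown, so exact flags depend on your version), enabling compaction might look like:

```shell
# server.properties (broker side): the log cleaner must be enabled
# (it is enabled by default since 0.9.0.1)
log.cleaner.enable=true

# Topic side: set the cleanup policy on an existing changelog topic
# ("my-app-mystore-changelog" is a hypothetical name)
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-app-mystore-changelog \
  --add-config cleanup.policy=compact
```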
-Matthias
On 3/31/17 11:49 AM, Sachin Mittal wrote:
> Hi,
> I have noticed that many times change log topics don't get compacted. The
> segment
1. The whole log will be read.
2. It will read all the key-value pairs. However, the store will contain
only the latest record for each key, after state recovery finished.
For both (1) and (2): note that changelog topics are compacted, thus
it will not read everything since you started your ap
Cross-posted twice (including an answer):
https://github.com/facebook/rocksdb/issues/2071
http://stackoverflow.com/questions/43140522/exception-in-thread-streamthread-1-java-lang-unsatisfiedlinkerror-cannot-load
> I don't understand why i need to re-build this. I downloaded the binaries
> which
:
> Hi,
>
> Is there any performance downside of creating so many consumers ?
>
> I mean literally I am gonna create at least 7k connections in that case, I
> have nearly 7k partitions with a given topic.
>
>
> Keep learning keep moving .
>
> On Fri, Mar 31,
You need to create a KafkaConsumer per thread.
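A minimal sketch of that pattern (assuming a prepared `props`, a `partitions` list, and a hypothetical per-record `process()` handler; KafkaConsumer is not thread-safe, so each thread gets its own instance):

```java
// One KafkaConsumer per thread -- sharing a single instance across
// threads raises ConcurrentModificationException.
for (final TopicPartition tp : partitions) {
    new Thread(() -> {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(100)) {
                    process(record);  // hypothetical per-record handler
                }
            }
        }
    }).start();
}
```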
-Matthias
On 3/30/17 10:51 PM, Laxmi Narayan wrote:
> Hi ,
>
> I was thinking to listen each partition with separate thread in Kafka.
> But i get error saying :
>
>
>
>
> *org.apache.kafka.clients.consumer.KafkaConsumer@383ad023KafkaConsumer is
ition right?
>
> Thanks!
>
> On Thu, Mar 30, 2017 at 9:00 PM, Matthias J. Sax
> wrote:
>
>> Yes, you can do that.
>>
>> -Matthias
>>
>>
>>
>> On 3/30/17 6:09 PM, kant kodali wrote:
>>> Hi All,
>>>
>>> Can mul
Yes, you can do that.
-Matthias
On 3/30/17 6:09 PM, kant kodali wrote:
> Hi All,
>
> Can multiple Kafka consumers read from the same partition of same topic by
> default? By default, I mean since group.id is not mandatory I am wondering
> if I spawn multiple kafka consumers without specifying
I don't see any problem with this.
You might want to increase window retention time though. It's configured
for each window individually (default is 1 day IIRC).
You set this via `.until()` when you define a window in your code.
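In the 0.10.x DSL that looks roughly like this (store name and durations are made up):

```java
// Windowed count with 5-minute windows retained for 7 days
KTable<Windowed<String>, Long> counts = stream
    .groupByKey()
    .count(TimeWindows.of(TimeUnit.MINUTES.toMillis(5))   // window size
                      .until(TimeUnit.DAYS.toMillis(7)),  // retention via until()
           "counts-store");
```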
-Matthias
On 3/30/17 2:27 PM, Walid Lezzar wrote:
> Hi,
> I have
We plan to do a KIP for this. Should come up soon.
Please follow dev list for details and participate in the discussion!
-Matthias
On 3/30/17 11:02 AM, Thomas Becker wrote:
> Does this fix the problem though? The docs indicate that new data is
> required for each *partition*, not topic. Overall
It's based on "stream time", i.e., the internally tracked progress based
on the timestamps returned by TimestampExtractor.
-Matthias
On 3/29/17 12:52 PM, Jon Yeargers wrote:
> So '.until()' is based on clock time / elapsed time (IE record age) /
> something else?
>
> The fact that Im seeing lots of
ide the id should be
> sufficient. We can simply document in the JavaDocs how Subtopology and
> TaskMetadata can be linked to each other.
I updated KIP-120 to include one field for this.
-Matthias
On 3/27/17 4:27 PM, Matthias J. Sax wrote:
> Hi,
>
> I would like to trigger t
gestion about function name of
> `assignedPartitions`, to `topicPartitions` to be consistent with
> `StreamsMetadata`?
>
>
> Guozhang
>
> On Thu, Mar 23, 2017 at 4:30 PM, Matthias J. Sax
> mailto:matth...@confluent.io>> wrote:
>
&
won’t be the case due to task assignment unfortunately. I may
> end up with say 5-6 nodes with aggregation assigned to them and 4-5 nodes
> sitting there doing nothing. So it is a problem.
>
> Ara.
>
> On Mar 27, 2017, at 4:15 PM, Matthias J. Sax
> mailto
Note, it's not based on system/wall-clock time, but based on "stream
time", i.e., the internal time progress of your app, which depends on the
timestamps returned by TimestampExtractor.
-Matthias
On 3/28/17 10:55 AM, Matthias J. Sax wrote:
> Yes. :)
>
> On 3/28/17 10:4
Yes. :)
On 3/28/17 10:40 AM, Jon Yeargers wrote:
> How long does a given value persist in a WindowStore? Does it obey the
> '.until()' param of a windowed aggregation/ reduction?
>
> Please say yes.
>
should not change while restoring
> or Expiring 1 record(s) for changelog
> or org.rocksdb.RocksDBException: ~
>
> Lets hope with the PR https://github.com/apache/kafka/pull/2719 much of
> such errors are resolved.
>
> Thanks
> Sachin
>
>
>
> On Tue, Mar
er` in the KIP.
I also want to point out that the VOTE thread was already started. So
if you like the current KIP, please cast your vote there.
Thanks a lot!
-Matthias
On 3/23/17 3:38 PM, Matthias J. Sax wrote:
> Jay,
>
> about the naming schema:
>
>>>1. "kstr
ve clusters of 10s or 100s of
> nodes. We do need to make sure this processing is as efficient as possible.
> The session window bug was killing us. Much better with the fix Damian
> provided!
>
> Ara.
>
> On Mar 27, 2017, a
on realized it’s too much work :) I looked
> at default PartitionAssigner code too, but that ain’t trivial either.
>
> So I’m a bit hopeless :(
>
> Ara.
>
> On Mar 27, 2017, at 1:35 PM, Matthias J. Sax
> mailto:matth...@confluent.io>> wrote:
>
>
&g
s:
> StreamsTask taskId: 0_5
> ProcessorTopology:
> KSTREAM-SOURCE-00:
> topics: [activities-avro-or]
> children: [KSTREAM-FILTER-01]
> KSTREAM-FILTER-01:
> children: [KSTREAM-MAP-02]
> KSTREAM-MAP-02:
> children: [KSTREAM-BRANCH-
Sachin,
about this statement:
>> Also note that when an identical streams application with single thread on
>> a single instance is pulling data from some other non partitioned identical
>> topic, the application never fails.
What about some "combinations":
- single threaded multiple instances
Store.iterator()? That preserves the ability to call
> remove() when it's appropriate and moves the refused bequest to when
> you shouldn't.
>
> On Thu, 2017-03-23 at 11:05 -0700, Matthias J. Sax wrote:
>> There is a difference between .delete() and it.remove().
>&
"topic3”);
>
> These are different kinds of topics consuming different avro objects.
>
> Ara.
>
> On Mar 25, 2017, at 6:04 PM, Matthias J. Sax
> mailto:matth...@confluent.io>> wrote:
>
>
>
>
>
>
> This mes
pplication
> instance. Because we have some code dependencies between these 3 source
> topics we can’t separate them into 3 applications at this time. Hence the
> reason I want to get the task assignment algorithm basically do a uniform and
> simple task assignment PER source topic.
&g
Hi,
I am wondering why this happens in the first place. Streams
load-balances over all running instances, and each instance should have
the same number of tasks (and thus partitions) assigned.
What is the overall assignment? Do you have standby tasks configured?
What version do you use?
-Matthi
t;>> opinion whether or not having "topology" in the names would help to
>>> communicate this separation as well as combination of (1) and (2) to make
>>> your app work as expected.
>>>
>>> If we stick with `KafkaStreams` for (2) *and* don't l
> There's also race conditions here -- what if node B owns partition 1,
> node A redirects a query from a key in that partition, then B fails over to A
> concurrently?
You will get an exception, and you need to refresh your metadata.
Afterward, you need to query again.
This blog post gives more
The config does not "do" anything. It's metadata that gets broadcasted
to other Streams instances for the IQ feature.
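Assuming the config in question is StreamsConfig's `application.server` (a guess based on the context here), it is just an advertised host:port pair:

```properties
# application.server: endpoint this instance advertises to the other
# instances via StreamsMetadata; Streams itself does not open this port --
# you must run your own query service (e.g. REST) on it.
application.server=host1:7070
```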
See this blog post for more details:
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Happy to answer any follow up question.
-Matt
I guess the console producer inserts the data as String -- and not as
"binary JSON". Try to use a different serializer to insert data with the
expected format for Streams.
-Matthias
On 3/22/17 2:50 PM, Shanthi Nellaiappan wrote:
> Thanks for the info.
> With "page2",{"user":"2", "page":"22", "timest
t;
>> Just mentioning this because, when reading the thread quickly, I missed the
>> "iterator" part and thought removal/deletion on the store wasn't working.
>> ;-)
>>
>> Best,
>> Michael
>>
>>
>>
>>
>> On Wed, Mar
Hi,
remove() should not be supported -- thus, it's actually a bug in 0.10.1
that got fixed in 0.10.2.
Stores should only be altered by Streams, and iterators over the stores
should be read-only -- otherwise, you might mess up Streams' internal state.
I would highly recommend reconsidering the call
Hi,
I guess it's currently not possible to load balance between different
machines. It might be a nice optimization to add to Streams though.
Right now, you should reduce the number of threads. Load balancing is
based on threads, and thus, if Streams places tasks on all threads of one
machine,
I would recommend trying out Kafka's Streams API instead of Spark Streaming.
http://docs.confluent.io/current/streams/index.html
-Matthias
On 3/20/17 11:32 AM, Ali Akhtar wrote:
> Are you saying, that it should process all messages from topic 1, then
> topic 2, then topic 3, then 4?
>
> Or tha
\cc users list
Forwarded Message
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
Date: Mon, 20 Mar 2017 11:51:01 -0700
From: Matthias J. Sax
Organization: Confluent Inc
To: d...@kafka.apache.org
I want to push this discussion further.
Guozhang's arg
he point
> of scanning a range if the data comes in some random order? That being the
> case, the number of possible use-case scenarios seem to become
> significantly limited.
>
>
> Thank you!
> Dmitry
>
> On Tue, Mar 14, 2017 at 1:12 PM, Matthias J. Sax
> wrote:
>
he following names:
>
> - KafkaStreams as the new name for the builder that creates the logical
> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> `KafkaStreams.table("input-topic")`.
> - KafkaStreamsRunner as the new name for the executioner of the plan,
> However,
>> for keys that have been tombstoned, it does return null for me.
Sounds like a bug. Can you reliably reproduce this? Would you mind
opening a JIRA?
Can you check if this happens for both cases: caching enabled and
disabled? Or only for one case?
> "No ordering guarantees are provid
ts.to(Serdes.String(), Serdes.Long(), "wordCount-output");
>
> KafkaStreams streams = new KafkaStreams(builder, props);
> streams.start();
>
> // usually the stream application would be running forever,
> // in this example we just let it run for some time and stop
This seems to be the same question as "Trying to use Kafka Stream" ?
On 3/14/17 9:05 AM, Mina Aslani wrote:
> Hi,
> I am using below code to read from a topic and count words and write to
> another topic. The example is the one in github.
> My kafka container is in the VM. I do not get any error
Maybe you need to reset your application using the reset tool:
http://docs.confluent.io/current/streams/developer-guide.html#application-reset-tool
Also keep in mind that KTables buffer internally, and thus you might
only see data on commit.
Try to reduce the commit interval or disable caching by s
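A sketch of both knobs (StreamsConfig keys; the values are only illustrative):

```properties
# Commit (and thus flush downstream) more frequently -- default is 30000
commit.interval.ms=1000
# Or disable record caching entirely (available since 0.10.1)
cache.max.bytes.buffering=0
```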
that this does not build a DSL :) to
contract against KafkaStreamsBuilder.
-Matthias
On 3/13/17 12:46 PM, Steven Schlansker wrote:
>
>> On Mar 13, 2017, at 12:30 PM, Matthias J. Sax wrote:
>>
>> Jay,
>>
>> thanks for your feedback
>>
>>> What if i
e? I think currently we have pretty in-depth docs on our apis but I
>suspect a person trying to figure out how to implement a simple callback
>might get a bit lost trying to figure out how to wire it up. A simple five
>line example in the docs would probably help a lot. Not s
>>
>>> +1 to the points 1,2,3,4 you mentioned.
>>>
>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>> preference towards DslTopologyBuilder, but hey.) The m
e share them so we can come up with
a holistic, sound design (instead of uncoordinated local improvements
that might diverge)
Looking forward to your feedback on this KIP and the other API issues.
-Matthias
On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> On Wed, Feb 15, 2017 at 5:04 PM, Matth
eline may go into deadlock state if some other thread has
> already got the handle of that partition. So as per me we may need some
> upper bound check for backoffTimeMs .
>
> Thanks
> Sachin
>
>
>
> On Tue, Mar 7, 2017 at 3:24 AM, Matthias J. Sax
> wrote:
>
Hi,
you can implement custom operators via process(), transform(), and
transformValues().
Also, if you want to have even more control over the topology, you can
use low-level Processor API directly instead of DSL.
http://docs.confluent.io/current/streams/developer-guide.html#processor-api
-Ma
;> Now when it tries to process the partition two it tries to get the
>>> lock
>>>>> to
>>>>>> rocks db. It won't get the lock since that partition is now moved
>> to
>>>> some
>>>>>> other thread. So
.age.ms", 1)
>
> ..
>
> KafkaStreams streams = new KafkaStreams(blah, props)
>
>
> Thanks,
>
>
> Neil
>
>
> From: Matthias J. Sax
> Sent: 28 February 2017 22:26:39
> To: users@kafka.apache.org
> Subjec
Sachin,
thanks a lot for contributing!
Right now, I am not sure if I understand the change. On
CommitFailedException, why can we just resume the thread? To me, it
seems that the thread will be in an invalid state and thus it's not safe
to just swallow the exception and keep going. Can you shed so
> Thanks.
>
> Yuanjia Li
>
> From: Matthias J. Sax
> Date: 2017-03-02 01:42
> To: users
> Subject: Re: when will the messsages be sent to broker
> There is also linger.ms parameter that is an upper bound how long a (not
> yet filled) buffer is hold before sending it
It's self-service.
See: http://kafka.apache.org/contact
-Matthias
On 3/1/17 8:48 AM, Mina Aslani wrote:
> Hi,
>
> I would like to subscribe to user mailing list.
>
> Best regards,
> Mina
>
Just wanted to add, that there is always the potential about late
arriving records, and thus, ordering by timestamp will never be perfect...
You should rather try to design your application in a way such that it
can handle out-of-order data gracefully and try to avoid the necessity
of ordering reco
There is also the linger.ms parameter that is an upper bound on how long a (not
yet filled) buffer is held before sending it even if it's not full.
Furthermore, you can do sync writes and block until the producer received
all acks. But it might have a performance penalty.
http://docs.confluent.io/current/cl
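The two producer settings mentioned above, as a sketch (values illustrative):

```properties
# Wait up to 50 ms for more records to fill a batch before sending,
# even if batch.size has not been reached (default is 0)
linger.ms=50
# Wait for acknowledgement from all in-sync replicas
acks=all
```

For a fully synchronous write you would instead block on the Future returned by `KafkaProducer#send()`, e.g. `producer.send(record).get()`.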
Steven,
I guess my last answer was not completely correct. You might start with
a new store, if the task gets moved to a different machine. Otherwise,
we don't explicitly wipe out the store, but just reuse it in whatever
state it is on restart.
-Matthias
On 2/28/17 2:19 PM, Matthias J
Adding partitions:
You should not add partitions at runtime -- it might break the semantics
of your application because it might "mess up" your hash partitioning.
Cf.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoscaleaStreamsapp,i.e.,increasenumberofinputpartitions?
If you are s
r am I
> misunderstanding?
>
>> On Feb 28, 2017, at 9:12 AM, Matthias J. Sax wrote:
>>
>> If a store is backed by a changelog topic, the changelog topic is
>> responsible to hold the latest state of the store. Thus, the topic must
>> store the latest value per key
;
>
> 2017-02-28 18:15 GMT+01:00 Matthias J. Sax :
>
>> Hi Nicolas,
>>
>> an optimization like this would make a lot of sense. We did have some
>> discussions around this already. However, it's more tricky to do than it
>> seems at a first glace. We
Tainji,
Streams provides at-least-once processing guarantees. Thus, all
flush/commits must be aligned -- otherwise, this guarantee might break.
-Matthias
On 2/28/17 6:40 AM, Damian Guy wrote:
> Hi Tainji,
>
> The changelogs are flushed on the commit interval. It isn't currently
> possible to
Hi Nicolas,
an optimization like this would make a lot of sense. We did have some
discussions around this already. However, it's more tricky to do than it
seems at first glance. We hope to introduce something like this for the
next release.
-Matthias
On 2/28/17 9:10 AM, Nicolas Fouché wrote:
If a store is backed by a changelog topic, the changelog topic is
responsible to hold the latest state of the store. Thus, the topic must
store the latest value per key. For this, we use a compacted topic.
In case of restore, the local RocksDB store is cleared so it is empty,
and we read the compl
In case you use 0.10.0.2 please have a look into this FAQ
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Igetalockingexceptionsimilarto"Causedby:java.io.IOException:Failedtolockthestatedirectory:/tmp/kafka-streams//0_0".HowcanIresolvethis?
However, if possible I would recommend to upgr
First, I want to mention that you do not see "duplicates" -- you see late
updates. Kafka Streams embraces "change" and there is no such thing as a
final aggregate, but each agg output record is an update/refinement of
the result.
Strict filtering of "late updates" is hard in Kafka Streams
If you wa
There is a voting thread on dev list. Please put your vote there. Thx.
-Matthias
On 2/23/17 8:15 PM, Mahendra Kariya wrote:
> +1 for such a tool. It would be of great help in a lot of use cases.
>
> On Thu, Feb 23, 2017 at 11:44 PM, Matthias J. Sax
> wrote:
>
\cc from dev
Forwarded Message
Subject: Re: KIP-122: Add a tool to Reset Consumer Group Offsets
Date: Thu, 23 Feb 2017 10:13:39 -0800
From: Matthias J. Sax
Organization: Confluent Inc
To: d...@kafka.apache.org
So you suggest to merge "scope options" --topics, --
Hi,
thanks for updating the KIP. Couple of follow up comments:
* Nit: Why are "Reset to Earliest" and "Reset to Latest" "reset by
time" options -- IMHO they belong to "reset by position"?
* Nit: Description of "Reset to Earliest"
> using Kafka Consumer's `auto.offset.reset` to `earliest`
I thi
You should ask the Storm people. Kafka Spout is not provided by the Kafka community.
Or maybe try out Kafka's Streams API (couldn't resist... ;) )
-Matthias
On 2/19/17 11:49 AM, pradeep s wrote:
> Hi,
> I am using Storm 1.0.2 and Kafka 0.10.1.1 and have query on Spout code to
> integrate with Kafka. As
is, please let us know.
-Matthias
On 2/14/17 9:59 AM, Matthias J. Sax wrote:
> You can already output any number of record within .transform() using
> the provided Context object from init()...
>
>
> -Matthias
>
> On 2/14/17 9:16 AM, Guozhang Wang wrote:
>>>
You can already output any number of records within .transform() using
the provided Context object from init()...
-Matthias
On 2/14/17 9:16 AM, Guozhang Wang wrote:
>> and you can't output multiple records or branching logic from a
> transform();
>
> For output multiple records in transform, we
Can you try this out with the 0.10.2 branch or current trunk?
We put in some fixes like you suggested already. Would be nice to get
feedback on whether those fixes resolve the issue for you.
Some more comments inline.
-Matthias
On 2/13/17 12:27 PM, Adam Warski wrote:
> Following this answer, I checked that th
ppl are brought closer together.. there is
>> peace in the valley.. for me... )
>> ...
>>
>> KafkaStreams = new KafkaStream(KStreamBuilder,
>> config_with_cleanup_policy_or_not?)
>> KafkaStream.start
>>
>> On Wed, Feb 8, 2017 at 12:30 PM, Eno Th
Hi Ian,
thanks for reporting this. I had a look at the stack trace and code and
the whole situation is quite confusing. The exception itself is expected
but we have a try-catch-block that should swallow the exception and it
should never bubble up:
In
AbstractTaskCreator.retryWithBackoff
a call
anuary 2017 at 07:46, Nick DeCoursin
> wrote:
>
>> Thank you very much, both suggestions are wonderful, and I will try them.
>> Have a great day!
>>
>> Kind regards,
>> Nick
>>
>> On 24 January 2017 at 19:46, Matthias J. Sax
>&g
verride
>>>>public Integer partition(String key, ChatMessage value, int
>>>> numPartitions) {
>>>>return partition0(null, value, numPartitions);
>>>>}
>>>>
>>>>@VisibleForTesting
>>>>int partition0
It's by design.
The reason is that Streams uses a single producer to write to different
output topics. As different output topics might have different key and/or
value types, the producer is instantiated with byte[] as key and value
type, and Streams serializes the data before handing it to the pr
whereas for non-unique fields each index key may
> have a list of entities it maps to. For non-unique fields where an index
> key may map to thousands of entities, it is not practical maintaining them
> in a single aggregation.
>
> Any further guidance would be greatly appreciate
+1
On 2/8/17 4:51 PM, Gwen Shapira wrote:
> +1 (binding)
>
> On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker
> wrote:
>> Hi everyone,
>>
>> Thank you for constructive feedback on KIP-121,
>> KStream.peek(ForeachAction) ;
>> it seems like it is time to call a vote which I hope will pass easily
I am not sure about --reset-plus and --reset-minus
Would this skip n messages forward/backward for each partitions?
-Matthias
On 2/8/17 2:23 PM, Jorge Esteban Quilcate Otoya wrote:
> Great. I think I got the idea. What about this options:
>
> Scenarios:
>
> 1. Current status
>
> ´kafka-consu
It's a difficult problem.
And before we discuss deeper, a follow-up question: if you map from a k/v-pair
to new_key, is this mapping "unique", or could it be that two
different k/v-pairs map to the same new_key?
If there are overlaps, you end up with a different problem than if there
are no overlaps, because y
pics will grow larger than
>> necessary.
>>
>> Eno
>>
>>> On 8 Feb 2017, at 18:56, Jon Yeargers wrote:
>>>
>>> What are the ramifications of failing to do this?
>>>
>>> On Tue, Feb 7, 2017 at 9:16 PM, Matthias J. Sax
>>> w
; Damian
>>>>>
>>>>> On Tue, 7 Feb 2017 at 09:30 Eno Thereska
>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I like the proposal, thank you. I have found it frustrating myself
>> not to
>>>>>&g
Yes, you can rely on this.
The feature was introduced in Kafka 0.10.1 and will stay like this. We
already updated the JavaDocs (for the upcoming 0.10.2, which is going to be
released in the next weeks); they explain this, too.
See https://issues.apache.org/jira/browse/KAFKA-3561
-Matthias
On 2/7/17 7
Yes, that is correct.
-Matthias
On 2/7/17 6:39 PM, Mathieu Fenniak wrote:
> Hey kafka users,
>
> Is it correct that a Kafka topic that is used for a KTable should be set to
> cleanup.policy=compact?
>
> I've never noticed until today that the KStreamBuilder#table()
> documentation says: "Howe
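Creating a KTable input topic with that policy might look like this (topic name and sizing are made up; 0.10.x ZooKeeper-based tooling shown):

```shell
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic user-profiles --partitions 4 --replication-factor 3 \
  --config cleanup.policy=compact
```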
Steven,
Thanks for your KIP. I move this discussion to dev mailing list -- KIPs
need to be discussed there (and can be cc'ed to user list).
Can you also add the KIP to the table "KIPs under discussion":
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovemen
erge" belong in two different levels of the hierarchy. They both
> transform two (or more) streams into one.
>
> Gwen
>
> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax wrote:
>> Hi All,
>>
>> I did prepare a KIP to do some cleanup some of Kafka's
in
>> my KS app to run once & output a graphviz document with my entire topology
>> for debugging and analysis purposes; I use these methods to
>> create ProcessorTopology instances to inspect the topology and create this
>> output. I don't really see any alternativ
me for messages (a) and (b). It adds however extra
> complexity - we need to maintain the map over time by deleting entries
> older than committed offset.
>
> What do you think Matthias?
>
> Kind Regards
> Krzysztof Lesniewski
>
> On 03.02.2017 20:02, Matthias J. S
cc'ed from dev
Forwarded Message
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
Date: Sat, 4 Feb 2017 11:30:46 -0800
From: Matthias J. Sax
Organization: Confluent Inc
To: d...@kafka.apache.org
I think the right pattern should be to use TopologyBuilder
Hi All,
I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
Please have a look here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
Looking forward to your feedback!
-Matthias
-least-once")
>
> Nevertheless, in my use case such loss in rare circumstances is
> acceptable and therefore extra complexity required to avoid it is
> unnecessary. I will then go for the solution you have proposed with
> storing >. I would appreciate though if you could verify
rds would be to access committed offset and delete all
>> entries before it, but I did not find an easy way to access the committed
>> offset.
>>
>> Is my thinking correct here? How could I maintain such state store and are
>> there other gotchas I should pay attention to
Hi,
About message acks: writes will be acked, however asynchronously (not sync as
you describe it). Only before an actual commit, KafkaProducer#flush is
called and all not-yet-received acks are collected (i.e., blocking/sync)
before the commit is done.
About state guarantees: there are none -- state might b
t has to be so complex... Kafka can be configured
> to delete items older than 24h in a topic. So if you want to get rid
> of records that did not arrive in the last 24h, just configure the
> topic accordingly?
>
> On Wed, Feb 1, 2017 at 2:37 PM, Matthias J. Sax wrote:
>>
sumably my first pass processor can still output new
> dimension entries to the topic that the table is backed by? Again for
> "find or create".
>
> On 1 February 2017 at 19:21, Matthias J. Sax wrote:
>
>> Thanks!
>>
>> About your example: in upcoming 0.10.
SourceTask.
>
> Currently, I'm representing all CSVs records in one KStream (adding source
> to each record). But I can represent them as separate KStreams if needed.
> Are you suggesting windowing these KStreams with 24 hours window and then
> merging them?
>
>
>
>
Not sure why the locks on the state directory did not get released (maybe
because of the crash) -- what version do you use? We fixed a couple of
bugs in this regard lately -- maybe it's fixed in the upcoming 0.10.2.
For now, you might want to delete the whole state directory (either
manually or by calling
> Until the decision is made regarding the timing would it be best to ignore
> `punctuate` entirely and trigger everything message by message via
> `process`?
>
> On 1 February 2017 at 17:43, Matthias J. Sax wrote:
>
>> One thing to add:
>>
>> There are pl
One thing to add:
There are plans/ideas to change punctuate() semantics to "system time"
instead of "stream time". Would this be helpful for your use case?
-Matthias
On 2/1/17 9:41 AM, Matthias J. Sax wrote:
> Yes and no.
>
> It does not depend on the number of tu
Yes and no.
It does not depend on the number of tuples but on the timestamps of the
tuples.
I would assume that records in the high-volume stream have timestamps
that are only a few milliseconds from each other, while for the
low-volume KTable, records have timestamp differences that are much big
Hi,
I want to collect feedback about the idea to publish docs for the current
trunk version of Apache Kafka.
Currently, docs are only published for official releases. Other projects
also have docs for the current SNAPSHOT version. So the question arises whether
this would be helpful for the Kafka community, too.
inside the SourceTask to get a
> snapshot of what currently in K for a specific source S, then I can send
> delete message for the missing items by subtracting latest items in the CSV
> from the items of that source in K.
>
> Thanks,
>
> On Tue, Jan 31, 2017 at 1:54 PM, Matthi