Re: access for KIP

2018-06-20 Thread Guozhang Wang
Done. Cheers.


Guozhang


On Wed, Jun 20, 2018 at 5:18 PM, Lei Chen  wrote:

> Hi, there,
>
> I'd like to request permission to the kafka space in our ASF cwiki, to be
> able
> to create KIPs.
>
> Username: leyncl
> email: ley...@gmail.com
>
> Thanks!
>



-- 
-- Guozhang


Re: cwiki access for KIP

2018-06-20 Thread Matthias Wessendorf
sweet, thanks!

On Wed, Jun 20, 2018 at 11:02 PM, Matthias J. Sax 
wrote:

> Done.
>
> On 6/20/18 11:15 AM, Matthias Wessendorf wrote:
> > Hi,
> >
> > I'd like to request permission to the kafka space in our ASF cwiki, to be
> > able to create KIPs.
> >
> > Username: matzew
> > email: mat...@apache.org
> >
> > Thanks!
> > Matthias
> >
>
>


-- 
Matthias Wessendorf

github: https://github.com/matzew
twitter: http://twitter.com/mwessendorf


Re: access for KIP

2018-06-20 Thread Matthias J. Sax
You are added already.

On 6/20/18 5:18 PM, Lei Chen wrote:
> Hi, there,
> 
> I'd like to request permission to the kafka space in our ASF cwiki, to be able
> to create KIPs.
> 
> Username: leyncl
> email: ley...@gmail.com
> 
> Thanks!
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2018-06-20 Thread Matthias J. Sax
No worries. It's just good to know. It seems that some other people are
interested to drive this further. So we will just "reassign" it to them.

Thanks for letting us know.


-Matthias

On 6/20/18 2:51 PM, Jeyhun Karimov wrote:
> Hi Matthias, all,
> 
> Currently, I am not able to complete this KIP. Please accept my
> apologies for that. 
> 
> 
> Cheers,
> Jeyhun
> 
> On Mon, Jun 11, 2018 at 2:25 AM Matthias J. Sax wrote:
> 
> What is the status of this KIP?
> 
> -Matthias
> 
> 
> On 2/13/18 1:43 PM, Matthias J. Sax wrote:
> > Is there any update for this KIP?
> >
> >
> > -Matthias
> >
> > On 12/4/17 2:08 PM, Matthias J. Sax wrote:
> >> Jeyhun,
> >>
> >> thanks for updating the KIP.
> >>
> >> I am wondering if you intend to add a new class `Produced`? There is
> >> already `org.apache.kafka.streams.kstream.Produced`. So if we want to
> >> add a new class, it must have a different name -- or we might be
> able to
> >> merge both into one?
> >>
> >> Also, for the KStream overloads of `through()` and `to()`, can
> you add
> >> the different behavior using different overloads? It's not clear from
> >> the KIP what the semantics are.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/17/17 3:27 PM, Jeyhun Karimov wrote:
> >>> Hi,
> >>>
> >>> Thanks for your comments. I agree with Matthias partially.
> >>> I think we should relax some requirements related to the to() and
> through()
> >>> methods.
> >>> IMHO, Produced class can cover (existing/to be created) topic
> information,
> >>> which will ease our effort:
> >>>
> >>> KStream.to(Produced topicInfo)
> >>> KStream.through(Produced topicInfo)
> >>>
> >>> This will decrease the number of overloads but we will need to
> deprecate
> >>> the existing to() and through() methods, perhaps.
> >>> I updated the KIP accordingly.
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
>  On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io>
> >>> wrote:
> >>>
>  @Jan:
> 
>  The `Produced` class was introduced in 1.0 to specify key and value
>  Serdes (and partitioner) if data is written into a topic.
> 
>  Old API:
> 
>  KStream#to("topic", keySerde, valueSerde);
> 
>  New API:
> 
>  KStream#to("topic", Produced.with(keySerde, valueSerde));
> 
> 
>  This allows us to reduce the number of overloads for `to()` (and
>  `through()` that follows the same pattern) -- the second
> parameter is
>  used to cover all different variations of option parameters
> users can
>  specify, while we only have 2 overloads for `to()` itself.
> 
>  What is still unclear to me is what you mean by this topic prefix
>  thing? Either a user cares about the topic name and thus, must
> create
>  and manage it manually. Or the user does not care, and Streams
> creates
>  it. How would this prefix idea fit in here?
> 
> 
> 
>  @Guozhang:
> 
>  My idea was to extend `Produced` with the hint we want to give for
>  creating an internal topic and pass an optional `Produced`
> parameter. There
>  are multiple things we can do here:
> 
>  1) stream.through(null, Produced...).groupBy().aggregate()
>  -> just allow for `null` topic name indicating that Streams should
>  create an internal topic
> 
>  2) stream.through(Produced...).groupBy().aggregate()
>  -> add one overload taking a mandatory `Produced`
> 
>  We use `Serialized` to piggyback the information
> 
>  3) stream.groupBy(Serialized...).aggregate()
>  and stream.groupByKey(Serialized...).aggregate()
>  -> we don't need new top level overloads
> 
> 
>  There are different trade-offs for those alternatives and maybe
> there
>  are other ways to change the API. It's just to push the
> discussion further.
> 
> 
>  -Matthias
> 
>  On 11/12/17 1:22 PM, Jan Filipiak wrote:
> > Hi Guozhang,
> >
> > this felt like these questions are supposed to be answered by me.
> > I do not understand the first one. I don't understand why the user
> > shouldn't be able to specify a suffix for the topic name.
> >
> >  For the third question I am not 100% familiar if the Produced
> class
> > came to existence
> > at all. I remember proposing it somewhere in our redo DSL
> discussion that
> > I dropped out of later. Finally any call that does:
> >
> > 1. create the internal 

[Discuss] KIP-321: Add method to get TopicNameExtractor in TopologyDescription

2018-06-20 Thread Nishanth Pradeep
Hello Everyone,

I have created a new KIP to discuss extending TopologyDescription. You can
find it here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Add+method+to+get+TopicNameExtractor+in+TopologyDescription

Please provide any feedback that you might have.

Best,
Nishanth Pradeep


Re: [DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-20 Thread Stephane Maarek
Hi Randall

Thanks for your feedback

1) user can override: yes they can but they most surely won't know about
it. I didn't know about this improvement until I got on twitter and
exchanged with Ismael. I would say most users don't even know Kafka connect
is running with acks=all. My understanding behind the philosophy of Kafka
connect was that users only worry about writing a connector and the
framework makes the whole ETL safe. In that regard, I think it's important
to increase the level of safety by preventing network duplicates (I don't
think anyone is against not having duplicates) and at the same time
increasing performance by having more in flight requests while keeping
ordering guarantees (I don't think anyone is against that either). So the
behaviour changes but I don't see any drawbacks to it.

1*) I'm very much allergic to introducing more Configs, but if the
community desires we can control that behaviour explicitly with a new
config and default the behaviour to false. It would give the users an
easier opt in and eventually we'll flip the config default to true
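For context, the settings under discussion are standard producer configs, and the worker passes through any property carrying the `producer.` prefix. An explicit opt-in today would look roughly like the following worker configuration fragment (the values are illustrative, not defaults proposed anywhere in this thread):

```properties
# connect-distributed.properties -- illustrative overrides, not current defaults
producer.enable.idempotence=true
producer.acks=all
# with idempotence enabled, ordering is preserved for up to 5 in-flight requests
producer.max.in.flight.requests.per.connection=5
```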


2) is there an easy way to clearly detect if a broker is running a specific
version of the API? If so, I don't mind including an if statement for a
conditional worker configuration, and that would solve backwards
compatibility.

Thanks
Stephane




On Wed., 20 Jun. 2018, 10:54 pm Randall Hauch,  wrote:

> Thanks for starting this conversation, Stephane. I have a few questions.
>
> The worker already accepts nearly all producer properties, and all
> `producer.*` properties override any hard-coded properties defined in
> `Worker.java`. So isn't it currently possible for a user to define these
> properties in their worker configuration if they want?
>
> Second, wouldn't this change the default behavior for existing worker
> configurations that have not overridden these properties? IOW, we would
> need to address the migration path to ensure backward compatibility.
>
> Third, the KIP mentions but does not really address the problem of running
> workers against pre-1.0 Kafka clusters. That definitely is something that
> happens frequently, so what is the planned approach for addressing this
> compatibility concern?
>
> All of these factors are likely why this has not yet been addressed to
> date: it's already possible for users to enable this feature, but doing it
> by default has compatibility concerns.
>
> Thoughts?
>
> Best regards,
>
> Randall
>
>
> On Wed, Jun 20, 2018 at 1:17 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > KIP link:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 318%3A+Make+Kafka+Connect+Source+idempotent
> >
> >
> > By looking at the code, it seems Worker.java is where the magic happens,
> > but do we need to also make changes to KafkaBasedLog.java (~line 241)?
> >
> > Love to hear your thoughts!
> >
>


[jira] [Resolved] (KAFKA-7072) Kafka Streams may drop rocksdb window segments before they expire

2018-06-20 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7072.
--
Resolution: Fixed

> Kafka Streams may drop rocksdb window segments before they expire
> 
>
> Key: KAFKA-7072
> URL: https://issues.apache.org/jira/browse/KAFKA-7072
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1, 1.0.0, 0.11.0.2, 1.1.0, 2.0.0, 1.0.1
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Minor
> Fix For: 2.1.0
>
>
> The current implementation of Segments used by Rocks Session and Time window 
> stores is in conflict with our current timestamp management model.
> The current segmentation approach allows configuration of a fixed number of 
> segments (let's say *4*) and a fixed retention time. We essentially divide up 
> the retention time into the available number of segments:
> {quote}
> <------------|--------------------------------------|
>       expiration date                      right now
>              |<--------- retention time ----------->|
>              |  seg 0  |  seg 1  |  seg 2  |  seg 3  |
> {quote}
> Note that we keep one extra segment so that we can record new events, while 
> some events in seg 0 are actually expired (but we only drop whole segments, 
> so they just get to hang around).
> {quote}
> <-----------------|--------------------------------------|
>            expiration date                      right now
>                   |<--------- retention time ----------->|
>              |  seg 0  |  seg 1  |  seg 2  |  seg 3  |
> {quote}
> When it's time to provision segment 4, we know that segment 0 is completely 
> expired, so we drop it:
> {quote}
> <-------------------------|--------------------------------------|
>                  expiration date                      right now
>                           |<--------- retention time ----------->|
>                           |  seg 1  |  seg 2  |  seg 3  |  seg 4  |
> {quote}
>  
> However, the current timestamp management model allows for records from the 
> future. Namely, because we define stream time as the minimum buffered 
> timestamp (nondecreasing), we can have a buffer like this: [ 5, 2, 6 ], and 
> our stream time will be 2, but we'll handle a record with timestamp 5 next. 
> Referring to the example, this means we could wind up having to provision 
> segment 4 before segment 0 expires!
> Let's say "f" is our future event:
> {quote}
> <-------------------------|--------------------------------------|--- f
>                  expiration date                      right now
>                           |<--------- retention time ----------->|
>                           |  seg 1  |  seg 2  |  seg 3  |  seg 4  |
> {quote}
> Should we drop segment 0 prematurely? Or should we crash and refuse to 
> process "f"?
> Today, we do the former, and this is probably the better choice. If we refuse 
> to process "f", then we cannot make progress ever again.
> Dropping segment 0 prematurely is a bummer, but users could also set the 
> retention time high enough that they don't think they'll actually get any 
> events late enough to need segment 0. Worst case, we can have many future 
> events without advancing stream time, each sparse enough to require its own 
> segment, which would eat deeply into the retention time, dropping many 
> segments that should be live.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


access for KIP

2018-06-20 Thread Lei Chen
Hi, there,

I'd like to request permission to the kafka space in our ASF cwiki, to be able
to create KIPs.

Username: leyncl
email: ley...@gmail.com

Thanks!


Re: [DISCUSS] KIP-315: Stream Join Sticky Assignor

2018-06-20 Thread Guozhang Wang
Hi Mike,

Thanks for sharing your feedbacks and the blocker features for Kafka
Streams. They are very helpful.


Guozhang


On Wed, Jun 20, 2018 at 2:47 PM, Mike Freyberger 
wrote:

> Matthias,
>
> Thanks for the feedback. For our use case, we have some complexities that
> make using the existing Streams API more complicated than using the Kafka
> Consumer directly.
>
> - We are doing async processing, which I don't think is currently
> available (KIP-311 is handling this).
>
> - Our state has a high eviction rate, so kafka compacted topics are not
> ideal for storing the changelog. The compaction cannot keep up and the
> topic will be majority tombstones when it is read on partition
> reassignment. We are using a KV store for the "change log" instead.
>
> - We wanted to separate consumer threads from worker threads to maximize
> parallelization while keeping consumer TCP connections down.
>
> Ultimately, it was much simpler to use the KafkaConsumer directly rather
> than peel away a lot of what Streams API does for you. I think we should
> continue to add support for more complex use cases and processing to the
> Streams API. However, I think there will remain streaming join use cases
> that can benefit from the flexibility that comes from using the
> KafkaConsumer directly.
>
> Mike
>
> On 6/20/18, 5:08 PM, "Matthias J. Sax"  wrote:
>
> Mike,
>
> thanks a lot for the KIP. I am wondering, why Streams API cannot be
> used
> for performing the join? It would be good to understand the advantage of
> adding a `StickyStreamJoinAssignor` compared to using Streams API? Atm,
> it seems to be a redundant feature to me.
>
> -Matthias
>
>
> On 6/20/18 1:07 PM, Mike Freyberger wrote:
> > Hi everybody,
> >
> > I’ve created a proposal document for KIP-315 which outlines the
> motivation of adding a new partition assignment strategy that can be used for
> streaming join use cases.
> >
> > It’d be great to get feedback on the overall idea and the proposed
> implementation.
> >
> > KIP Link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 315%3A+Stream+Join+Sticky+Assignor
> >
> > Thanks,
> >
> > Mike
> >
>
>
>
>


-- 
-- Guozhang


Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-06-20 Thread Harsha
+1

-Harsha

On Wed, Jun 20, 2018, at 5:15 AM, Thomas Crayford wrote:
> +1 (non-binding)
> 
> On Tue, Jun 19, 2018 at 8:20 PM, Lucas Wang  wrote:
> 
> > Hi Jun, Ismael,
> >
> > Can you please take a look when you get a chance? Thanks!
> >
> > Lucas
> >
> > On Mon, Jun 18, 2018 at 1:47 PM, Ted Yu  wrote:
> >
> > > +1
> > >
> > > On Mon, Jun 18, 2018 at 1:04 PM, Lucas Wang 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I've addressed a couple of comments in the discussion thread for
> > KIP-291,
> > > > and
> > > > got no objections after making the changes. Therefore I would like to
> > > start
> > > > the voting thread.
> > > >
> > > > KIP:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > >
> > > > Thanks for your time!
> > > > Lucas
> > > >
> > >
> >


Re: [DISCUSS] - KIP-314: KTable to GlobalKTable Bi-directional Join

2018-06-20 Thread Guozhang Wang
Hello Adam,

Thanks for proposing the KIP. A few meta comments:

1. As Matthias mentioned, the current GlobalKTable is designed to be
read-only, and not driving any computations (btw the global store backing a
GlobalKTable should also be read-only). Behind the scene the global store
updating task and the regular streams task are two separate ones running
two separate processor topologies by two threads: the global store updating
task's topology is simply a source node, plus a processor node (let's call
it the update-processor) that puts to the store. If we allow the
GlobalKTable to drive the join, then we need the underlying global store's
update processor to link to the downstream processors of the normal regular
task's topology in order to pass the joined results to downstream. It means
the two topologies will be merged, and that merged topology can only be
executed as a single task, by a single thread. We need to think of a way
to work around this issue first, before proceeding to the next steps.

2. It is not clear what you mean by "In terms of data complexity, any pattern
that requires us to rekey the data once is equivalent in terms of data
capacity requirements." Do you mean that although we have a duplicated
state store (ModifiedEvents, in addition to the original Events with only
the enhanced key), this is not avoidable anyway even if we do re-keying?
Note that in
https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable?preview=/74684836/74687529/Screenshot%20from%202017-11-18%2023%3A26%3A52.png
we were considering whether it is still possible to materialize each of the
joining tables only once, i.e. not have a duplicated store. So I think it
is not necessarily the case that we have to duplicate the KTable's store.


Two minor comments:

1. In the *KTable as Driver, joined on GlobalKTable join mechanism* section,
I think we still need to join the old value with the global store to form a
pair of "" joined result, so that the resulting KTable can still
be applied in another aggregation operator that allows correct addition /
subtraction logic.

2. For KTable-KTable join, we have inner / left / outer, while for
KStream-KTable / GlobalKTable join we only have inner / left, and the
reason is that for stream-table joins outer join makes less sense; should
we consider outer for KTable-GlobalKTable join as well?


Guozhang


On Tue, Jun 19, 2018 at 10:27 AM, Adam Bellemare 
wrote:

> Matthias
>
> Thanks for the links. I have seen those before but I will dig deeper into
> them, especially around the CombinedKey and the flush + cache + rangescan
> functionality. I believe Jan had a PR with many of the changes in there,
> perhaps I can use some of the work that was done there to help leverage a
> similar (or identical) design.
>
> I will certainly be able to make a PoC before going to vote on this one. It
> is a larger change and I suspect that we will need to review some of the
> finer points to ensure that the design is still suitable and sufficiently
> performant. I'll post back when I have something more concrete, but in the
> meantime I welcome all other concerns and comments.
>
> Thanks
>
>
>
> On Mon, Jun 18, 2018 at 10:05 PM, Matthias J. Sax 
> wrote:
>
> > Adam,
> >
> > thanks a lot for the KIP. I agree that this would be a valuable feature
> > to add. It's a very complex one though. You correctly pointed out, that
> > the GlobalKTable (or global stores in general) cannot be the "driver"
> > atm and are passively updated only. This is by design. Are you familiar
> > with the KIP discussion of KIP-99?
> > (https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=67633649
> > )
> > Would be worth to refresh to understand the tradeoffs and design
> decisions.
> >
> > It's unclear to me, what the impact will be if we want to change the
> > current design. Even if no GlobalKTable is used, it might have impact on
> > performance and for sure on code complexity. Overall, it seems that a
> > POC might be required before we can consider adding this (with the
> > danger, that it does not get accepted in the end).
> >
> > Are you aware of KIP-213:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 213+Support+non-key+joining+in+KTable
> >
> > It suggest to add non-key joins and a lot of issues how to implement
> > this were discussed already. As a KTable-GloblKTable join is a non-key
> > join, too, it seems that those discussion apply to your KIP too.
> >
> > Hope this helps to make the next steps.
> >
> >
> > -Matthias
> >
> >
> > On 6/18/18 1:15 PM, Adam Bellemare wrote:
> > > Hi All
> > >
> > > I created KIP-314 and I would like to initiate a discussion on it.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 314%3A+KTable+to+GlobalKTable+Bi-directional+Join
> > >
> > > The primary goal of this KIP is to improve the way that Kafka can deal
> > with
> > > relational data at scale. This KIP would alter the way that
> 

Re: [DISCUSS] KIP-319: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread Guozhang Wang
Hello John,

Thanks for the KIP.

Should we consider making the change on `Stores#persistentWindowStore`
parameters as well?


Guozhang


On Wed, Jun 20, 2018 at 1:31 PM, John Roesler  wrote:

> Hi Ted,
>
> Ah, when you made that comment to me before, I thought you meant as opposed
> to "segments". Now it makes sense that you meant as opposed to
> "segmentSize".
>
> I named it that way to match the peer method "windowSize", which is also a
> quantity of milliseconds.
>
> I agree that "interval" is more intuitive, but I think I favor consistency
> in this case. Does that seem reasonable?
>
> Thanks,
> -John
>
> On Wed, Jun 20, 2018 at 1:06 PM Ted Yu  wrote:
>
> > Normally, size is not measured in time units such as milliseconds.
> > How about naming the new method segmentInterval?
> > Thanks
> >  Original message From: John Roesler 
> > Date: 6/20/18  10:45 AM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> > [DISCUSS] KIP-319: Replace segments with segmentSize in
> > WindowBytesStoreSupplier
> > Hello All,
> >
> > I'd like to propose KIP-319 to fix an issue I identified in KAFKA-7080.
> > Specifically, we're creating CachingWindowStore with the *number of
> > segments* instead of the *segment size*.
> >
> > Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
> > Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ
> >
> > additionally, here's a draft PR for clarity:
> > https://github.com/apache/kafka/pull/5257
> >
> > Please let me know what you think!
> >
> > Thanks,
> > -John
> >
>



-- 
-- Guozhang
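The units mix-up behind KAFKA-7080 is easy to reproduce in miniature. The derivation below is an assumed sketch (all but one segment spanning the retention period, with a minimum interval), not the exact Segments logic; it only illustrates why passing a segment *count* where a millisecond *interval* is expected misplaces records:

```python
def segment_interval_ms(retention_ms, num_segments, min_interval_ms=60_000):
    # assumed derivation for illustration: num_segments - 1 segments
    # cover the retention span, subject to a minimum interval
    return max(retention_ms // (num_segments - 1), min_interval_ms)

def segment_id(timestamp_ms, interval_ms):
    # a record lands in the segment its timestamp falls into
    return timestamp_ms // interval_ms

retention = 24 * 60 * 60 * 1000                            # one day of retention
interval = segment_interval_ms(retention, num_segments=3)  # 43_200_000 ms

ts = 10 * 60 * 60 * 1000            # a record 10 hours into the day
print(segment_id(ts, interval))     # 0 -- correct: interval in milliseconds
print(segment_id(ts, 3))            # 12000000 -- the bug: count used as interval
```

The last line is the class of error the KIP fixes: a segment count of 3 interpreted as a 3 ms interval produces nonsense segment ids, which is why renaming the parameter from `segments` to a millisecond quantity makes the contract unambiguous.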


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2018-06-20 Thread Jeyhun Karimov
Hi Matthias, all,

Currently, I am not able to complete this KIP. Please accept my apologies
for that.


Cheers,
Jeyhun

On Mon, Jun 11, 2018 at 2:25 AM Matthias J. Sax 
wrote:

> What is the status of this KIP?
>
> -Matthias
>
>
> On 2/13/18 1:43 PM, Matthias J. Sax wrote:
> > Is there any update for this KIP?
> >
> >
> > -Matthias
> >
> > On 12/4/17 2:08 PM, Matthias J. Sax wrote:
> >> Jeyhun,
> >>
> >> thanks for updating the KIP.
> >>
> >> I am wondering if you intend to add a new class `Produced`? There is
> >> already `org.apache.kafka.streams.kstream.Produced`. So if we want to
> >> add a new class, it must have a different name -- or we might be able to
> >> merge both into one?
> >>
> >> Also, for the KStream overloads of `through()` and `to()`, can you add
> >> the different behavior using different overloads? It's not clear from
> >> the KIP what the semantics are.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/17/17 3:27 PM, Jeyhun Karimov wrote:
> >>> Hi,
> >>>
> >>> Thanks for your comments. I agree with Matthias partially.
> >>> I think we should relax some requirements related to the to() and
> through()
> >>> methods.
> >>> IMHO, Produced class can cover (existing/to be created) topic
> information,
> >>> which will ease our effort:
> >>>
> >>> KStream.to(Produced topicInfo)
> >>> KStream.through(Produced topicInfo)
> >>>
> >>> This will decrease the number of overloads but we will need to
> deprecate
> >>> the existing to() and through() methods, perhaps.
> >>> I updated the KIP accordingly.
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
> >>> On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io>
> >>> wrote:
> >>>
>  @Jan:
> 
>  The `Produced` class was introduced in 1.0 to specify key and value
>  Serdes (and partitioner) if data is written into a topic.
> 
>  Old API:
> 
>  KStream#to("topic", keySerde, valueSerde);
> 
>  New API:
> 
>  KStream#to("topic", Produced.with(keySerde, valueSerde));
> 
> 
>  This allows us to reduce the number of overloads for `to()` (and
>  `through()` that follows the same pattern) -- the second parameter is
>  used to cover all different variations of option parameters users can
>  specify, while we only have 2 overloads for `to()` itself.
> 
>  What is still unclear to me is what you mean by this topic prefix
>  thing? Either a user cares about the topic name and thus, must create
>  and manage it manually. Or the user does not care, and Streams creates
>  it. How would this prefix idea fit in here?
> 
> 
> 
>  @Guozhang:
> 
>  My idea was to extend `Produced` with the hint we want to give for
>  creating an internal topic and pass an optional `Produced` parameter.
> There
>  are multiple things we can do here:
> 
>  1) stream.through(null, Produced...).groupBy().aggregate()
>  -> just allow for `null` topic name indicating that Streams should
>  create an internal topic
> 
>  2) stream.through(Produced...).groupBy().aggregate()
>  -> add one overload taking a mandatory `Produced`
> 
>  We use `Serialized` to piggyback the information
> 
>  3) stream.groupBy(Serialized...).aggregate()
>  and stream.groupByKey(Serialized...).aggregate()
>  -> we don't need new top level overloads
> 
> 
>  There are different trade-offs for those alternatives and maybe there
>  are other ways to change the API. It's just to push the discussion
> further.
> 
> 
>  -Matthias
> 
>  On 11/12/17 1:22 PM, Jan Filipiak wrote:
> > Hi Guozhang,
> >
> > this felt like these questions are supposed to be answered by me.
> > I do not understand the first one. I don't understand why the user
> > shouldn't be able to specify a suffix for the topic name.
> >
> >  For the third question I am not 100% familiar if the Produced class
> > came to existence
> > at all. I remember proposing it somewhere in our redo DSL discussion
> that
> > I dropped out of later. Finally any call that does:
> >
> > 1. create the internal topic
> > 2. register sink
> > 3. register source
> >
> > will always get the work done. If we have a Produced-like class, putting
> > all the parameters
> > in there makes sense. (Partitioner, serde, PartitionHint, internal,
> name
> > ... )
> >
> > Hope this helps?
> >
> >
> > On 10.11.2017 07:54, Guozhang Wang wrote:
> >> A few clarification questions on the proposal details.
> >>
> >> 1. API: although the repartition only happens at the final stateful
> >> operations like agg / join, the repartition flag info was actually
>  passed
> >> from an earlier operator like map / groupBy. So what should be the
> new
> >> API
> >> look like? For example, if we do
> >>
> >> stream.groupBy().through("topic-name", 

Re: [DISCUSS] KIP-315: Stream Join Sticky Assignor

2018-06-20 Thread Mike Freyberger
Matthias, 

Thanks for the feedback. For our use case, we have some complexities that make 
using the existing Streams API more complicated than using the Kafka Consumer 
directly. 

- We are doing async processing, which I don't think is currently available 
(KIP-311 is handling this). 

- Our state has a high eviction rate, so kafka compacted topics are not ideal 
for storing the changelog. The compaction cannot keep up and the topic will be 
majority tombstones when it is read on partition reassignment. We are using a 
KV store for the "change log" instead.

- We wanted to separate consumer threads from worker threads to maximize 
parallelization while keeping consumer TCP connections down.

Ultimately, it was much simpler to use the KafkaConsumer directly rather than 
peel away a lot of what Streams API does for you. I think we should continue to 
add support for more complex use cases and processing to the Streams API. 
However, I think there will remain streaming join use cases that can benefit 
from the flexibility that comes from using the KafkaConsumer directly. 

Mike

On 6/20/18, 5:08 PM, "Matthias J. Sax"  wrote:

Mike,

thanks a lot for the KIP. I am wondering, why Streams API cannot be used
for performing the join? It would be good to understand the advantage of
adding a `StickyStreamJoinAssignor` compared to using Streams API? Atm,
it seems to be a redundant feature to me.

-Matthias


On 6/20/18 1:07 PM, Mike Freyberger wrote:
> Hi everybody,
> 
> I’ve created a proposal document for KIP-315 which outlines the 
motivation of adding a new partition assignment strategy that can be used for 
streaming join use cases.
> 
> It’d be great to get feedback on the overall idea and the proposed 
implementation.
> 
> KIP Link: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-315%3A+Stream+Join+Sticky+Assignor
> 
> Thanks,
> 
> Mike
> 





Re: [DISCUSS] KIP-315: Stream Join Sticky Assignor

2018-06-20 Thread Matthias J. Sax
Mike,

thanks a lot for the KIP. I am wondering, why Streams API cannot be used
for performing the join? It would be good to understand the advantage of
adding a `StickyStreamJoinAssignor` compared to using Streams API? Atm,
it seems to be a redundant feature to me.

-Matthias


On 6/20/18 1:07 PM, Mike Freyberger wrote:
> Hi everybody,
> 
> I’ve created a proposal document for KIP-315 which outlines the motivation of 
> adding a new partition assignment strategy that can be used for streaming join 
> use cases.
> 
> It’d be great to get feedback on the overall idea and the proposed 
> implementation.
> 
> KIP Link: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-315%3A+Stream+Join+Sticky+Assignor
> 
> Thanks,
> 
> Mike
> 





Re: cwiki access for KIP

2018-06-20 Thread Matthias J. Sax
Done.

On 6/20/18 11:15 AM, Matthias Wessendorf wrote:
> Hi,
> 
> I'd like to request permission to the kafka space in our ASF cwiki, to be
> able to create KIPs.
> 
> Username: matzew
> email: mat...@apache.org
> 
> Thanks!
> Matthias
> 





[jira] [Created] (KAFKA-7082) Concurrent createTopics calls may throw NodeExistsException

2018-06-20 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-7082:
--

 Summary: Concurrent createTopics calls may throw 
NodeExistsException
 Key: KAFKA-7082
 URL: https://issues.apache.org/jira/browse/KAFKA-7082
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Ismael Juma
Assignee: Ismael Juma
 Fix For: 2.0.1, 1.1.2


This exception is unexpected, causing an `UnknownServerException` to be thrown 
back to the client. Example below:

{code}
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /config/topics/connect-configs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:472)
at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1400)
at kafka.zk.KafkaZkClient.create$1(KafkaZkClient.scala:262)
at 
kafka.zk.KafkaZkClient.setOrCreateEntityConfigs(KafkaZkClient.scala:269)
at 
kafka.zk.AdminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK(AdminZkClient.scala:99)
at kafka.server.AdminManager$$anonfun$2.apply(AdminManager.scala:126)
at kafka.server.AdminManager$$anonfun$2.apply(AdminManager.scala:81)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
{code}
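Until a server-side fix lands, a caller hitting this race just sees a transient failure it can retry. A generic bounded-retry helper is sketched below; it is illustrative only — the exception and the operation are stand-ins, not the real AdminClient API:

```java
import java.util.function.Supplier;

public class BoundedRetry {
    // Retry op up to maxAttempts times; rethrow the last failure if all fail.
    static <T> T withRetries(Supplier<T> op, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // assumed transient, e.g. the NodeExists race above
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated flaky "create topic" call that fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("UnknownServerException: NodeExists");
            return "topic created";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```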



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-319: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread John Roesler
Hi Ted,

Ah, when you made that comment to me before, I thought you meant as opposed
to "segments". Now it makes sense that you meant as opposed to
"segmentSize".

I named it that way to match the peer method "windowSize", which is also a
quantity of milliseconds.

I agree that "interval" is more intuitive, but I think I favor consistency
in this case. Does that seem reasonable?

Thanks,
-John

On Wed, Jun 20, 2018 at 1:06 PM Ted Yu  wrote:

> Normally, size is not measured in a time unit such as milliseconds.
> How about naming the new method segmentInterval?
> Thanks
>  Original message From: John Roesler 
> Date: 6/20/18  10:45 AM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> [DISCUSS] KIP-319: Replace segments with segmentSize in
> WindowBytesStoreSupplier
> Hello All,
>
> I'd like to propose KIP-319 to fix an issue I identified in KAFKA-7080.
> Specifically, we're creating CachingWindowStore with the *number of
> segments* instead of the *segment size*.
>
> Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
> Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ
>
> additionally, here's a draft PR for clarity:
> https://github.com/apache/kafka/pull/5257
>
> Please let me know what you think!
>
> Thanks,
> -John
>


[VOTE] 2.0.0 RC0

2018-06-20 Thread Rajini Sivaram
Hello Kafka users, developers and client-developers,


This is the first candidate for release of Apache Kafka 2.0.0.


This is a major version release of Apache Kafka. It includes 40 new KIPs and
several critical bug fixes. Please see the 2.0.0 release plan for more
details:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820


A few notable highlights:

   - Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for CreateTopics
   (KIP-277)
   - SASL/OAUTHBEARER implementation (KIP-255)
   - Improved quota communication and customization of quotas (KIP-219,
   KIP-257)
   - Efficient memory usage for down conversion (KIP-283)
   - Fix log divergence between leader and follower during fast leader
   failover (KIP-279)
   - Drop support for Java 7 and remove deprecated code including old scala
   clients
   - Connect REST extension plugin, support for externalizing secrets and
   improved error handling (KIP-285, KIP-297, KIP-298 etc.)
   - Scala API for Kafka Streams and other Streams API improvements
   (KIP-270, KIP-150, KIP-245, KIP-251 etc.)


Release notes for the 2.0.0 release:

http://home.apache.org/~rsivaram/kafka-2.0.0-rc0/RELEASE_NOTES.html


*** Please download, test and vote by Monday, June 25, 4pm PT


Kafka's KEYS file containing PGP keys we use to sign the release:

http://kafka.apache.org/KEYS


* Release artifacts to be voted upon (source and binary):

http://home.apache.org/~rsivaram/kafka-2.0.0-rc0/


* Maven artifacts to be voted upon:

https://repository.apache.org/content/groups/staging/


* Javadoc:

http://home.apache.org/~rsivaram/kafka-2.0.0-rc0/javadoc/


* Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:

https://github.com/apache/kafka/tree/2.0.0-rc0


* Documentation:

http://home.apache.org/~rsivaram/kafka-2.0.0-rc0/kafka_2.11-2.0.0-site-docs.tgz

(Since documentation cannot go live until 2.0.0 is released, please
download and verify)


* Successful Jenkins builds for the 2.0 branch:

Unit/integration tests: https://builds.apache.org/job/kafka-2.0-jdk8/48/

System tests: https://jenkins.confluent.io/job/system-test-kafka/job/2.0/6/ (2
failures are known flaky tests)



Please test and verify the release artifacts and submit a vote for this RC
or report any issues so that we can fix them and roll out a new RC ASAP!

Although this release vote requires PMC votes to pass, testing, votes, and
bug
reports are valuable and appreciated from everyone.


Thanks,


Rajini


[DISCUSS] KIP-315: Stream Join Sticky Assignor

2018-06-20 Thread Mike Freyberger
Hi everybody,

I’ve created a proposal document for KIP-315, which outlines the motivation for 
adding a new partition assignment strategy that can be used for streaming join 
use cases.

It’d be great to get feedback on the overall idea and the proposed 
implementation.

KIP Link: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-315%3A+Stream+Join+Sticky+Assignor

Thanks,

Mike


Jenkins build is back to normal : kafka-2.0-jdk8 #48

2018-06-20 Thread Apache Jenkins Server
See 




cwiki access for KIP

2018-06-20 Thread Matthias Wessendorf
Hi,

I'd like to request permission to the kafka space in our ASF cwiki, to be
able to create KIPs.

Username: matzew
email: mat...@apache.org

Thanks!
Matthias

-- 
Matthias Wessendorf

github: https://github.com/matzew
twitter: http://twitter.com/mwessendorf


Re: [DISCUSS] KIP-319: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread Ted Yu
Normally, size is not measured in a time unit such as milliseconds.
How about naming the new method segmentInterval?
Thanks
 Original message From: John Roesler  Date: 
6/20/18  10:45 AM  (GMT-08:00) To: dev@kafka.apache.org Subject: [DISCUSS] 
KIP-319: Replace segments with segmentSize in WindowBytesStoreSupplier 
Hello All,

I'd like to propose KIP-319 to fix an issue I identified in KAFKA-7080.
Specifically, we're creating CachingWindowStore with the *number of
segments* instead of the *segment size*.

Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ

additionally, here's a draft PR for clarity:
https://github.com/apache/kafka/pull/5257

Please let me know what you think!

Thanks,
-John


[jira] [Created] (KAFKA-7081) Add describe all topics API to AdminClient

2018-06-20 Thread Manikumar (JIRA)
Manikumar created KAFKA-7081:


 Summary: Add describe all topics API to AdminClient
 Key: KAFKA-7081
 URL: https://issues.apache.org/jira/browse/KAFKA-7081
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Affects Versions: 2.0.0
Reporter: Manikumar
Assignee: Manikumar


Currently AdminClient supports a describeTopics(Collection<String> topicNames) 
method for topic descriptions and listTopics() for topic name listing.

To describe all topics, users currently call listTopics() to get all topic 
names and pass the resulting list to describeTopics().

Since "describe all topics" is a common operation, we propose adding a 
describeTopics() method that returns descriptions for all topics. This will be 
simpler to use and avoids additional metadata requests.
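The two-step pattern being replaced, and the proposed convenience on top of it, can be sketched with an in-memory stand-in; the method names mirror, but are not, the real AdminClient API:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for AdminClient, illustrating KAFKA-7081's idea.
public class FakeAdmin {
    private final Map<String, String> topics = new HashMap<>();

    void createTopic(String name, String description) {
        topics.put(name, description);
    }

    // Existing: list topic names only.
    Set<String> listTopics() {
        return new HashSet<>(topics.keySet());
    }

    // Existing: describe a named set of topics.
    Map<String, String> describeTopics(Collection<String> names) {
        Map<String, String> out = new HashMap<>();
        for (String n : names) out.put(n, topics.get(n));
        return out;
    }

    // Proposed convenience: describe everything in one call, folding the
    // list-then-describe composition into the client.
    Map<String, String> describeTopics() {
        return describeTopics(listTopics());
    }
}
```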





[DISCUSS] KIP-319: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread John Roesler
Hello All,

I'd like to propose KIP-319 to fix an issue I identified in KAFKA-7080.
Specifically, we're creating CachingWindowStore with the *number of
segments* instead of the *segment size*.
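The unit mix-up behind KAFKA-7080 — a segment *count* flowing into a parameter that expects a segment *interval* in milliseconds — can be illustrated with a small self-contained sketch. The formula below is an assumption modeled loosely on splitting a retention period across segments; it is not the actual Streams internals:

```java
public class SegmentMath {
    static final long MIN_SEGMENT_INTERVAL_MS = 60_000L;

    // Hypothetical derivation of a segment interval (ms) from a retention
    // period and a desired segment count, floored at one minute.
    static long segmentIntervalMs(long retentionMs, int numSegments) {
        return Math.max(retentionMs / (numSegments - 1), MIN_SEGMENT_INTERVAL_MS);
    }

    public static void main(String[] args) {
        // One day of retention split across 3 segments: 43,200,000 ms each.
        long intendedMs = segmentIntervalMs(86_400_000L, 3);
        // The bug: the segment COUNT itself (3) gets used as the interval,
        // producing absurdly tiny 3 ms segments.
        long buggedMs = 3L;
        System.out.println("intended=" + intendedMs + "ms, bugged=" + buggedMs + "ms");
    }
}
```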

Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ

additionally, here's a draft PR for clarity:
https://github.com/apache/kafka/pull/5257

Please let me know what you think!

Thanks,
-John


Re: [DISCUSS] KIP-318: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread John Roesler
Oops! It looks like 318 was taken. I'll re-send this message to a new
thread.

On Wed, Jun 20, 2018 at 12:40 PM John Roesler  wrote:

> Hello All,
>
> I'd like to propose KIP-318 to fix an issue I identified in KAFKA-7080.
> Specifically, we're creating CachingWindowStore with the *number of
> segments* instead of the *segment size*.
>
> Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
> Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ
>
> additionally, here's a draft PR for clarity:
> https://github.com/apache/kafka/pull/5257
>
> Please let me know what you think!
>
> Thanks,
> -John
>


[DISCUSS] KIP-318: Replace segments with segmentSize in WindowBytesStoreSupplier

2018-06-20 Thread John Roesler
Hello All,

I'd like to propose KIP-318 to fix an issue I identified in KAFKA-7080.
Specifically, we're creating CachingWindowStore with the *number of
segments* instead of the *segment size*.

Here's the jira: https://issues.apache.org/jira/browse/KAFKA-7080
Here's the KIP: https://cwiki.apache.org/confluence/x/mQU0BQ

additionally, here's a draft PR for clarity:
https://github.com/apache/kafka/pull/5257

Please let me know what you think!

Thanks,
-John


[jira] [Resolved] (KAFKA-6546) Add ENDPOINT_NOT_FOUND_ON_LEADER error code for missing listener

2018-06-20 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-6546.
---
   Resolution: Fixed
 Reviewer: Ismael Juma
Fix Version/s: (was: 2.1.0)
   2.0.0

> Add ENDPOINT_NOT_FOUND_ON_LEADER error code for missing listener
> 
>
> Key: KAFKA-6546
> URL: https://issues.apache.org/jira/browse/KAFKA-6546
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.0.0
>
>
> In 1.1, if an endpoint is available on the broker processing a metadata 
> request, but the corresponding listener is not available on the leader of a 
> partition, LEADER_NOT_AVAILABLE is returned (earlier versions returned 
> UNKNOWN_SERVER_ERROR). This could indicate broker misconfiguration, where some 
> brokers are not configured with all listeners, or a transient error when 
> listeners are dynamically added. We want to treat the 
> error as a transient error to process dynamic updates, but we should notify 
> clients of the actual error. This change should be made when MetadataRequest 
> version is updated so that LEADER_NOT_AVAILABLE is returned to older clients.
> See 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration]
>  and  [https://github.com/apache/kafka/pull/4539] for details.





[jira] [Created] (KAFKA-7080) WindowStoreBuilder incorrectly initializes CachingWindowStore

2018-06-20 Thread John Roesler (JIRA)
John Roesler created KAFKA-7080:
---

 Summary: WindowStoreBuilder incorrectly initializes 
CachingWindowStore
 Key: KAFKA-7080
 URL: https://issues.apache.org/jira/browse/KAFKA-7080
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.0.1, 1.1.0, 1.0.0, 2.0.0
Reporter: John Roesler
Assignee: John Roesler
 Fix For: 2.1.0


When caching is enabled on the WindowStoreBuilder, it creates a 
CachingWindowStore. However, it incorrectly passes storeSupplier.segments() 
(the number of segments) to the segmentInterval argument.

 

The impact is low, since any valid number of segments is also a valid segment 
size, but it likely results in much smaller segments than intended. For 
example, the segments may be sized 3ms instead of 60,000ms.

 

Ideally the WindowBytesStoreSupplier interface would allow suppliers to 
advertise their segment size instead of segment count. I plan to create a KIP 
to propose this.





Re: [DISCUSS] KIP-316: Command-line overrides for ConnectDistributed worker properties

2018-06-20 Thread Randall Hauch
My only concern with this proposal is that it adds yet another way to
specify configuration properties, making it a bit more difficult to track
down how/whether a configuration property has been set. Configuring Kafka
Connect is already too challenging, so we need to be very clear that this
additional complexity is worth the price. IMO the KIP should explicitly
address this.

Also, any reason why this KIP singles out the Connect distributed worker
and doesn't address the standalone worker?

And finally, the KIP does not clearly explain the command line behavior. It
simply says:

An additional command-line argument, --override key=value, will be
accepted for ConnectDistributed.

which makes it seem like only a single key-value pair can be specified.
Clearly this is not the intention, but is `--override` used once and
followed by multiple key-value pairs, or is `--override` required for every
key-value pair? Does it need to follow the property file reference, or can
the overrides precede or be interspersed around the property file
reference? Does this happen to exactly match the broker command line
behavior? The KIP should be very clear about the behavior and should fully
specify all of these details.
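For comparison, the broker start script's precedent is one --override flag per 
key-value pair, placed after the properties file; if KIP-316 mirrors it, usage 
would look like the sketch below (the Connect invocation is an assumed syntax, 
not confirmed by the KIP text):

```shell
# Broker precedent: kafka-server-start.sh accepts repeated --override flags
bin/kafka-server-start.sh config/server.properties \
    --override broker.id=1 \
    --override log.retention.hours=48

# Assumed analogous Connect invocation under KIP-316
bin/connect-distributed.sh config/connect-distributed.properties \
    --override group.id=my-connect-cluster \
    --override offset.flush.interval.ms=10000
```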

Best regards,

Randall


On Fri, Jun 15, 2018 at 11:14 AM, Jakub Scholz  wrote:

> I think this makes perfect sense. Thanks for opening this KIP.
>
> Thanks & Regards
> Jakub
>
> On Fri, Jun 15, 2018 at 2:10 AM Kevin Lafferty 
> wrote:
>
> > Hi all,
> >
> > I created KIP-316, and I would like to initiate discussion.
> >
> > The KIP is here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
> >
> > Thanks,
> > Kevin
> >
>


Re: [VOTE] KIP-316: Command-line overrides for ConnectDistributed worker properties

2018-06-20 Thread Randall Hauch
IMO we should not request a vote without additional time for discussion.

Best regards,

Randall

On Wed, Jun 20, 2018 at 7:29 AM, Jakub Scholz  wrote:

> +1 (non-binding)
>
> On Mon, Jun 18, 2018 at 8:42 PM Kevin Lafferty 
> wrote:
>
> > Hi all,
> >
> > I got a couple notes of interest on the discussion thread and no
> > objections, so I'd like to kick off a vote. This is a very small change.
> >
> > KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
> >
> > Jira: https://issues.apache.org/jira/browse/KAFKA-7060
> >
> > GitHub PR: https://github.com/apache/kafka/pull/5234
> >
> > -Kevin
> >
>


Re: [DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-20 Thread Randall Hauch
Thanks for starting this conversation, Stephane. I have a few questions.

The worker already accepts nearly all producer properties, and all
`producer.*` properties override any hard-coded properties defined in
`Worker.java`. So isn't it currently possible for a user to define these
properties in their worker configuration if they want?

Second, wouldn't this change the default behavior for existing worker
configurations that have not overridden these properties? IOW, we would
need to address the migration path to ensure backward compatibility.

Third, the KIP mentions but does not really address the problem of running
workers against pre-1.0 Kafka clusters. That definitely is something that
happens frequently, so what is the planned approach for addressing this
compatibility concern?

All of these factors are likely why this has not yet been addressed to
date: it's already possible for users to enable this feature, but doing it
by default has compatibility concerns.
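Since the worker passes `producer.`-prefixed properties through as noted above, 
a user can already opt in explicitly today. A sketch of the worker 
configuration, with illustrative values:

```properties
# connect-distributed.properties — explicit opt-in to the idempotent
# producer for the worker's producers (values illustrative)
producer.enable.idempotence=true
producer.acks=all
producer.max.in.flight.requests.per.connection=5
```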

Thoughts?

Best regards,

Randall


On Wed, Jun 20, 2018 at 1:17 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> KIP link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 318%3A+Make+Kafka+Connect+Source+idempotent
>
>
> By looking at the code, it seems Worker.java is where the magic happens,
> but do we need to also operate changes to KafkaBasedLog.java (~line 241) ?
>
> Love to hear your thoughts!
>


Build failed in Jenkins: kafka-2.0-jdk8 #47

2018-06-20 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: Use exceptions in o.a.k.common if possible and deprecate ZkUtils

--
[...truncated 433.76 KB...]
kafka.zookeeper.ZooKeeperClientTest > 
testBlockOnRequestCompletionFromStateChangeHandler PASSED

kafka.zookeeper.ZooKeeperClientTest > testUnresolvableConnectString STARTED

kafka.zookeeper.ZooKeeperClientTest > testUnresolvableConnectString PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testPipelinedGetData STARTED

kafka.zookeeper.ZooKeeperClientTest > testPipelinedGetData PASSED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChildChangeHandlerForChildChange 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChildChangeHandlerForChildChange 
PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetChildrenExistingZNodeWithChildren 
PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline STARTED

kafka.zookeeper.ZooKeeperClientTest > testMixedPipeline PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetDataExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry STARTED

kafka.zookeeper.ZooKeeperClientTest > testSessionExpiry PASSED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testSetDataNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testDeleteNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testExistsExistingZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testZooKeeperStateChangeRateMetrics PASSED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion STARTED

kafka.zookeeper.ZooKeeperClientTest > testZNodeChangeHandlerForDeletion PASSED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode STARTED

kafka.zookeeper.ZooKeeperClientTest > testGetAclNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
STARTED

kafka.zookeeper.ZooKeeperClientTest > testStateChangeHandlerForAuthFailure 
PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest > testNoOpAction PASSED

kafka.network.SocketServerTest > simpleRequest STARTED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > closingChannelException STARTED

kafka.network.SocketServerTest > closingChannelException PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingInProgress STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingInProgress PASSED

kafka.network.SocketServerTest > 

Jenkins build is back to normal : kafka-trunk-jdk8 #2754

2018-06-20 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-06-20 Thread Eno Thereska
Hi Lucas,

Sorry for the delay, just had a look at this. A couple of questions:
- did you notice any positive change after implementing this KIP? I'm
wondering if you have any experimental results that show the benefit of the
two queues.

- priority is usually not sufficient in addressing the problem the KIP
identifies. Even with priority queues, you will sometimes (often?) have the
case that data plane requests will be ahead of the control plane requests.
This happens because the system might have already started processing the
data plane requests before the control plane ones arrived. So it would be
good to know what % of the problem this KIP addresses.
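A minimal sketch of the two-queue idea, and of the in-flight limitation 
described above (illustrative only — not the broker's actual request channel):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Two-queue dequeue policy from KIP-291: control-plane requests are
// drained before data-plane requests.
public class TwoQueueDemo {
    final Queue<String> controlQueue = new ArrayDeque<>();
    final Queue<String> dataQueue = new ArrayDeque<>();

    String pollNext() {
        // Control requests always win if any are queued...
        String req = controlQueue.poll();
        return req != null ? req : dataQueue.poll();
    }

    public static void main(String[] args) {
        TwoQueueDemo d = new TwoQueueDemo();
        d.dataQueue.add("Produce-1");        // data request arrived first
        d.controlQueue.add("LeaderAndIsr");  // control request arrives later
        // ...but a data request a handler thread has *already* dequeued is
        // unaffected — the limitation described above.
        System.out.println(d.pollNext()); // LeaderAndIsr
        System.out.println(d.pollNext()); // Produce-1
    }
}
```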

Thanks
Eno

On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu  wrote:

> Change looks good.
>
> Thanks
>
> On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang  wrote:
>
> > Hi Ted,
> >
> > Thanks for the suggestion. I've updated the KIP. Please take another
> look.
> >
> > Lucas
> >
> > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu  wrote:
> >
> > > Currently in KafkaConfig.scala :
> > >
> > >   val QueuedMaxRequests = 500
> > >
> > > It would be good if you can include the default value for this new
> config
> > > in the KIP.
> > >
> > > Thanks
> > >
> > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang 
> > wrote:
> > >
> > > > Hi Ted, Dong
> > > >
> > > > I've updated the KIP by adding a new config, instead of reusing the
> > > > existing one.
> > > > Please take another look when you have time. Thanks a lot!
> > > >
> > > > Lucas
> > > >
> > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu  wrote:
> > > >
> > > > > bq.  that's a waste of resource if control request rate is low
> > > > >
> > > > > I don't know if control request rate can get to 100,000, likely
> not.
> > > Then
> > > > > using the same bound as that for data requests seems high.
> > > > >
> > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <
> lucasatu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > Thanks for taking a look at this KIP.
> > > > > > Let's say today the setting of "queued.max.requests" in cluster A
> > is
> > > > > 1000,
> > > > > > while the setting in cluster B is 100,000.
> > > > > > The 100 times difference might have indicated that machines in
> > > cluster
> > > > B
> > > > > > have larger memory.
> > > > > >
> > > > > > By reusing the "queued.max.requests", the controlRequestQueue in
> > > > cluster
> > > > > B
> > > > > > automatically
> > > > > > gets a 100x capacity without explicitly bothering the operators.
> > > > > > I understand the counter argument can be that maybe that's a
> waste
> > of
> > > > > > resource if control request
> > > > > > rate is low and operators may want to fine tune the capacity of
> the
> > > > > > controlRequestQueue.
> > > > > >
> > > > > > I'm ok with either approach, and can change it if you or anyone
> > else
> > > > > feels
> > > > > > strong about adding the extra config.
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu 
> > wrote:
> > > > > >
> > > > > > > Lucas:
> > > > > > > Under Rejected Alternatives, #2, can you elaborate a bit more
> on
> > > why
> > > > > the
> > > > > > > separate config has bigger impact ?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin  >
> > > > wrote:
> > > > > > >
> > > > > > > > Hey Luca,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Looks good overall. Some comments below:
> > > > > > > >
> > > > > > > > - We usually specify the full mbean for the new metrics in
> the
> > > KIP.
> > > > > Can
> > > > > > > you
> > > > > > > > specify it in the Public Interface section similar to KIP-237
> > > > > > > >  > > > > > > > 237%3A+More+Controller+Health+Metrics>
> > > > > > > > ?
> > > > > > > >
> > > > > > > > - Maybe we could follow the same pattern as KIP-153
> > > > > > > >  > > > > > > > 153%3A+Include+only+client+traffic+in+BytesOutPerSec+
> metric>,
> > > > > > > > where we keep the existing sensor name "BytesInPerSec" and
> add
> > a
> > > > new
> > > > > > > sensor
> > > > > > > > "ReplicationBytesInPerSec", rather than replacing the sensor
> > > name "
> > > > > > > > BytesInPerSec" with e.g. "ClientBytesInPerSec".
> > > > > > > >
> > > > > > > > - It seems that the KIP changes the semantics of the broker
> > > config
> > > > > > > > "queued.max.requests" because the number of total requests
> > queued
> > > > in
> > > > > > the
> > > > > > > > broker will be no longer bounded by "queued.max.requests".
> This
> > > > > > probably
> > > > > > > > needs to be specified in the Public Interfaces section for
> > > > > discussion.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dong
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <
> > > > lucasatu...@gmail.com>
> > > > 

Re: [VOTE] KIP-316: Command-line overrides for ConnectDistributed worker properties

2018-06-20 Thread Jakub Scholz
+1 (non-binding)

On Mon, Jun 18, 2018 at 8:42 PM Kevin Lafferty 
wrote:

> Hi all,
>
> I got a couple notes of interest on the discussion thread and no
> objections, so I'd like to kick off a vote. This is a very small change.
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
>
> Jira: https://issues.apache.org/jira/browse/KAFKA-7060
>
> GitHub PR: https://github.com/apache/kafka/pull/5234
>
> -Kevin
>


[jira] [Created] (KAFKA-7079) ValueTransformer#transform does not pass the key

2018-06-20 Thread Hashan Gayasri Udugahapattuwa (JIRA)
Hashan Gayasri Udugahapattuwa created KAFKA-7079:


 Summary: ValueTransformer#transform does not pass the key
 Key: KAFKA-7079
 URL: https://issues.apache.org/jira/browse/KAFKA-7079
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.1.0
 Environment: Fedora 27
Reporter: Hashan Gayasri Udugahapattuwa


ValueTransformers' transform method doesn't pass the key to user-code. 
Reporting this as a bug since it currently requires workarounds.

 

Context:

I'm currently in the process of converting two stateful "*aggregate*" DSL 
operations to the Processor API, since the state of those operations is 
relatively large and takes 99%+ of CPU time (when profiled) for serializing 
and deserializing it via Kryo.

Since DSL aggregations use state stores of [Bytes, Array[Byte]]] even when 
using the in-memory state store, it seems like the only way to reduce the 
serialization/deserialization overhead is to convert heavy aggregates to 
*transform*s.

In my case, *ValueTransformer* seems to be the option. However, since 
ValueTransformers' _transform_ method only exposes the _value_, I'd either have 
to pre-process and add the key to the value or use *Transformer* instead (which 
is not my intent).

 

As the internal _*InternalValueTransformerWithKey*_ already has the readOnlyKey, 
it seems like a good idea to pass the key to the transform method as well, 
especially since in a stateful transformation the state store generally has to 
be queried by the key.
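The pre-processing workaround mentioned above — packing the key into the value 
so a value-only transform can still see it — sketched without any Kafka Streams 
dependency (KeyedValue is a hypothetical helper, not a Streams class):

```java
// Carries a record's key alongside its value so that a value-only
// transform downstream can still access the key.
public class KeyedValue<K, V> {
    final K key;
    final V value;

    KeyedValue(K key, V value) {
        this.key = key;
        this.value = value;
    }

    // What an upstream map() step would produce before transformValues().
    static <K, V> KeyedValue<K, V> pack(K key, V value) {
        return new KeyedValue<>(key, value);
    }

    // A value-only transform can now look up state by kv.key.
    static String transform(KeyedValue<String, Long> kv) {
        return kv.key + "=" + kv.value;
    }

    public static void main(String[] args) {
        System.out.println(transform(pack("user-42", 7L))); // user-42=7
    }
}
```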





Re: [VOTE] KIP-291: Have separate queues for control requests and data requests

2018-06-20 Thread Thomas Crayford
+1 (non-binding)

On Tue, Jun 19, 2018 at 8:20 PM, Lucas Wang  wrote:

> Hi Jun, Ismael,
>
> Can you please take a look when you get a chance? Thanks!
>
> Lucas
>
> On Mon, Jun 18, 2018 at 1:47 PM, Ted Yu  wrote:
>
> > +1
> >
> > On Mon, Jun 18, 2018 at 1:04 PM, Lucas Wang 
> wrote:
> >
> > > Hi All,
> > >
> > > I've addressed a couple of comments in the discussion thread for
> KIP-291,
> > > and
> > > got no objections after making the changes. Therefore I would like to
> > start
> > > the voting thread.
> > >
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > >
> > > Thanks for your time!
> > > Lucas
> > >
> >
>


[jira] [Resolved] (KAFKA-3018) Kafka producer hangs on producer.close() call if the producer topic contains single quotes in the topic name

2018-06-20 Thread Manikumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-3018.
--
Resolution: Duplicate

Resolving this as duplicate of KAFKA-5098.  Please reopen if you think otherwise

> Kafka producer hangs on producer.close() call if the producer topic contains 
> single quotes in the topic name
> 
>
> Key: KAFKA-3018
> URL: https://issues.apache.org/jira/browse/KAFKA-3018
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: kanav anand
>Priority: Major
>
> While creating a topic with single quotes in the name throws an exception, 
> trying to close a producer configured with a topic name containing single 
> quotes makes the producer hang.
> This can be easily replicated and verified by setting topic.name for a 
> producer to a string containing single quotes.





[jira] [Created] (KAFKA-7078) Kafka 1.0.1 Broker version crashes when deleting log

2018-06-20 Thread xiaojing zhou (JIRA)
xiaojing zhou created KAFKA-7078:


 Summary: Kafka 1.0.1 Broker version crashes when deleting log
 Key: KAFKA-7078
 URL: https://issues.apache.org/jira/browse/KAFKA-7078
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.1
Reporter: xiaojing zhou


Hello

We have been running Kafka 1.0.1 on CentOS for 3 months. Today Kafka crashed. 
When we checked the server.log and log-cleaner.log files, the following entries 
were found.

server.log

{code}

[2018-06-11 00:04:12,349] INFO Rolled new log segment for '__consumer_offsets-7' in 205 ms. (kafka.log.Log)
[2018-06-11 00:04:23,282] ERROR Failed to clean up log for __consumer_offsets-7 in dir /nas/kafka_logs/lvsp01hkf001 due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /nas/kafka_logs/lvsp01hkf001/__consumer_offsets-7/019668089841.log
 at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
 at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
 at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
 at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
 at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
 at java.nio.file.Files.move(Files.java:1395)
 at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:682)
 at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
 at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:398)
 at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1592)
--
--
 at kafka.log.Log.replaceSegments(Log.scala:1639)
 at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:485)
 at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:396)
 at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:395)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at kafka.log.Cleaner.doClean(LogCleaner.scala:395)
 at kafka.log.Cleaner.clean(LogCleaner.scala:372)
 at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:263)
 at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:243)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
 Suppressed: java.nio.file.NoSuchFileException: /nas/kafka_logs/lvsp01hkf001/__consumer_offsets-7/019668089841.log -> /nas/kafka_logs/lvsp01hkf001/__consumer_offsets-7/019668089841.log.deleted
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
  at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
  at java.nio.file.Files.move(Files.java:1395)
  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:679)
  ... 16 more
[2018-06-11 00:04:23,338] INFO [ReplicaManager broker=0] Stopping serving replicas in dir /nas/kafka_logs/lvsp01hkf001 (kafka.server.ReplicaManager)

{code}

 

log-cleaner.log

{code}

[2018-06-11 00:04:21,677] INFO Cleaner 0: Beginning cleaning of log __consumer_offsets-7. (kafka.log.LogCleaner)
[2018-06-11 00:04:21,677] INFO Cleaner 0: Building offset map for __consumer_offsets-7... (kafka.log.LogCleaner)
[2018-06-11 00:04:21,722] INFO Cleaner 0: Building offset map for log __consumer_offsets-7 for 1 segments in offset range [23914565941, 23915674371). (kafka.log.LogCleaner)
[2018-06-11 00:04:23,212] INFO Cleaner 0: Offset map for log __consumer_offsets-7 complete. (kafka.log.LogCleaner)
[2018-06-11 00:04:23,212] INFO Cleaner 0: Cleaning log __consumer_offsets-7 (cleaning prior to Mon Jun 11 00:04:12 UTC 2018, discarding tombstones prior to Sat Jun 09 23:17:35 UTC 2018)... (kafka.log.LogCleaner)
[2018-06-11 00:04:23,216] INFO Cleaner 0: Cleaning segment 19668089841 in log __consumer_offsets-7 (largest timestamp Thu Jan 01 00:00:00 UTC 1970) into 19668089841, discarding deletes. (kafka.log.LogCleaner)
[2018-06-11 00:04:23,220] INFO Cleaner 0: Swapping in cleaned segment 19668089841 for segment(s) 19668089841 in log __consumer_offsets-7. (kafka.log.LogCleaner)
[2018-06-11 00:04:23,343] INFO Cleaner 0: Beginning cleaning of log __consumer_offsets-7. (kafka.log.LogCleaner)
[2018-06-11 00:04:23,343] INFO Cleaner 0: Building offset map for __consumer_offsets-7... (kafka.log.LogCleaner)
[2018-06-11 00:04:23,388] INFO Cleaner 0: Building offset map for log __consumer_offsets-7 for 1 segments in offset range [23914565941, 23915674371). (kafka.log.LogCleaner)
{code}

 

Our log files are stored in a NAS folder. I checked 
/nas/kafka_logs/lvsp01hkf001/__consumer_offsets-7/019668089841.log; 
the file does exist, so I am not sure why Kafka throws NoSuchFileException.

Does anyone know what the issue may be?
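For what it's worth, the exception in the stack trace comes from `java.nio.file.Files.move` (called via Kafka's `Utils.atomicMoveWithFallback`), and `NoSuchFileException` is exactly what the JDK throws when the source of a rename is gone at the moment the syscall runs, even if it was visible an instant earlier. The sketch below is not from the Kafka code; it is a minimal, self-contained illustration of that behavior (the segment file name is reused from the log purely as an example), which may be relevant on NAS/NFS mounts where client-side caching can make a file appear present to `ls` while the rename still fails:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveRaceDemo {

    // Returns true if renaming a vanished source file throws
    // NoSuchFileException -- the same exception surfaced in the broker log.
    static boolean moveMissingFile(Path dir) throws IOException {
        Path src = dir.resolve("019668089841.log");
        Path dst = dir.resolve("019668089841.log.deleted");
        Files.createFile(src);
        Files.delete(src); // simulate the file disappearing before the rename
        try {
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
            return false;
        } catch (NoSuchFileException e) {
            return true; // the rename fails just like in the server.log trace
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("kafka-demo");
        System.out.println(moveMissingFile(tmp)); // prints true
    }
}
```

If something on the NAS (snapshotting, a second broker pointed at the same directory, or stale NFS attribute caching) removes or hides the segment between the cleaner's swap and the rename, this is the failure mode you would see.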



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-20 Thread Stephane Maarek
KIP link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-318%3A+Make+Kafka+Connect+Source+idempotent


From looking at the code, it seems Worker.java is where the magic happens,
but do we also need to make changes to KafkaBasedLog.java (~line 241)?

Love to hear your thoughts!
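For readers following along: making a producer idempotent comes down to the configuration named in the related JIRA below. The sketch here is illustrative rather than the KIP's actual patch; it uses plain string config keys (the standard Kafka producer property names) so it stands alone without the Kafka client jar, and the bootstrap address is a placeholder:

```java
import java.util.Properties;

public class IdempotentProducerProps {

    // Build producer properties for an idempotent source, in the spirit of
    // KIP-318. Keys are the standard Kafka producer config names
    // (equivalent to ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, etc.).
    static Properties idempotentProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Turns on sequence numbers + producer id, deduplicating retries.
        props.put("enable.idempotence", "true");
        // Idempotence preserves ordering with up to 5 in-flight requests.
        props.put("max.in.flight.requests.per.connection", "5");
        // acks=all is required when idempotence is enabled.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        Properties p = idempotentProps("localhost:9092"); // placeholder address
        System.out.println(p.getProperty("enable.idempotence")); // prints true
    }
}
```

In the Connect worker these properties would be merged into the producer config before any user overrides are applied, which is presumably the change the KIP proposes for Worker.java (and possibly KafkaBasedLog.java as asked above).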


[jira] [Created] (KAFKA-7077) producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5") producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE, true)

2018-06-20 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-7077:
--

 Summary: 
producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5") 
producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE, true)
 Key: KAFKA-7077
 URL: https://issues.apache.org/jira/browse/KAFKA-7077
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 2.0.0
Reporter: Stephane Maarek
Assignee: Stephane Maarek





