Andy Bryant created KAFKA-7608:
----------------------------------

             Summary: A Kafka Streams DSL transform or processor call should 
trigger a repartition like a join
                 Key: KAFKA-7608
                 URL: https://issues.apache.org/jira/browse/KAFKA-7608
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.0.0
            Reporter: Andy Bryant


Currently in Kafka Streams, if any DSL operation occurs that may modify the 
keys of the record stream, the stream is flagged for repartitioning. Currently 
this flag is checked prior to a stream join or an aggregation and if set the 
stream is piped through a transient repartition topic. This ensures messages 
with the same key are always co-located in the same partition and hence same 
stream task and state store.

The same mechanism should be used to trigger repartitioning prior to stream 
{{transform}}, {{transformValues}} and {{process}} calls that specify one or 
more state stores.

Currently without the forced repartitioning, for streams where the key has been 
modified, there is no guarantee the same keys will be processed by the same 
task which would be what you expect when using a state store. Given that 
aggregations and joins already automatically make this guarantee it seems 
inconsistent that {{transform}} and {{process}} do not provide the same 
guarantees.

To achieve the same guarantees currently, developers must manually pipe the 
stream through a topic to force the repartitioning. This works, but is 
sub-optimal since you don't get the handy optimisation where the repartition 
topic contents is purged after use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to