Guozhang Wang created KAFKA-12370:
-------------------------------------

             Summary: Refactor KafkaStreams exposed metadata hierarchy
                 Key: KAFKA-12370
                 URL: https://issues.apache.org/jira/browse/KAFKA-12370
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Guozhang Wang


Currently in KafkaStreams we have three groups of metadata getters:

1.
{code}
allMetadata
allMetadataForStore
{code}

These return a collection of {{StreamsMetadata}}, which contains only the 
active / standby partitions plus the hostInfo, and does not expose any task info.

2.
{code}
queryMetadataForKey
{code}

Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
standbys, plus the partition id.

3.
{code}
localThreadsMetadata
{code}

Returns {{ThreadMetadata}}, which includes a collection of {{TaskMetadata}} for 
active and standby tasks.

All of the above functions are used for interactive queries, but the metadata 
they expose differs considerably, and some use cases need client, thread, and 
task metadata together to build their features. At the same time, we may have 
a more dynamic "task -> thread" mapping in the future, and embedded clients 
such as consumers would then be per client rather than per thread.

---------------

Rethinking the metadata, I feel we can have a more consistent hierarchy, 
as follows:

* {{StreamsMetadata}} represents the metadata for the client, which includes the 
set of {{ThreadMetadata}} for its existing threads and the set of 
{{TaskMetadata}} for active and standby tasks assigned to this client, plus 
client metadata including hostInfo and embedded client ids.

* {{ThreadMetadata}} includes the name, the state, and the set of 
{{TaskMetadata}} for currently assigned tasks.

* {{TaskMetadata}} includes the name (including the sub-topology id and the 
partition id), the state, and the corresponding sub-topology description 
(including the state store names and source topic names).

* {{allMetadata}}, {{allMetadataForStore}}, and {{allMetadataForKey}} (renamed 
from {{queryMetadataForKey}}) return the set of {{StreamsMetadata}}, and 
{{localMetadata}} (renamed from {{localThreadsMetadata}}) returns a single 
{{StreamsMetadata}}.

As an example, to find the current active / standby hosts of a specific store, 
we would call {{allMetadataForStore}}, loop over the active / standby 
{{TaskMetadata}} contained in each returned {{StreamsMetadata}}, and filter by 
the store names in the corresponding sub-topology's description.
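
A rough sketch of that lookup, only to make the proposed shape concrete. The records below are hypothetical stand-ins and not the existing Kafka Streams classes: the field names, the {{StoreHosts}} helper, and the choice to flatten the sub-topology description's store and topic names directly onto the task are all assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the proposed hierarchy; NOT the real Kafka Streams classes.
record HostInfo(String host, int port) {}

// Assumption: the sub-topology description (store names, source topics)
// is flattened onto the task to keep the sketch short.
record TaskMetadata(String taskId, Set<String> stateStoreNames, Set<String> sourceTopics) {}

record ThreadMetadata(String name, String state, Set<TaskMetadata> tasks) {}

// Client-level metadata: its threads, its active and standby tasks, and its hostInfo.
record StreamsMetadata(HostInfo hostInfo,
                       Set<ThreadMetadata> threads,
                       Set<TaskMetadata> activeTasks,
                       Set<TaskMetadata> standbyTasks) {}

// Hypothetical helper that filters clients by the stores their tasks own.
final class StoreHosts {
    private StoreHosts() {}

    // Hosts with at least one active task whose sub-topology contains the store.
    static List<HostInfo> activeHosts(Iterable<StreamsMetadata> clients, String store) {
        List<HostInfo> hosts = new ArrayList<>();
        for (StreamsMetadata client : clients) {
            if (ownsStore(client.activeTasks(), store)) {
                hosts.add(client.hostInfo());
            }
        }
        return hosts;
    }

    // Hosts with at least one standby task whose sub-topology contains the store.
    static List<HostInfo> standbyHosts(Iterable<StreamsMetadata> clients, String store) {
        List<HostInfo> hosts = new ArrayList<>();
        for (StreamsMetadata client : clients) {
            if (ownsStore(client.standbyTasks(), store)) {
                hosts.add(client.hostInfo());
            }
        }
        return hosts;
    }

    private static boolean ownsStore(Set<TaskMetadata> tasks, String store) {
        return tasks.stream().anyMatch(task -> task.stateStoreNames().contains(store));
    }
}
```

With this shape, a caller would pass the result of {{allMetadataForStore}} to both helpers and get the active and standby hosts from the same client-level view, without needing a separate per-key or per-thread metadata type.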



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
