Re: major reduction is performance when using schema registry - KafkaIO

Alexey Romanenko Thu, 13 Apr 2023 09:19:19 -0700

Thanks for testing this!

This requires some additional investigation, so I created an issue for it: 
https://github.com/apache/beam/issues/26262


Feel free to add more details there if you have any.

—
Alexey

> On 13 Apr 2023, at 12:45, Sigalit Eliazov <e.siga...@gmail.com> wrote:
> 
> I made the suggested change and used 
> ConfluentSchemaRegistryDeserializerProvider.
> The results are slightly better: an average of 8000 msg/sec.
> 
> Thank you both for your responses. I'd appreciate it if you could keep me in 
> the loop on the planned work with Kafka schemas, or let me know if I can 
> assist in any way.
> 
> Thanks
> Sigalit
> 
> On Wed, Apr 12, 2023 at 8:00 PM Alexey Romanenko <aromanenko....@gmail.com 
> <mailto:aromanenko....@gmail.com>> wrote:
>> My guess was similar, but 
>> "org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider" 
>> uses 
>> "io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient", which I 
>> expected to reduce this potential impact.
>> 
>> —
>> Alexey
>> 
>>> On 12 Apr 2023, at 17:36, John Casey via user <user@beam.apache.org 
>>> <mailto:user@beam.apache.org>> wrote:
>>> 
>>> My initial guess is that there are queries being made in order to retrieve 
>>> the schemas, which would impact performance, especially if those queries 
>>> aren't cached with Beam splitting in mind. 
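
The caching concern above can be illustrated with a minimal, self-contained sketch. It assumes schemas are keyed by an integer ID and that fetching one means a remote round trip. The class `SchemaCacheSketch`, the fetch function, and the `countFetches` helper are hypothetical stand-ins, not Beam or Confluent code. The point is only that a memoized lookup queries the registry once per schema ID instead of once per message. If every reader instance created after a split built its own cache, that first fetch would presumably repeat per instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch (not Beam or Confluent code): memoize schema lookups by ID
// so a remote registry is queried once per schema rather than once per message.
class SchemaCacheSketch {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<Integer, String> registryFetch;

    SchemaCacheSketch(Function<Integer, String> registryFetch) {
        this.registryFetch = registryFetch;
    }

    // Returns the cached schema; the registry is called only on the first request for an ID.
    String schemaFor(int schemaId) {
        return cache.computeIfAbsent(schemaId, registryFetch);
    }

    // Simulates decoding `messages` records that cycle through `distinctSchemas`
    // schema IDs and returns how many registry fetches were needed.
    static int countFetches(int messages, int distinctSchemas) {
        AtomicInteger fetches = new AtomicInteger();
        SchemaCacheSketch schemas = new SchemaCacheSketch(id -> {
            fetches.incrementAndGet(); // stands in for an HTTP round trip
            return "schema-" + id;
        });
        for (int i = 0; i < messages; i++) {
            schemas.schemaFor(i % distinctSchemas);
        }
        return fetches.get();
    }

    public static void main(String[] args) {
        System.out.println(countFetches(10_000, 1)); // one schema ID, one fetch
        System.out.println(countFetches(10_000, 3)); // three schema IDs, three fetches
    }
}
```

Without the map, the same loop would call the fetch function 10,000 times, which is the kind of per-message cost the guess above describes.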
>>> 
>>> I'm looking to improve our interaction with Kafka schemas in the next 
>>> couple of quarters, so I'll keep this case in mind while working on that.
>>> 
>>> John
>>> 
>>> On Tue, Apr 11, 2023 at 10:29 AM Alexey Romanenko <aromanenko....@gmail.com 
>>> <mailto:aromanenko....@gmail.com>> wrote:
>>>> I don't yet have an exact answer for why it's so much slower (only some 
>>>> guesses, which would need profiling to confirm). Could you try the 
>>>> same Kafka read with "ConfluentSchemaRegistryDeserializerProvider" 
>>>> instead of KafkaAvroDeserializer and AvroCoder?
>>>> 
>>>> More details and a usage example are here:
>>>> https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/kafka/KafkaIO.html
>>>>  (see "Use Avro schema with Confluent Schema Registry")
>>>> 
>>>> —
>>>> Alexey
>>>> 
>>>> 
>>>> 
>>>>> On 10 Apr 2023, at 07:35, Sigalit Eliazov <e.siga...@gmail.com 
>>>>> <mailto:e.siga...@gmail.com>> wrote:
>>>>> 
>>>>> hi,
>>>>> KafkaIO.<String, T>read()
>>>>>         .withBootstrapServers(bootstrapServers)
>>>>>         .withTopic(topic)
>>>>>         .withConsumerConfigUpdates(Map.ofEntries(
>>>>>                 Map.entry("schema.registry.url", registryURL),
>>>>>                 // A random suffix gives every run a fresh consumer group.
>>>>>                 Map.entry(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup + UUID.randomUUID())
>>>>>         ))
>>>>>         .withKeyDeserializer(StringDeserializer.class)
>>>>>         // Raw (Class) cast: KafkaAvroDeserializer is not typed as Deserializer<T>.
>>>>>         .withValueDeserializerAndCoder(
>>>>>                 (Class) io.confluent.kafka.serializers.KafkaAvroDeserializer.class,
>>>>>                 AvroCoder.of(avroClass));
>>>>> 
>>>>> Thanks
>>>>> Sigalit
>>>>> 
>>>>> On Mon, Apr 10, 2023 at 2:58 AM Reuven Lax via user <user@beam.apache.org 
>>>>> <mailto:user@beam.apache.org>> wrote:
>>>>>> How are you using the schema registry? Do you have a code sample?
>>>>>> 
>>>>>> On Sun, Apr 9, 2023 at 3:06 AM Sigalit Eliazov <e.siga...@gmail.com 
>>>>>> <mailto:e.siga...@gmail.com>> wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I am trying to understand the effect of a schema registry on our 
>>>>>>> pipeline's performance. To do so, we created a very simple 
>>>>>>> pipeline that reads from Kafka, runs a simple transformation that adds 
>>>>>>> a new field, and writes to Kafka. The messages are in Avro format.
>>>>>>> 
>>>>>>> I ran this pipeline with 3 different options on the same configuration: 1 
>>>>>>> Kafka partition, 1 task manager, 1 slot, parallelism 1:
>>>>>>> 
>>>>>>> * When I used Apicurio as the schema registry, I was able to process 
>>>>>>> only 2000 messages per second.
>>>>>>> * When I used Confluent Schema Registry, I was able to process 7000 
>>>>>>> messages per second.
>>>>>>> * When I did not use any schema registry and used a plain Avro 
>>>>>>> deserializer/serializer, I was able to process 30K messages per second.
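
To put the three figures above on a common scale, the arithmetic can be sketched as follows. It assumes messages are processed strictly one after another, which seems plausible for 1 partition, 1 slot and parallelism 1 but is not confirmed in the thread. Under that assumption, the average time per message is the inverse of the throughput. The class and method names are made up for illustration.

```java
// Hypothetical helper: converts the throughput figures reported in this thread
// into average per-message time and slowdown relative to the plain Avro baseline.
class ThroughputOverhead {
    // Average time per message in microseconds, assuming sequential processing.
    static double microsPerMessage(double messagesPerSecond) {
        return 1_000_000.0 / messagesPerSecond;
    }

    // How many times slower a measured setup is than the baseline.
    static double slowdown(double baselinePerSecond, double measuredPerSecond) {
        return baselinePerSecond / measuredPerSecond;
    }

    public static void main(String[] args) {
        double plainAvro = 30_000; // no schema registry
        double confluent = 7_000;  // Confluent Schema Registry
        double apicurio = 2_000;   // Apicurio

        System.out.printf("plain Avro: %.1f us/msg%n", microsPerMessage(plainAvro));
        System.out.printf("Confluent:  %.1f us/msg, %.1fx slower%n",
                microsPerMessage(confluent), slowdown(plainAvro, confluent));
        System.out.printf("Apicurio:   %.1f us/msg, %.1fx slower%n",
                microsPerMessage(apicurio), slowdown(plainAvro, apicurio));
    }
}
```

By this estimate the registry-backed runs spend roughly 110 and 470 microseconds more per message than the plain Avro run, corresponding to slowdowns of about 4.3x and 15x.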
>>>>>>> 
>>>>>>> I understand that using a schema registry may reduce performance, 
>>>>>>> but in my opinion the difference is too large. 
>>>>>>> Any comments or suggestions about these results?
>>>>>>> 
>>>>>>> Thanks in advance
>>>>>>> Sigalit
>>>> 
>>
