Hello,

Problem: Special characters such as öüä are being saved to our sinks as "?".
Setup: We read from Kafka using KafkaIO, run the pipeline with the Dataflow runner, and save the results to BigQuery and Elasticsearch.

We verified (by code inspection) that the data is written to Kafka in UTF-8. We also checked that the special characters appear correctly using kafka-console-consumer. One more observation: in a local setup, with Kafka in Docker* and the Direct Runner, the characters were encoded correctly. *the event was written using kafka-console-producer.
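For what it's worth, a literal "?" usually points to the bytes passing through a charset that cannot represent those characters somewhere along the way, rather than to a UTF-8 decoding problem (a bad UTF-8 decode typically shows up as "�" or mojibake like "Ã¶" instead). A minimal, self-contained sketch of the difference, using only the JDK:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        String original = "öüä";

        // Encoding with a charset that cannot represent these characters
        // silently replaces each one with '?' (byte 0x3F).
        byte[] ascii = original.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ???

        // A UTF-8 round trip preserves them.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // öüä
    }
}
```

So if "?" is what lands in BigQuery/Elasticsearch, it may be worth looking for a step that re-encodes the string (logging, an intermediate serialization, the sink writer) with a non-UTF-8 default charset.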

Reading from Kafka:

pipeline
    .apply(
        "ReadInput",
        KafkaIO.<String, String>read()
            .withBootstrapServers(...)
            .withTopics(...)
            .updateConsumerProperties(...) // only "group.id" and "auto.offset.reset"
            .withValueDeserializer(StringDeserializer.class)
            .withCreateTime(Duration.standardMinutes(10))
            .commitOffsetsInFinalize())
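One thing that might be worth checking (an assumption on my part, I haven't confirmed it applies to your setup): Kafka's StringDeserializer defaults to UTF-8 but also honors an encoding property from the consumer config, so you could pin it explicitly in the map you pass to updateConsumerProperties:

```
value.deserializer.encoding=UTF-8
```

If the Dataflow workers run with a different default charset than your local JVM, being explicit here would at least rule the deserializer out as the culprit.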

So, any clues on where to investigate? In the meantime I'm going to add more logging to the application to see where the characters get "lost" in the pipeline, and also try writing to the local Kafka using a Java Kafka producer, where I can be sure the message is written in UTF-8.

Thank you for the support.
