Re: Apache - GSOC'25 projects / Contributions

2025-02-24 Thread Mich Talebzadeh
Hi, To get started, you might want to go through the official Spark documentation and contributor guide: - Apache Spark Documentation - Apache Spark Contributor Guide Regarding GSOC 2025

Re: Kafka Connector: producer throttling

2025-02-24 Thread Abhishek Singla
Isn't there a way to do it with the Kafka connector instead of the Kafka client? Isn't there any way to throttle the Kafka connector? Seems like a common problem. Regards, Abhishek Singla On Mon, Feb 24, 2025 at 7:24 PM daniel williams wrote: > I think you should be using a foreachPartition and a broadcast

Re: Kafka Connector: producer throttling

2025-02-24 Thread daniel williams
I think you should be using a foreachPartition and a broadcast to build your producer. From there you will have full control of all options and serialization needed via direct access to the KafkaProducer, as well as all options therein associated (e.g. callbacks, interceptors, etc). -dan On Mon,
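A minimal sketch of the kind of control this approach allows, assuming the goal is a simple per-partition rate cap. The helper names (`throttle`, `send_partition`) and the `max_per_second` parameter are hypothetical and not part of Spark or Kafka. The `send` argument stands in for whatever wraps the real producer call. The clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttle(
    records: Iterable[T],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield records no faster than max_per_second (hypothetical helper)."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    next_allowed = clock()
    for record in records:
        now = clock()
        if now < next_allowed:
            # Too early for the next record: wait out the remaining time.
            sleep(next_allowed - now)
        # Schedule the following record one interval after this slot.
        next_allowed = max(now, next_allowed) + interval
        yield record


def send_partition(
    records: Iterable[T],
    send: Callable[[T], None],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send every record of one partition through `send`, rate limited.

    Assumption: in a real job this would run inside foreachPartition,
    with `send` wrapping a producer created on the executor.
    """
    count = 0
    for record in throttle(records, max_per_second, clock, sleep):
        send(record)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for a producer: print each record at about 5 records/second.
    send_partition(["a", "b", "c"], print, max_per_second=5)
```

Note that the cap applies per partition, so the aggregate rate across the job is roughly the cap multiplied by the number of partitions being written concurrently.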

Kafka Connector: producer throttling

2025-02-24 Thread Abhishek Singla
Hi Team, I am using Spark to read from S3 and write to Kafka. Spark Version: 3.1.2 Scala Version: 2.12 Spark Kafka connector: spark-sql-kafka-0-10_2.12 I want to throttle the Kafka producer. I tried using the *linger.ms* and *batch.size* configs but I can see in *ProducerConfig: Produc