Hi Ömer,

Many thanks for your response.
The bandwidth between the two sites is many orders of magnitude greater
than the 1.5 MB/sec throughput I'm able to achieve, so I'm confident the
issue isn't a case of low bandwidth. I have used the -1 setting on the
broker (as well as setting the values explicitly), but it didn't help,
and I'm fairly sure that the values I've configured at the OS level are
what is actually being applied, as sysctl reports the expected values.

I do agree that some basic Linux validation tests outside of Kafka would
be a good idea.

Thanks

Austin

On 2024/08/22 16:37:46 Ömer Şiar Baysal wrote:
> Hi Austin,
>
> I think it is also worth checking the network bandwidth without Kafka
> involved. You can create a test bench with basic Linux tools like
> ncat. Tuning socket buffers hardly makes sense if the bandwidth is
> already low between the producer and the remote site.
>
> When the OS is configured to increase the buffers, you should let the
> broker use the OS value by setting the broker config to -1. Otherwise
> the configured value should be picked up once the socket is created.
> It is also a good idea to check the CentOS documentation on whether
> socket buffers are really configurable and whether another mechanism,
> such as tuned, is cutting in between.
>
> Good luck.
> OSB
>
> On Thu, Aug 22, 2024, 18:18 Austin Hackett <ha...@me.com.invalid>
> wrote:
>
> > Hi List
> >
> > I am running Kafka 3.7.1 on CentOS 7.9.
> >
> > I have written a Kafka producer program in Python using
> > confluent_kafka (which uses librdkafka).
> >
> > When the program is running on a machine in the same data centre as
> > the Kafka cluster, a single producer writing 1,000-byte messages to
> > a single topic partition can sustain approx. 55 MB/sec of throughput
> > over a ten-minute period.
> >
> > Producer config:
> >
> > request.required.acks=all
> > enable.idempotence=true
> >
> > When I run the program on a machine in a remote data centre with a
> > high round-trip time, the throughput drops to approx. 1.5 MB/sec.
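Before tuning Kafka, the OS-level values Austin mentions can be read back directly from /proc as a sanity check that sysctl settings (and nothing like tuned) are actually in effect. A minimal sketch; the 27262976 figure is the value configured later in this thread:

```shell
#!/bin/sh
# Read the effective socket-buffer limits straight from /proc.
# On the machines in this thread these should all report 27262976.
for key in net/core/rmem_max net/core/wmem_max \
           net/ipv4/tcp_rmem net/ipv4/tcp_wmem; do
    printf '%s = %s\n' "$key" "$(cat /proc/sys/$key)"
done
```

Reading /proc/sys directly sidesteps any doubt about which sysctl binary or configuration file is being consulted.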
> >
> > This is expected, and as noted in the docs
> > (https://kafka.apache.org/documentation/#datacenters):
> >
> > "Kafka naturally batches data in both the producer and consumer so
> > it can achieve high-throughput even over a high-latency connection.
> > To allow this though it may be necessary to increase the TCP socket
> > buffer sizes for the producer, consumer, and broker using the
> > socket.send.buffer.bytes and socket.receive.buffer.bytes
> > configurations".
> >
> > I have calculated that 25 MB socket buffers are optimal for my
> > use-case and have made the following OS config changes on the client
> > machine and brokers:
> >
> > net.ipv4.tcp_rmem='4096 87380 27262976'
> > net.ipv4.tcp_wmem='4096 16384 27262976'
> > net.core.rmem_max=27262976
> > net.core.wmem_max=27262976
> > net.core.rmem_default=27262976
> > net.core.wmem_default=27262976
> >
> > and confirmed that net.ipv4.tcp_window_scaling=1.
> >
> > I have also set socket.send.buffer.bytes=27262976 and
> > socket.receive.buffer.bytes=27262976 on the brokers, and
> > socket.send.buffer.bytes=27262976 on the producer.
> >
> > However, this has made no difference in terms of throughput.
> >
> > I also tried setting socket.send.buffer.bytes=-1 and
> > socket.receive.buffer.bytes=-1 on the brokers and
> > socket.send.buffer.bytes=0 on the producer (i.e. use OS defaults),
> > but this also made no difference.
> >
> > Note that I have turned on "msg" debug on the producer and confirmed
> > that batching is occurring, with a typical MessageSet having ~9,000
> > messages.
> >
> > Also note the broker logs show the expected values for
> > socket.send.buffer.bytes and socket.receive.buffer.bytes at startup,
> > so I'm confident the brokers picked up the config property changes
> > after I restarted them.
> >
> > This led me to suspect that Kafka is not honouring my socket buffer
> > sizing settings.
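The buffer sizing above rests on the bandwidth-delay product: a single TCP connection cannot exceed roughly buffer_size / RTT. A sketch of that arithmetic; the 100 ms RTT is an illustrative assumption, since the thread does not state the actual round-trip time:

```python
# Bandwidth-delay product: the socket buffer must hold at least one
# RTT's worth of in-flight data, or single-connection TCP throughput is
# capped at roughly buffer_size / rtt regardless of link bandwidth.

def max_throughput_bytes_per_sec(buffer_bytes: int, rtt_sec: float) -> float:
    """Upper bound on single-connection TCP throughput for a given buffer."""
    return buffer_bytes / rtt_sec

rtt = 0.1  # assumed 100 ms round-trip time (not stated in the thread)

# ~1 MB buffer, the value later observed in the ss output
small = max_throughput_bytes_per_sec(971_074, rtt)
# the 27262976-byte buffer configured above
large = max_throughput_bytes_per_sec(27_262_976, rtt)

print(f"~1 MB buffer caps throughput near {small / 1e6:.1f} MB/sec")
print(f"26 MiB buffer allows up to {large / 1e6:.1f} MB/sec")
```

Whatever the real RTT is, the point stands: if the effective buffer is stuck near 1 MB, throughput will sit far below the 55 MB/sec achieved in-datacentre.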
> >
> > When I look at the output of "ss -tmn" on the broker for the socket
> > that my producer is using, the receive buffer is not of the expected
> > size:
> >
> > ====
> > ESTAB 0 148 XX.XXX.XX.XX:6669 XX.X.XX.XX:58870
> > skmem:(r0,rb971074,t0,tb46080,f45056,w0,o0,bl0,d438)
> > ====
> >
> > Per the above output, the receive buffer is 971074 bytes
> > (skmem:...rb971074).
> >
> > I have also set
> > log4j.logger.org.apache.kafka.common.network.Selector=DEBUG in
> > log4j.properties on the broker in order to check the size of the
> > receive buffer requested. However, I only see DEBUG entries for
> > RaftManager, NodeToControllerChannelManager, and ReplicaFetcher, but
> > not for the socket my producer is writing to, as far as I can tell.
> >
> > If anyone has any suggestions on how I can make sure that Kafka uses
> > my desired socket buffer sizes, or other ways of verifying whether or
> > not it is using them, it would be much appreciated.
> >
> > Thanks
> >
> > Austin
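One way to rule Kafka in or out is to reproduce the buffer request on a plain socket outside Kafka. A minimal Python sketch, assuming Linux behaviour: the kernel returns double the requested SO_RCVBUF value (to cover bookkeeping overhead), and silently clamps requests above net.core.rmem_max, which is one way an ss "rb" value can end up smaller than what an application asked for:

```python
import socket

# Request a receive buffer on a plain TCP socket and read back what the
# kernel actually granted. On Linux the returned value is roughly double
# the requested one, and requests above net.core.rmem_max are silently
# clamped rather than rejected.
requested = 65536  # small enough not to hit rmem_max on default systems

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(f"requested {requested}, kernel granted {granted}")
```

Running the same check with 27262976 on the broker host would show whether the OS itself honours a 26 MiB request; if it does, the discrepancy in the ss output points at what the broker is (or is not) requesting on that socket.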