Hello, I've been doing some micro-benchmarking with Pulsar and wondering how to maximize the throughput.
I can get about 30,000 messages/s at 9 ms latency with 256 client threads and a 1K message size, but I cannot get more than that even when I increase the number of client threads. The node is not fully utilizing its CPU, disk bandwidth, or IOPS (about 50% CPU usage and 34 MB/s out of 120 MB/s on the journal disk; the ledger disk is mostly inactive).

Is this an expected result in the environment shown below? I'm wondering whether there is some configuration that would give better throughput, or whether I'm doing something wrong.

The environment I used is as follows.

node: 1 Standard E8s v3 (8 vcpus, 64 GiB memory) in Azure
mode: standalone
disk: 2 disks (1 for ledger, 1 for journal); each can achieve 5000 IOPS for random I/O and 120 MB/s for sequential I/O
topic: persistent partitioned topic, # of partitions: 32
config: default

The program I used is here:
https://github.com/feeblefakie/misc/blob/master/pulsar/src/main/java/PulsarProducerBenchmark.java
(This program basically produces records of a specified size to the broker concurrently, and measures the throughput and the average latency.)

It can easily be re-run by following the README:
https://github.com/feeblefakie/misc/blob/master/pulsar/

It would be great if someone could help me.

Thanks,
Hiro
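
For reference, here is a quick sanity check of the figures above, as a minimal sketch rather than part of the benchmark. It rests on two assumptions: that "1K" means 1024 bytes, and that each client thread waits for one send to complete before issuing the next. The second assumption may not match how the linked program actually sends. The class and method names are made up for illustration.

```java
public class ThroughputSanityCheck {
    // Payload bandwidth in decimal MB/s for a given message rate and message size.
    static double payloadMBps(double messagesPerSec, int messageBytes) {
        return messagesPerSec * messageBytes / 1_000_000.0;
    }

    // Little's law: upper bound on messages/s when each thread keeps exactly
    // one send in flight (ASSUMPTION: blocking sends, one per thread).
    static double maxRateForBlockingSends(int threads, double latencyMs) {
        return threads / (latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        double observedRate = 30_000; // messages/s, from the post
        int messageBytes = 1024;      // ASSUMPTION: "1K" = 1024 bytes
        int threads = 256;            // client threads, from the post
        double latencyMs = 9.0;       // average latency, from the post

        System.out.printf("payload bandwidth: %.2f MB/s%n",
                payloadMBps(observedRate, messageBytes));
        System.out.printf("bound with blocking sends: %.0f messages/s%n",
                maxRateForBlockingSends(threads, latencyMs));
    }
}
```

Under these assumptions the payload alone is roughly 31 MB/s, which is in line with the 34 MB/s seen on the journal disk, and 256 threads at 9 ms latency would top out at roughly 28,000 messages/s, close to the observed rate. That bound would not by itself explain why adding more threads does not help.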