2019-07-29 09:31:17 UTC - Alexandre DUVAL: Well, this is not enough; the Apache bookie just blew up at 95% disk usage. Isn't the latest Pulsar master using the latest BookKeeper master? ---- 2019-07-29 11:06:25 UTC - Yuvaraj Loganathan: Pulsar master does not use BookKeeper master ---- 2019-07-29 11:09:12 UTC - Alexandre DUVAL: Yup, but can I take the generated BookKeeper jars and place them in the Pulsar lib directory? ---- 2019-07-29 13:47:54 UTC - Prem: @Prem has joined the channel ---- 2019-07-29 14:17:21 UTC - David Kjerrumgaard: Pulsar doesn't currently provide support for 100% of the JMS API. So it might be difficult to do what you are looking to do. Have you looked at the following project at all? It appears to be a JMS wrapper around Pulsar. <https://github.com/QuiNovas/pulsar-jms-provider/tree/feature/SE-3> ---- 2019-07-29 14:21:14 UTC - Bruno Panuto: We have a project here that could use that configuration tuned for a large number of topics. I would appreciate any links on that! ---- 2019-07-29 16:05:55 UTC - Florentin Dubois: @Florentin Dubois has joined the channel ---- 2019-07-29 16:06:02 UTC - Pierre Zemb: @Pierre Zemb has joined the channel ---- 2019-07-29 16:06:47 UTC - remi: @remi has joined the channel ---- 2019-07-29 16:34:52 UTC - Steven Le Roux: Hi everyone,
I’m working with an instance of Pulsar. When I list clusters (pulsar-admin clusters list), the admin URL of any cluster lists all clusters in the instance. When I look at brokers (pulsar-admin brokers list <cluster>), whatever the cluster is, only brokers local to the admin API are listed. I’ve spotted on GitHub ( <https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/BrokersBase.java#L93> ) that only native brokers are actually returned. This may be confusing, and I think brokers could either be registered in the configuration store or use the serviceURL from the configuration store to retrieve the proper broker list from other clusters. What’s your opinion? Is there any work related to this, or would you be interested in an issue/PR to enhance this behaviour? Thx :slightly_smiling_face: ---- 2019-07-29 18:05:32 UTC - Tarek Shaar: Can someone please tell me if Pulsar will create a separate thread per Pulsar consumer? ---- 2019-07-29 18:06:16 UTC - Matteo Merli: No, there is a thread pool with just 1 thread in total (by default) to handle all consumers ---- 2019-07-29 18:07:54 UTC - Tarek Shaar: Thanks. I work on a real-time FX platform where we use Sonic MQ (JMS). We are used to each message listener (a session) being created within its own JMS thread. I wonder how I can scale Pulsar consumers if they all share one global thread ---- 2019-07-29 18:11:46 UTC - Jerry Peng: @Tarek Shaar the default can be changed. When you create a PulsarClient, you can set the number of ioThreads ---- 2019-07-29 18:12:16 UTC - Jerry Peng: @Tarek Shaar <https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ClientBuilder.java#L215> ---- 2019-07-29 18:16:17 UTC - Tarek Shaar: Ok. I am running a suite of throughput tests in Java comparing Kafka and Pulsar. I managed to get 800k messages per second on Kafka and roughly the same on Pulsar (producers).
On the consumer side I am getting 1.2M on Pulsar and 700K on Kafka. I will tweak the number of threads as you suggested; I am sure I can get better throughput ---- 2019-07-29 18:16:55 UTC - Jerry Peng: :+1: ---- 2019-07-29 18:27:47 UTC - Matteo Merli: @Tarek Shaar Are you using the listener or calling consumer.receive() directly ---- 2019-07-29 18:27:48 UTC - Matteo Merli: ? ---- 2019-07-29 18:28:31 UTC - Matteo Merli: (you can have your own thread pool, or 1 thread per consumer, if that suits your app logic better) ---- 2019-07-29 18:28:59 UTC - Tarek Shaar: listener, just like JMS. I am not sure if I should use my own pool or use io threads ---- 2019-07-29 18:32:01 UTC - Matteo Merli: the pulsar client has 2 thread pools, actually. One for IO threads (default 1, this is used for async network IO) and then the listener thread pool, which is used to call the message listener. The app code is allowed to block the listener thread, and that will automatically adjust the backpressure ---- 2019-07-29 18:33:28 UTC - Matteo Merli: Take a look at <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ClientBuilder.html#listenerThreads-int-> to increase the listener thread pool size ---- 2019-07-29 18:35:38 UTC - Tarek Shaar: So on the producer side, if I use the same call (io threads), I am assuming I will get a new thread for each producer? ---- 2019-07-29 18:35:46 UTC - Tarek Shaar: So this call will be a new thread ---- 2019-07-29 18:35:48 UTC - Tarek Shaar: prodClient.newProducer().enableBatching(true).batchingMaxMessages(2000).blockIfQueueFull(true).compressionType(CompressionType.ZLIB); ---- 2019-07-29 18:36:41 UTC - Matteo Merli: In any case the number of IO threads is limited. Most of the time the async network IO is not using a lot of CPU anyway ---- 2019-07-29 18:38:11 UTC - Matteo Merli: also, for max throughput, LZ4 compression is generally preferred since it is much faster. If you are looking for a greater compression ratio, then you can try ZSTD compression.
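Matteo's remark above that an application can run its own thread pool, or one thread per consumer, when it calls consumer.receive() directly can be sketched with plain JDK classes. This is only an illustration under stated assumptions: `FakeConsumer` is a hypothetical stand-in backed by a queue, not the Pulsar `Consumer` API, and its `receive` method merely mimics a blocking receive with a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a consumer (NOT the Pulsar API): a queue plays the broker.
final class FakeConsumer {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

    void deliver(String msg) {
        inbox.add(msg);
    }

    // Waits up to the timeout for a message; returns null when nothing arrived.
    String receive(long timeout, TimeUnit unit) throws InterruptedException {
        return inbox.poll(timeout, unit);
    }
}

final class ThreadPerConsumerSketch {
    // Runs one worker per consumer on an application-owned pool and
    // returns the total number of messages handled. Expects a non-empty list.
    static long drain(List<FakeConsumer> consumers) {
        ExecutorService pool = Executors.newFixedThreadPool(consumers.size());
        AtomicLong handled = new AtomicLong();
        for (FakeConsumer c : consumers) {
            pool.execute(() -> {
                try {
                    // Each worker blocks only on its own consumer; it stops
                    // once that consumer has been idle for 100 ms.
                    while (c.receive(100, TimeUnit.MILLISECONDS) != null) {
                        handled.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        List<FakeConsumer> consumers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            FakeConsumer c = new FakeConsumer();
            for (int j = 0; j < 5; j++) {
                c.deliver("msg-" + j);
            }
            consumers.add(c);
        }
        System.out.println("handled=" + drain(consumers));
    }
}
```

With the listener model discussed above the client's listener threads call the application instead, so a loop like this is only needed when the application drives receive itself.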
---- 2019-07-29 18:47:18 UTC - Tarek Shaar: that's wonderful, thanks. Do you happen to know (on a separate note) how Pulsar internally manages to isolate the tailing reads, catch-up reads and the writes? In Kafka, slow consumers end up slowing down the entire cluster since the OS page cache ends up getting wiped ---- 2019-07-29 18:49:59 UTC - Matteo Merli: Pulsar uses BookKeeper as storage. (Brokers are not using the disk at all). BK doesn’t depend on the OS page cache. Typically, the configuration for BK uses 2 disks (or RAID groups). One is dedicated to the journal, where just sequential writes are issued (and fsynced). The other is where the data is stored in the background and indexed. ---- 2019-07-29 18:50:31 UTC - Matteo Merli: Read ops on the storage device are not impacting the throughput or latency on the journal device ---- 2019-07-29 18:51:30 UTC - Jerry Peng: @Tarek Shaar Here’s a blog that gives an overview of that subject matter: <https://streaml.io/blog/apache-pulsar-architecture-designing-for-streaming-performance-and-scalability> ---- 2019-07-29 18:53:21 UTC - Jerry Peng: > how Pulsar internally manages to isolate the tailing reads, catch-up reads and the writes? The gist is that all of those are served via different IO paths, thus not affecting one another ---- 2019-07-29 18:55:54 UTC - Tarek Shaar: That makes sense, thanks ---- 2019-07-29 20:57:24 UTC - Howard Zhang: @Howard Zhang has joined the channel ---- 2019-07-29 21:00:18 UTC - Howard Zhang: Hello guys, <http://pulsar.apache.org/en/admin-rest-api/#operation/resetCursorOnPosition> how do I specify the message ID for this endpoint? ---- 2019-07-29 21:04:58 UTC - Igor Zubchenok: So, I would really appreciate it if there is anything for this case. Please. ---- 2019-07-29 22:01:04 UTC - David Kjerrumgaard: @Igor Zubchenok The work you referenced earlier "to optimise Pulsar to work with large number of topics", was that done in your environment?
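The two-disk BookKeeper layout Matteo describes above (one device for the journal, another for ledger storage) might look roughly like this in `bookkeeper.conf`. The keys `journalDirectory` and `ledgerDirectories` are standard bookie settings; the paths are hypothetical mount points chosen for illustration, not values from this conversation.

```properties
# bookkeeper.conf (paths are hypothetical mount points)
# Journal on its own device: only sequential, fsynced writes land here.
journalDirectory=/mnt/journal/bookkeeper
# Ledger storage on a separate device: entry data and indexes, serves reads.
ledgerDirectories=/mnt/ledgers/bookkeeper
```

Keeping the two directories on separate physical devices is what keeps read traffic on the ledger device from affecting write latency on the journal device.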
---- 2019-07-29 22:27:41 UTC - Grant Wu: Did something change between Pulsar 2.2.1 and Pulsar 2.4.0 with Pulsar function deployment? ---- 2019-07-29 22:27:53 UTC - Grant Wu: Deployment scripts that were working before are now failing with `The cpu allocation for the function must be positive` ---- 2019-07-29 22:32:04 UTC - Grant Wu: This is especially confusing to me because <https://pulsar.apache.org/docs/en/functions-deploying/> only mentions it for the Docker runtime, which we aren’t using ---- 2019-07-29 22:34:26 UTC - David Kjerrumgaard: @Grant Wu Are you providing a value for `--cpu` at all? It sounds like there is a new validation check for that parameter regardless of the target runtime ---- 2019-07-29 22:34:33 UTC - Grant Wu: I am not. ---- 2019-07-29 22:35:51 UTC - David Kjerrumgaard: My guess is that it is defaulting that value to zero and then failing the validation check. Can you confirm by providing a value when you submit? If that is the issue, then we need to file an issue ---- 2019-07-29 22:39:12 UTC - Grant Wu: I will do some tests ---- 2019-07-29 23:00:02 UTC - Igor Zubchenok: @David Kjerrumgaard Yes, it was done for 2.2. However, I modified the config based on chat suggestions only; the final config has never been reviewed by anyone here. ---- 2019-07-29 23:06:38 UTC - Grant Wu: It creates successfully with a standalone Pulsar cluster.
---- 2019-07-29 23:06:43 UTC - Grant Wu: I can’t tell why, to be honest ---- 2019-07-29 23:14:11 UTC - Grant Wu: Could be an issue with using an older Pulsar client ---- 2019-07-29 23:26:06 UTC - Grant Wu: @David Kjerrumgaard I have reproduced this as an issue with using the 2.2.1 client with a 2.4.0 broker ---- 2019-07-29 23:26:25 UTC - Grant Wu: With a 2.4.0 client, the client appears to automatically set a default of cpu = 1.0 ---- 2019-07-29 23:26:46 UTC - Grant Wu: it seems strange to me that it would behave like this, especially because the documentation implies that CPU is irrelevant for non-docker runtimes ---- 2019-07-29 23:27:13 UTC - David Kjerrumgaard: Yes that is odd. ---- 2019-07-29 23:27:43 UTC - David Kjerrumgaard: Thanks for investigating this issue ---- 2019-07-29 23:40:45 UTC - David Kjerrumgaard: Would you be able to share your final configuration for review? ---- 2019-07-29 23:45:03 UTC - Ali Ahmed: I have fixed the missing pulsar-client for osx ```pip install pulsar-client==2.3.2``` ---- 2019-07-29 23:45:14 UTC - Ali Ahmed: will fix the 2.4.0 for osx soon ---- 2019-07-29 23:47:32 UTC - Matteo Merli: :+1: ---- 2019-07-30 00:22:55 UTC - Igor Zubchenok: Sure ---- 2019-07-30 00:24:07 UTC - Igor Zubchenok: ---- 2019-07-30 01:43:01 UTC - xue: @xue has joined the channel ---- 2019-07-30 03:44:14 UTC - Ali Ahmed: pulsar-client for osx has been published ```pip install pulsar-client==2.4.0``` ---- 2019-07-30 04:11:53 UTC - jia zhai: Hi @Howard Zhang, a message ID usually contains 2 parts (ledgerId:entryId); you can get some information about ledgers from the admin API `topics stats-internal`. Also, the client API returns the message ID when producer.send is called, and consumer.receive returns a message that contains its messageId. ---- 2019-07-30 04:57:35 UTC - Shivji Kumar Jha: Hi, we need to enable TLS and auth between the broker and BookKeeper clusters. I see this open ticket for the same, but it is not assigned to anyone yet. Is this planned?
<https://github.com/apache/pulsar/issues/3376> ---- 2019-07-30 05:00:35 UTC - Sijie Guo: @Shivji Kumar Jha I think that is already supported. I remember @jia zhai already documented it as part of adding the Kerberos documentation. @jia zhai can you confirm? +1 : Shivji Kumar Jha ---- 2019-07-30 06:04:45 UTC - jia zhai: Hi @Sijie Guo @Shivji Kumar Jha, right, I have added the BookKeeper config part for Kerberos. But TLS needs different configs: <http://bookkeeper.apache.org/docs/latest/security/tls/#configuring-clients> We need to add and support these configs in [BookKeeperClientFactoryImpl](<https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/BookKeeperClientFactoryImpl.java>) ---- 2019-07-30 06:07:32 UTC - Shivji Kumar Jha: Thanks @jia zhai. Would it be possible for you to prioritise the BookKeeperClientFactoryImpl.java enhancements? ---- 2019-07-30 06:09:34 UTC - jia zhai: OK, we will prioritise it. ----
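jia zhai's earlier answer to Howard says a message ID usually has two parts, written as ledgerId:entryId. A minimal sketch of splitting such a string into its numeric parts follows. `MessageIdParts` is a hypothetical helper written for this illustration, not a Pulsar class, and ignoring any extra colon-separated fields after the first two is an assumption, not documented behaviour.

```java
// Hypothetical helper (not part of Pulsar): holds the two parts of a
// message ID written as "ledgerId:entryId".
final class MessageIdParts {
    final long ledgerId;
    final long entryId;

    MessageIdParts(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    // Parses "ledgerId:entryId". Fields after the first two are ignored
    // (an assumption); fewer than two fields is rejected. Non-numeric
    // fields fail with NumberFormatException from Long.parseLong.
    static MessageIdParts parse(String text) {
        String[] fields = text.trim().split(":", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected ledgerId:entryId, got: " + text);
        }
        return new MessageIdParts(Long.parseLong(fields[0]), Long.parseLong(fields[1]));
    }

    public static void main(String[] args) {
        MessageIdParts id = MessageIdParts.parse("12345:7");
        System.out.println("ledgerId=" + id.ledgerId + " entryId=" + id.entryId);
    }
}
```

The two numbers are the values jia zhai refers to; how the resetCursorOnPosition endpoint expects them to be encoded is not stated in this conversation and should be checked against the REST API documentation.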
