2020-09-01 09:26:51 UTC - Enrico Olivelli: Do you see errors or any log on the broker?
----
2020-09-01 13:55:35 UTC - Nazia Firdous: I didn't see any issues in the broker.
----
2020-09-01 14:14:39 UTC - Ian: Does anyone know a way to do an exponential backoff (or similar) for negativeAckRedeliveryDelay? From the docs (<http://pulsar.apache.org/api/client/2.4.1/org/apache/pulsar/client/api/ConsumerBuilder.html#negativeAckRedeliveryDelay-long-java.util.concurrent.TimeUnit->) it doesn't look like this is supported, but it may be useful.
----
2020-09-01 16:19:53 UTC - Enrico Olivelli: Is <http://67.160.195.238:8080> working using the browser or curl?
----
2020-09-01 16:28:50 UTC - Nazia Firdous: through browser
----
2020-09-01 16:57:22 UTC - Raman Gupta: Thanks for the confirmation
----
2020-09-01 17:21:31 UTC - Addison Higham: @Raman Gupta forgot to mention, key_shared consumers should get ordered messages even with partition changes.
And just some background to hopefully give some more context: in the past the only key was the `partition_key`, and `ordering_key` was only introduced along with key_shared, so if you read `key`, assume `partition_key` (though we should update the docs :slightly_smiling_face:). Physical partitioning of message data is always done by partition key, and at the moment, AFAIK, pretty much the only thing that considers ordering_key is the key_shared "dispatcher", the component that reads messages either out of the broker cache or from a bookie and writes them to the consumer. Other components (such as compaction) could be made aware of ordering_key and conditionally do something more elegant, but ATM most do not.
----
2020-09-01 17:37:16 UTC - Raghav: Hi, does anyone know the configuration to set the max number of file descriptors that the broker can use? I am trying to test a scenario with a huge number of topics on a broker, but I am getting maxed out at 3613 open files (`lsof -i :6650` gave this number). I couldn’t find any setting in broker.conf to change it. I have also set the ulimit (both soft and hard) to a very big number, but that didn’t resolve the issue. It always maxes out at 3613, and the producers/consumers are not able to connect beyond that point.
----
2020-09-01 17:38:33 UTC - Raman Gupta: Thanks @Addison Higham. If physical partitioning is done by partition_key, and the key_shared dispatcher uses ordering_key, wouldn't there be potential ordering issues if those two values don't match?
----
2020-09-01 17:53:50 UTC - Addison Higham: that is a good question, I actually have some things coming up that will have me digging into that much more, will keep updating here
+1 : Raman Gupta
----
2020-09-01 20:02:28 UTC - Sijie Guo: @Evan Furman Oh the pulsar_detector was a program that we developed for end-to-end monitoring, but we haven’t open sourced the program yet.
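----
On Ian's question above about exponential backoff for negativeAckRedeliveryDelay: as he notes, the linked 2.4.1 docs only show a single fixed delay. A minimal sketch of the delay schedule itself, assuming the application knows how many times a message has already been redelivered, might look like the following. The function `backoff_delay_ms` and its parameters are hypothetical and not part of the Pulsar client; wiring the result into a consumer is left out.

```python
def backoff_delay_ms(redelivery_count, base_ms=1000, multiplier=2, max_ms=60000):
    """Delay before the next redelivery: grows by `multiplier` per attempt, capped at max_ms.

    All names and defaults here are illustrative assumptions.
    """
    if redelivery_count < 0:
        raise ValueError("redelivery_count must be >= 0")
    delay = base_ms
    for _ in range(redelivery_count):
        delay *= multiplier
        if delay >= max_ms:
            # Stop early so very large counts never build huge numbers.
            return max_ms
    return min(delay, max_ms)
```

With the defaults, attempts 0 through 6 give 1000, 2000, 4000, 8000, 16000, 32000 and 60000 ms; the cap keeps a message that keeps failing from being delayed indefinitely.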
----
2020-09-01 20:05:00 UTC - Evan Furman: good to know, you guys do package the dashboard for it fyi
----
2020-09-01 20:07:16 UTC - Sijie Guo: Yeah. We are thinking of open sourcing it soon although we need to find a place to put it.
----
2020-09-01 20:11:08 UTC - Evan Furman: Got it, looks nice. Would love to use it. We also are not seeing metrics for `pulsar_storage_read_latency_count`?
----
2020-09-01 20:11:19 UTC - Evan Furman: cc: @Tim Corbett
----
2020-09-01 20:13:51 UTC - Sijie Guo: okay. I need to double check if this is from a change that we haven’t contributed back.
----
2020-09-01 20:14:05 UTC - Evan Furman: :thumbsup:
----
2020-09-01 21:58:39 UTC - Evan Furman: @Sijie Guo one more question while we’re on this topic. I’ve enabled topic level metrics in the broker config but don’t have any metrics in prometheus.
```
[[email protected] conf]# grep -i prometheus broker.conf
exposeTopicLevelMetricsInPrometheus=true
exposeConsumerLevelMetricsInPrometheus=true
```
----
2020-09-01 22:36:05 UTC - Evan Furman: We are seeing an issue where consumption ceases on 4 of the 8 topics even though there is an existing backlog. It does not seem to recover. We are running `pulsar-perf` `v2.6.0` with cluster version `2.6.1`.
----
2020-09-02 00:40:05 UTC - Tim Corbett: Of potential interest here: the subscription-level stats indicate `unackedMessages` is 1 but no individual consumer reports any unacked messages.
----
2020-09-02 02:08:37 UTC - Addison Higham: apologies for the delayed response, have you looked at any of the open issues? I know there have been a few reports of things like this but I am not sure whether a root cause has been identified. One thing is that if you need to be immediately unblocked, you likely can do one of the following:
1. unload the topic or namespace
2. restart the consumers
----
2020-09-02 02:09:16 UTC - Addison Higham: Also, if you have some logs you could share and a github issue, they might be helpful. I think we are still looking for a solid set of circumstances to get to the bottom of what is happening here
----
2020-09-02 05:22:54 UTC - Tim Corbett: A cursory glance does not find any existing (open) issue matching exactly, though there is a closed issue which seems close (#6966). We would be happy to open an issue but if you have any specifics on what logs or commands would likely be useful, please let us know.
----
2020-09-02 06:38:30 UTC - Raghav: Can anyone help me understand this issue on perf-client? No error is reported on the broker.
12:06:11.118 [pulsar-client-io-2-2] WARN org.apache.pulsar.client.impl.PulsarClientImpl - [persistent://public/ns10/p6-0] Failed to get partitioned topic metadata: org.apache.pulsar.client.api.PulsarClientException$TooManyRequestsException: Failed due to too many pending lookup requests
12:06:11.118 [pulsar-perf-producer-exec-1-1] ERROR org.apache.pulsar.testclient.PerformanceProducer - Got error
java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException$TooManyRequestsException: Failed due to too many pending lookup requests
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_252]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_252]
    at org.apache.pulsar.testclient.PerformanceProducer.runProducer(PerformanceProducer.java:467) ~[org.apache.pulsar-pulsar-testclient-2.6.0.jar:2.6.0]
    at org.apache.pulsar.testclient.PerformanceProducer.lambda$main$1(PerformanceProducer.java:329) ~[org.apache.pulsar-pulsar-testclient-2.6.0.jar:2.6.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: org.apache.pulsar.client.api.PulsarClientException$TooManyRequestsException: Failed due to too many pending lookup requests
    at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:993) ~[org.apache.pulsar-pulsar-client-original-2.6.0.jar:2.6.0]
    at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:510) ~[org.apache.pulsar-pulsar-client-original-2.6.0.jar:2.6.0]
    at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:119) ~[org.apache.pulsar-pulsar-common-2.6.0.jar:2.6.0]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) ~[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) ~[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
    ... 2 more
----
2020-09-02 06:58:05 UTC - Raghav: Setting LimitNOFILE in the systemd script resolved the issue.
----
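Raghav's fix fits a common pattern: a service started by systemd takes its open-file limit from `LimitNOFILE` in the unit, not from a `ulimit` set in a login shell. A small sketch for checking the limit a running process actually has, assuming Linux and its `/proc/<pid>/limits` file; the helper names `parse_max_open_files` and `read_nofile_limits` are made up for illustration.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields look like: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    return None


def read_nofile_limits(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

Calling `read_nofile_limits` with the broker's PID before and after changing the unit shows whether the new value actually reached the process; the values come back as strings because the file may say `unlimited`.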
