Slack digest for #general - 2019-09-25

Apache Pulsar Slack Wed, 25 Sep 2019 02:11:14 -0700

2019-09-24 10:21:52 UTC - pradeep: Hi Team,
I started a Pulsar consumer in failover mode without specifying the consumer
name. Then I killed the consumer and restarted it on the same topic with the
same subscription name (it will have been assigned a new random consumer
name), but I am not receiving any data from Pulsar.
When I checked with the pulsar-admin API, the previous consumer had not been
removed from the topic stats (since I started in failover mode, the restarted
consumer might not have had a chance to consume data).
----
2019-09-24 16:36:02 UTC - Matteo Merli: The topic stats report the TCP
connection for each consumer
----
2019-09-24 16:36:24 UTC - Matteo Merli: Can you verify that the original TCP
connection for the first consumer was indeed torn down?
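
A minimal sketch of that check, assuming the output of `pulsar-admin topics stats` has been saved as JSON with the same consumer fields as the stats pasted later in this digest (`consumerName`, `address`, `connectedSince`). The function name and the sample values are hypothetical, not part of Pulsar.

```python
import json


def list_consumer_connections(stats):
    """Return (subscription, consumerName, address, connectedSince) per consumer."""
    rows = []
    for sub_name, sub in stats.get("subscriptions", {}).items():
        for consumer in sub.get("consumers", []):
            rows.append((
                sub_name,
                consumer.get("consumerName"),
                consumer.get("address"),
                consumer.get("connectedSince"),
            ))
    return rows


# Hypothetical stats: the first entry stands for a connection left over
# from the killed process, the second for the restarted consumer.
sample = json.loads("""
{
  "subscriptions": {
    "my-sub": {
      "type": "Failover",
      "consumers": [
        {"consumerName": "a1b2c", "address": "/10.0.0.5:40112",
         "connectedSince": "2019-09-24T09:58:03.120Z"},
        {"consumerName": "d4e5f", "address": "/10.0.0.5:40390",
         "connectedSince": "2019-09-24T10:20:47.902Z"}
      ]
    }
  }
}
""")

for row in list_consumer_connections(sample):
    print(row)
```

An entry whose `connectedSince` predates the restart would suggest the broker still sees the old connection as open.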
----
2019-09-24 16:49:27 UTC - pradeep: I just shut down the application. Wouldn't
that close the consumer and producer connections if the application is shut
down abruptly or forcefully?
----
2019-09-24 16:52:14 UTC - Jesse Zhang (Bose): My subscription gets into a bad
state where some unacked messages are not redelivered, even after I restarted
all the shared clients. Any ideas why? I have attached the status in the thread.
----
2019-09-24 16:53:28 UTC - Jesse Zhang (Bose): server: standalone pulsar 2.3.1
```
{
     "averageMsgSize": 0.0,
     "deduplicationStatus": "Disabled",
     "msgRateIn": 0.0,
     "msgRateOut": 0.0,
     "msgThroughputIn": 0.0,
     "msgThroughputOut": 0.0,
     "publishers": [],
     "replication": {},
     "storageSize": 9779254,
     "subscriptions": {
         "mock-bmx-als-sub-feat-aa-restst-func": {
             "blockedSubscriptionOnUnackedMsgs": false,
             "consumers": [
                 {
                     "address": "/100.65.150.248:38860",
                     "availablePermits": 82,
                     "blockedConsumerOnUnackedMsgs": false,
                     "clientVersion": "2.4.0",
                     "connectedSince": "2019-09-20T19:21:17.541Z",
                     "consumerName": "my-consumer-name",
                     "metadata": {},
                     "msgRateOut": 0.0,
                     "msgRateRedeliver": 0.0,
                     "msgThroughputOut": 0.0,
                     "unackedMessages": 0
                 }
             ],
             "msgBacklog": 181,
             "msgRateExpired": 0.0,
             "msgRateOut": 0.0,
             "msgRateRedeliver": 0.0,
             "msgThroughputOut": 0.0,
             "type": "Shared",
             "unackedMessages": 0
         }
     }
}
```
----
2019-09-24 16:53:52 UTC - Matteo Merli: Normally it should.
----
2019-09-24 16:54:58 UTC - Matteo Merli: do you see any errors in broker logs?
----
2019-09-24 16:55:30 UTC - Jesse Zhang (Bose): let me check
----
2019-09-24 16:56:46 UTC - Jesse Zhang (Bose): Also attaching the internal
status. In our usage, we ack the messages out of order.
----
2019-09-24 16:57:03 UTC - Jesse Zhang (Bose): 
----
2019-09-24 16:58:30 UTC - Matteo Merli: this looks ok
----
2019-09-24 17:12:09 UTC - Jesse Zhang (Bose): I checked the logs covering 24
hours before and after the issue happened: nothing is logged at error level and
there are no exceptions. I see some warn-level entries, but they seem to recur
all the time.
----
2019-09-24 17:16:38 UTC - pradeep: But that was not happening. We had to
restart the brokers to bring it back to normal.
----
2019-09-24 17:19:24 UTC - pradeep: 
----
2019-09-24 17:24:59 UTC - Jesse Zhang (Bose): We get into this scenario when
Pulsar is in a redelivery test case: redeliver 5000 messages to the consumer
every 10s (the client received them but did not ack them). After ~12 hours, we
started to see only 48xx messages redelivered each time (issue started).
Then we started to ack all the messages. After that, we found that while the
unacked message count is 0, there are 181 messages in the backlog, and we see
`"availablePermits": 82` (it should be 100).
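
The symptom described here (backlog present, nothing unacked, nothing being delivered) can be checked against saved topic stats. A minimal sketch, assuming the JSON fields shown in the stats pasted earlier; the function name is hypothetical, and a single snapshot with `msgRateOut` at 0 is only a heuristic, since a subscription that is merely paused for a moment would also match.

```python
def find_stalled_subscriptions(stats):
    """Return names of subscriptions that have a backlog and connected
    consumers, but no unacked messages and no outbound traffic."""
    stalled = []
    for name, sub in stats.get("subscriptions", {}).items():
        has_backlog = sub.get("msgBacklog", 0) > 0
        has_consumers = len(sub.get("consumers", [])) > 0
        idle = (sub.get("msgRateOut", 0.0) == 0.0
                and sub.get("unackedMessages", 0) == 0)
        if has_backlog and has_consumers and idle:
            stalled.append(name)
    return stalled


# Values copied from the stats in this thread, trimmed to the relevant fields.
stats = {
    "subscriptions": {
        "mock-bmx-als-sub-feat-aa-restst-func": {
            "consumers": [{"availablePermits": 82, "unackedMessages": 0}],
            "msgBacklog": 181,
            "msgRateOut": 0.0,
            "unackedMessages": 0,
            "type": "Shared",
        }
    }
}

print(find_stalled_subscriptions(stats))
```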
----
2019-09-24 17:28:29 UTC - Jesse Zhang (Bose): I restarted the client, and these
181 messages are still not redelivered. In the internal status, I see the read
position is at the end of the queue: `"readPosition": "13:17210"`.
----
2019-09-24 17:32:56 UTC - Matteo Merli: The read position being there is
expected, though the broker should deliver the pending messages first.
----
2019-09-24 17:33:42 UTC - Matteo Merli: Would you be able to share a heap dump
of the broker? That would give access to the dispatcher state and help us
understand whether anything went wrong there.
----
2019-09-24 17:39:05 UTC - Matteo Merli: One question for reproducing this: how 
many consumers were attached to the subscription?
----
2019-09-24 17:39:07 UTC - Jesse Zhang (Bose): Thanks for the reply. We have
recycled the bad status server already, so I don’t have the dump now. Next time
I get into this, I will talk to our team to see if I can share the dump.
----
2019-09-24 17:39:37 UTC - Jesse Zhang (Bose): only 1 consumer attached to the 
subscription
----
2019-09-24 17:40:04 UTC - Jesse Zhang (Bose): ConsumerType.Shared
----
2019-09-24 17:40:39 UTC - Jesse Zhang (Bose): We use the Go client,
github.com/apache/pulsar/pulsar-client-go v0.0.0-20190507044647-1f4a836a4648,
with <https://archive.apache.org/dist/pulsar/pulsar-2.4.0/DEB/>,
running on Linux.
----
2019-09-24 17:42:18 UTC - Matteo Merli: > We have recycled the bad status
server already

One less invasive way of cleaning up the state is to force a topic reload. That
would clear any such issues: `pulsar-admin topics unload $TOPIC`
+1 : Jesse Zhang (Bose), Jim Lambert, Luke Lu
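
A small sketch of wrapping that command, together with the `stats` and `stats-internal` subcommands of `pulsar-admin topics`, so the topic state can be captured before the unload. The helper name and the topic name are hypothetical; the commands are only printed here, not executed.

```python
import shlex


def admin_command(action, topic):
    """Build a pulsar-admin invocation for one topic as an argument list."""
    allowed = {"stats", "stats-internal", "unload"}
    if action not in allowed:
        raise ValueError(f"unsupported action: {action}")
    return ["pulsar-admin", "topics", action, topic]


# Hypothetical topic name; substitute your own.
topic = "persistent://public/default/my-topic"

# Print stats commands first so there is something to attach to a bug
# report, then the unload that forces the topic to be reloaded.
for action in ("stats", "stats-internal", "unload"):
    print(shlex.join(admin_command(action, topic)))
```

To actually run one of these, the list can be passed to `subprocess.run(cmd, check=True)` on a host where `pulsar-admin` is configured.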
----
2019-09-24 18:38:41 UTC - Devin G. Bost: Does anyone have a suggestion for how 
to test the output of the `context.newOutputMessage(..)` method?
I found a test in the source code that calls it, but the test is just checking 
to ensure it doesn’t throw an exception. It’s not actually checking the output.
----
2019-09-24 18:45:02 UTC - Devin G. Bost: Nvm. We found a way.
+1 : David Kjerrumgaard
----
2019-09-24 19:53:37 UTC - cmcgaley: @cmcgaley has joined the channel
thinking_face : Colum
----
2019-09-24 19:58:18 UTC - Luke Lu: What’s the expected behavior of unload on a 
topic with active producers and consumers?
----
2019-09-24 20:06:33 UTC - Colum: @Colum has joined the channel
----
2019-09-24 20:20:05 UTC - Colum: So I've got a question. I have a geo-replicated
cluster set up across three DCs using 2.4.1. It works awesomely, but I'm trying
to figure out if I can avoid duplicate messages being consumed by subscribers
in each DC. So, Messages 1, 2, and 3 come in. I'd like a consumer in DC 1 to
consume Messages 1 and 3, and a consumer in DC 2 to consume Message 2.

I know replication happens asynchronously, which is fine. I would just like to
know whether there is a native way to have multi-DC consumption without
duplicate messages, without rolling my own logic into the consumer.
----
2019-09-24 20:23:01 UTC - Addison Higham: @Jerry Peng (or anyone else familiar
with the context.stateStore in Pulsar Functions) it looks like we are getting an
NPE when doing a `get` on a key that doesn't exist yet. Looking at
<https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/state/StateContextImpl.java#L62>,
 it seems like this isn't handling the case where the table returns null. 
Anyone else run into this?
----
2019-09-24 20:28:14 UTC - Addison Higham: it seems like this would be a problem
for anyone using the state store... so I'm wondering if we are just doing
something wrong
----
2019-09-24 20:31:36 UTC - Matteo Merli: Producers are notified that the current
session is being closed, the topic is closed and reopened, and producers and
consumers quickly reconnect.
+1 : Luke Lu
----
2019-09-24 20:36:11 UTC - Jerry Peng: @Addison Higham I get the following 
exception when I try to get a key that doesn’t exist:
```
java.lang.RuntimeException: Failed to retrieve the state value for key 'foo'
        at org.apache.pulsar.functions.instance.ContextImpl.getState(ContextImpl.java:325) ~[pulsar-functions-instance.jar:?]
        at org.apache.pulsar.functions.api.examples.TestFunction.process(TestFunction.java:53) ~[?:?]
        at org.apache.pulsar.functions.api.examples.TestFunction.process(TestFunction.java:43) ~[?:?]
        at org.apache.pulsar.functions.instance.JavaInstance.handleMessage(JavaInstance.java:63) ~[pulsar-functions-instance.jar:?]
        at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:267) [pulsar-functions-instance.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
```
----
2019-09-24 20:36:22 UTC - Jerry Peng: how are you using state in your function?
----
2019-09-24 20:37:15 UTC - Addison Higham: we are using `getStateAsync` so that 
is probably the difference
----
2019-09-24 20:39:26 UTC - Addison Higham: What we are doing: rewriting the
`PulsarOffsetBackingStore` to use the `StateContext` APIs. Right now, it uses
a topic and syncs state by writing messages to the topic. Currently, it
doesn't plumb through auth values
----
2019-09-24 20:39:35 UTC - Addison Higham: when it creates its own client
----
2019-09-24 20:40:57 UTC - Addison Higham: So, we figured it would be easy to
rework it to use the state API. In our use case, though, we need to read state
on startup of the `Source` we are implementing and, if it doesn't exist, start
with an empty state
----
2019-09-24 20:42:00 UTC - Addison Higham: if we don't want that API to return
nulls, then it seems like it should use a named exception instead of a runtime
exception, as it feels pretty yucky to catch a `RuntimeException` or handle an
NPE
----
2019-09-24 20:45:35 UTC - Jerry Peng: @Addison Higham there is incorrect
behavior in the code
----
2019-09-24 20:46:01 UTC - Jerry Peng: `getStateAsync` should still return a
valid CompletableFuture
----
2019-09-24 20:46:06 UTC - Jerry Peng: I will fix the issue
----
2019-09-24 20:46:25 UTC - Addison Higham: with a null? that is what I was 
thinking made sense
----
2019-09-24 20:46:34 UTC - Jerry Peng: If it’s `getState()` and a key doesn’t
exist, it should return null
----
2019-09-24 20:46:50 UTC - Addison Higham: :thumbsup:  happy to be on the review
----
2019-09-24 20:47:04 UTC - Jerry Peng: Yes, `getStateAsync` should return a
CompletableFuture even if the key doesn’t exist
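
The semantics agreed on in this exchange (an asynchronous get on a missing key resolves to null rather than failing) can be sketched as follows. This is an illustration only: `InMemoryStateStore`, `get_async`, `put_async`, and `load_offsets` are hypothetical stand-ins written with Python's `asyncio`, not the Pulsar Functions API, and the stored value is made up.

```python
import asyncio
from typing import Dict, Optional


class InMemoryStateStore:
    """Stand-in for a key/value state store; not the Pulsar Functions API."""

    def __init__(self) -> None:
        self._table: Dict[str, bytes] = {}

    async def put_async(self, key: str, value: bytes) -> None:
        self._table[key] = value

    async def get_async(self, key: str) -> Optional[bytes]:
        # A missing key resolves to None instead of raising.
        return self._table.get(key)


async def load_offsets(store: InMemoryStateStore, key: str) -> bytes:
    """Read saved state on startup, falling back to empty state if absent."""
    value = await store.get_async(key)
    return value if value is not None else b""


async def main() -> tuple:
    store = InMemoryStateStore()
    first = await load_offsets(store, "offsets")  # key not written yet
    await store.put_async("offsets", b"partition-0:42")
    second = await load_offsets(store, "offsets")
    return first, second


print(asyncio.run(main()))
```

With this contract, the caller handles the "no state yet" case with a plain null check, without catching a runtime exception.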
----
2019-09-24 22:07:18 UTC - Luke Lu: So the client library will do that 
automatically without throwing errors to callers?
----
2019-09-24 22:13:03 UTC - Matteo Merli: Correct. If the unavailability period
is less than publishTimeout, clients will see no errors
+1 : Luke Lu
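
A toy model of that rule, under loudly simplified assumptions: a message sent while the topic is unavailable is buffered by the client and acknowledged the instant the topic is available again, and network latency is ignored. The function name and all timing values are hypothetical and chosen only for illustration.

```python
def failed_sends(send_times, outage_start, outage_end, send_timeout):
    """Return the send times that would exceed the timeout.

    All times are in seconds. A message sent during the outage waits
    until outage_end; a message sent outside it does not wait at all.
    """
    failed = []
    for t in send_times:
        in_outage = outage_start <= t < outage_end
        wait = (outage_end - t) if in_outage else 0.0
        if wait > send_timeout:
            failed.append(t)
    return failed


sends = [0.0, 1.0, 2.0, 3.0, 4.0]

# Short unavailability window (2 s) with a 30 s timeout: nothing fails.
print(failed_sends(sends, outage_start=1.0, outage_end=3.0, send_timeout=30.0))

# 40 s outage with the same timeout: every send made during it fails.
print(failed_sends(sends, outage_start=1.0, outage_end=41.0, send_timeout=30.0))
```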
----
2019-09-25 00:17:51 UTC - Jerry Peng: @Addison Higham 
<https://github.com/apache/pulsar/pull/5272>
----
2019-09-25 00:19:21 UTC - Junli Antolovich: Hello, does anyone have experience
installing Pulsar on Windows? I understand it can be installed on Mac and
Linux, but most of our customer base is using Windows Server 2012 and up (yes,
I know Server 2012 extended support ends Oct 2023). What would be the best
strategy for installing Pulsar on-premises, both cluster and standalone?
----
2019-09-25 00:27:47 UTC - Ali Ahmed: @Junli Antolovich Pulsar is based on Java,
so it can run on any environment supporting JDK 8. The startup scripts are
written in bash, so either a bash shell on Windows is needed or someone will
have to write a PowerShell or cmd equivalent, which doesn’t exist yet. An
alternative is to run Docker containers if your environment supports it.
----
2019-09-25 00:53:09 UTC - Junli Antolovich: @Ali Ahmed Thanks for the info.
The documentation says "System requirements: Pulsar is currently available
for MacOS and Linux. To use Pulsar, you need to install Java 8."
(<https://pulsar.apache.org/docs/en/standalone/>), which implies it is not
available for Windows or other platforms. Or does this document need to be
updated?
----
2019-09-25 01:08:29 UTC - Ali Ahmed: It’s accurate: there is currently no cmd or
PowerShell script available for Windows. If one is developed and committed,
the documentation can be changed.
----
2019-09-25 05:37:12 UTC - pradeep: We are using a Pulsar proxy setup to get the
connection. Could the issue be that the proxy is still holding the active
connection to the broker even though the application has shut down?
----
2019-09-25 07:34:42 UTC - Jianfeng Qiao: Has anyone used this command:
"pulsar-admin topics list-in-bundle tenant/namespace options"?
----
2019-09-25 07:38:44 UTC - Poule: @Jianfeng Qiao does not work on my side
----
2019-09-25 07:38:49 UTC - Poule: I get an error
----
2019-09-25 07:41:09 UTC - Jianfeng Qiao: like this "Expected a command, got 
list-in-bundle"
----
2019-09-25 07:41:36 UTC - Jianfeng Qiao: BTW, I'm using 2.3.1
----
2019-09-25 07:43:02 UTC - Jianfeng Qiao: The document should give an example 
for each command.
----
2019-09-25 07:52:24 UTC - Poule: same msg in 2.4.1
----
