2020-09-28 09:15:07 UTC - Linton: interesting promotion of Flow Director Cloud :wink: ---- 2020-09-28 09:34:35 UTC - Andreas Müller: That’s how it pays off. But the biggest promo is on Pulsar, right? :wink: ---- 2020-09-28 09:34:59 UTC - Linton: Indeed :D ---- 2020-09-28 09:35:17 UTC - Andreas Müller: :handshake: ---- 2020-09-28 10:21:35 UTC - Alan Hoffmeister: Hello everyone, I want to install Pulsar in my Kubernetes cluster, but I don't want to use the Helm files because I would like to properly learn all the details. Is there any guide for that? ---- 2020-09-28 10:21:50 UTC - Aravindhan: Posted an issue on GitHub <https://github.com/apache/pulsar/issues/8148>. Will try the workaround with the custom Docker image logic as suggested earlier. ---- 2020-09-28 10:21:52 UTC - Aravindhan: Thanks ---- 2020-09-28 11:41:23 UTC - Shripad Kulkarni: @Shripad Kulkarni has joined the channel ---- 2020-09-28 11:52:24 UTC - Hardik Shelat: I’ve been facing the same issue - with 3 consumers using the Pulsar Go client. Here’s one sample of topic stats and internal stats. In my case, the publish rate on the topic is higher than the consume rate (trying to test an overload scenario). The receive queue size is set to 200. 
```{ "msgRateIn": 1000.0027260240979, "msgRateOut": 366.1312326652507, "msgThroughputIn": 116957.73549540293, "msgThroughputOut": 42832.3581999109, "averageMsgSize": 116.95741666666666, "storageSize": 67138892, "publishers": [ { "producerId": 1, "msgRateIn": 1000.0027260240979, "msgThroughputIn": 116957.73549540293, "averageMsgSize": 116, "metadata": {} } ], "subscriptions": { "perfTestEgress": { "blockedSubscriptionOnUnackedMsgs": false, "isReplicated": false, "msgRateOut": 366.1312326652507, "msgThroughputOut": 42832.3581999109, "msgRateRedeliver": 0, "msgRateExpired": 0, "msgBacklog": 967, "msgDelayed": 0, "unackedMessages": 0, "type": "Shared", "activeConsumerName": "", "consumers": [ { "blockedConsumerOnUnackedMsgs": false, "availablePermits": -4914, "unackedMessages": 5682, "msgRateOut": 366.1312326652507, "msgThroughputOut": 42832.3581999109, "msgRateRedeliver": 0, "consumerName": "zdpsr", "metadata": {} }, { "blockedConsumerOnUnackedMsgs": false, "availablePermits": 200, "unackedMessages": 0, "msgRateOut": 0, "msgThroughputOut": 0, "msgRateRedeliver": 0, "consumerName": "btoqp", "metadata": {} }, { "blockedConsumerOnUnackedMsgs": false, "availablePermits": 200, "unackedMessages": 0, "msgRateOut": 0, "msgThroughputOut": 0, "msgRateRedeliver": 0, "consumerName": "nzwly", "metadata": {} } ] } }, "replication": {}, "deduplicationStatus": "Disabled" }``` Here’s the output for internal-stats ```{ "waitingCursorsCount": 0, "pendingAddEntriesCount": 0, "entriesAddedCounter": 1311, "numberOfEntries": 1311, "totalSize": 87139688, "currentLedgerEntries": 1311, "currentLedgerSize": 87139688, "lastLedgerCreatedTimestamp": "2020-09-28T11:28:30.165Z", "lastLedgerCreationFailureTimestamp": "", "lastConfirmedEntry": "10:1310", "state": "LedgerOpened", "ledgers": [ { "ledgerId": 10, "entries": 0, "size": 0, "timestamp": 0 } ], "cursors": { "perfTestEgress": { "markDeletePosition": "10:85", "readPosition": "10:121", "waitingReadOp": false, "pendingReadOps": 0, 
"messagesConsumedCounter": 86, "cursorLedger": 13, "cursorLedgerLastEntry": 32, "individuallyDeletedMessages": "[]", "lastLedgerSwitchTimestamp": "", "state": "Open", "numberOfEntriesSinceFirstNotAckedMessage": 36, "totalNonContiguousDeletedMessagesRange": 0, "properties": {} } } }``` ---- 2020-09-28 11:56:20 UTC - Hardik Shelat: In this case, the 2nd and 3rd consumers are blocked, while the first consumer consumes messages and its unacked message count goes down to 0. At that point, one of the other two consumers will pick up 1000 messages (not sure why 1000 when I have set the receive queue size to 200), and the 1st and 3rd consumers will remain blocked. Another weird thing I noticed when I enabled debug logs on Pulsar was that all 3 consumers were showing consumer-id as 1. Not sure if this is normal or not. ```11:30:49.371 [pulsar-io-51-5] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.0.1:41048 11:30:49.372 [pulsar-io-51-5] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=<persistent://public/default/testsched-egress>, name=perfTestEgress}, consumerId=1, consumerName=otgse, address=/172.17.0.1:41048} with pending 0 acks 11:30:50.965 [pulsar-io-51-4] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.0.1:41042 11:30:50.966 [pulsar-io-51-4] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=<persistent://public/default/testsched-egress>, name=perfTestEgress}, consumerId=1, consumerName=evyra, address=/172.17.0.1:41042} with pending 0 acks 11:30:52.238 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.0.1:41038 11:30:52.238 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=<persistent://public/default/testsched-egress>, name=perfTestEgress}, consumerId=1, consumerName=twtdx, address=/172.17.0.1:41038} with pending 19 acks ``` ---- 2020-09-28 12:08:57 UTC - alex kurtser: @Hardik Shelat Which Pulsar version are you using? ---- 2020-09-28 12:14:55 UTC - Hardik Shelat: I am testing against the latest Docker image of Pulsar, running in standalone mode. ---- 2020-09-28 12:17:43 UTC - Hardik Shelat: I will re-test this against 2.6.1 and report back, if that helps. ---- 2020-09-28 12:21:20 UTC - alex kurtser: Just check if you have an IndexOutOfBoundsException error in the broker log? ---- 2020-09-28 12:24:51 UTC - Hardik Shelat: Nope, don’t have that exception. Also, just re-ran the test with 2.6.1, and same result. ---- 2020-09-28 12:25:37 UTC - alex kurtser: Is this a Key_Shared subscription? ---- 2020-09-28 12:26:25 UTC - Hardik Shelat: No, I am using a `shared` subscription, multiple consumers, 1 topic. It’s a persistent, non-partitioned topic. ---- 2020-09-28 12:29:24 UTC - Linton: Hi guys, does anyone know when 2.7.0 is going to be released? There are a couple of protobuf v3 changes that I’d like to make use of. ---- 2020-09-28 13:02:47 UTC - Frank Kelly: @Rounak Jaggi did you ever get around to creating a GitHub issue? I looked but did not see any. If not I will create one :slightly_smiling_face: ---- 2020-09-28 13:11:34 UTC - Frank Kelly: I created <https://github.com/apache/pulsar/issues/8152> - feel free to update it with any specifics for your use case ---- 2020-09-28 13:40:50 UTC - Megaraj Mahadikar: @Megaraj Mahadikar has joined the channel ---- 2020-09-28 15:31:27 UTC - Shivji Kumar Jha: Hi, got a question on the Python client. I have a topic with messages encoded with Avro (Java producer). When I receive messages in the Python client, it shows the schema as ByteSchema (not AvroSchema as I expect). Is there a way to tell the consumer to provide the schema as an AvroSchema instance? 
Alternatively, can ByteSchema bytes be converted to an AvroSchema instance? What I want to finally do is use this schema to do an _AvroSchema.decode(msg)_ ---- 2020-09-28 15:38:56 UTC - Sean Kim: @Sean Kim has joined the channel ---- 2020-09-28 15:42:53 UTC - Axel Sirota: You have to instantiate the consumer by setting the schema as Avro there. I am used to the Java API, but I am sure it must be the same. ---- 2020-09-28 15:43:11 UTC - Axel Sirota: Without it the consumer doesn’t know and will get the default schema, right? ---- 2020-09-28 16:34:48 UTC - Shivji Kumar Jha: @Axel Sirota hey, that would mean decoding every message with the single schema provided to the consumer. In my use case, different messages are encoded with different schemas, and we want to decode each message with the schema it was originally encoded with. ---- 2020-09-28 16:46:07 UTC - Axel Sirota: uhmmm, and why don’t you send to different topics? A given topic has one schema submitted to it. ---- 2020-09-28 16:46:23 UTC - Axel Sirota: Otherwise, how would schema evolution work in your case? ---- 2020-09-28 18:35:15 UTC - Gabriel Ciuloaica: Has anybody tried to geo-replicate a namespace to 250 or more remote locations? ---- 2020-09-28 18:40:23 UTC - Maxwell Newbould: @Maxwell Newbould has joined the channel ---- 2020-09-28 19:10:53 UTC - Addison Higham: I am not aware of anyone trying to replicate to this many places. If you are trying to do a full mesh, this is likely to be very expensive (though theoretically possible?). If you are trying to do more of a "fan out" that might be more doable.
Each replication "channel" is essentially an extra subscription, consumer, and producer to the new cluster. So this should be fine, but it will cause a lot more load on brokers. Can you share some more details on the use case? ---- 2020-09-28 19:15:46 UTC - Addison Higham: @Alan Hoffmeister I would look at the bare metal guide (<https://pulsar.apache.org/docs/en/deploy-bare-metal/>), and then you can see how the charts work fairly easily by rendering the Helm charts. The bare metal guide should give you most of the important points of running Pulsar; the Helm charts then encode some best practices for running on k8s, which we don't have quite as comprehensive docs for. ---- 2020-09-29 02:28:29 UTC - Shivji Kumar Jha: The same topic provides us ordering guarantees across all the domain objects. ---- 2020-09-29 03:22:58 UTC - Gabriel Ciuloaica: @Addison Higham I’m looking to add a replication mechanism for an existing system, such that each site can work independently during network issues. There are multiple scenarios that need to be covered, fan-out being one of them, but there is also a need for one-to-one and fan-in. Let me try to build a diagram …. ---- 2020-09-29 03:32:28 UTC - Addison Higham: if you are doing selective parts of the graph and not a full mesh, it should be doable. One thing that is somewhat similar to this is some of the use cases we see with IoT, where individual sites may aggregate to a regional cluster, with those regional clusters then either aggregating to a global cluster or replicating just a subset of data between regional clusters. ---- 2020-09-29 03:34:45 UTC - Gabriel Ciuloaica: you are right, it is similar to the IoT cases. 
It basically has to replicate a stream of changes from an edge site to the central system, but also from the central system to the edges ---- 2020-09-29 03:35:52 UTC - Gabriel Ciuloaica: it is similar to a sharded db, where the central site has all shards, and each edge site has one shard ---- 2020-09-29 03:38:39 UTC - Gabriel Ciuloaica: based on my calculations, there will be one topic used for the fan-out scenario, one topic for the fan-in scenario, and 500 topics for one-to-one replication ---- 2020-09-29 03:39:24 UTC - Gabriel Ciuloaica: I will read more on Pulsar. I’m more familiar with Kafka, but using MirrorMaker is almost impossible ---- 2020-09-29 03:42:06 UTC - Addison Higham: This has some advanced topics about Pulsar's built-in geo-replication: <https://www.youtube.com/watch?v=0UoCHpdG9E0> (apologies for the audio being a bit weird) ---- 2020-09-29 03:43:23 UTC - Addison Higham: but tl;dr is that the docs mostly cover a full mesh geo-replication using a shared ZooKeeper cluster, but you can do more flexible setups including aggregation, fan-out, etc. ---- 2020-09-29 03:45:34 UTC - Gabriel Ciuloaica: thank you ---- 2020-09-29 07:25:32 UTC - alex kurtser: Hello, I have a naive question. Let's suppose we are working with a Key_Shared subscription with 5 concurrent consumers. One of the consumers received 500 messages, and before it managed to send acks for 100 of those 500 messages, it crashed and went away forever. What will happen to those last 100 messages? How and when will the broker understand that it will never get acks for these messages and that it has to send them again to another connected consumer, or are these messages actually lost? ----
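For reference on that last question: the unacked messages are not lost. When the broker sees the consumer's connection drop (or its keep-alives stop), it marks that consumer's unacked messages for redelivery to the remaining consumers on the subscription. A toy model of that broker-side bookkeeping for a shared subscription (this is an illustration of the idea only, not Pulsar's actual implementation; all names are made up):

```python
# Toy model (NOT Pulsar's real code) of a shared subscription's
# unacked-message tracking and redelivery on consumer disconnect.
from collections import deque

class SharedSubscription:
    def __init__(self):
        self.backlog = deque()   # messages not yet handed to any consumer
        self.unacked = {}        # consumer_id -> {msg_id: payload}
        self.consumers = set()

    def add_consumer(self, cid):
        self.consumers.add(cid)
        self.unacked[cid] = {}

    def publish(self, msg_id, payload):
        self.backlog.append((msg_id, payload))

    def dispatch(self, cid, n):
        """Hand up to n backlog messages to consumer cid; each stays
        tracked as unacked until the consumer acks it."""
        out = []
        while self.backlog and len(out) < n:
            msg_id, payload = self.backlog.popleft()
            self.unacked[cid][msg_id] = payload
            out.append((msg_id, payload))
        return out

    def ack(self, cid, msg_id):
        self.unacked[cid].pop(msg_id, None)

    def disconnect(self, cid):
        """When the connection drops, everything the consumer never
        acked goes back to the backlog for redelivery."""
        for msg_id, payload in self.unacked.pop(cid).items():
            self.backlog.append((msg_id, payload))
        self.consumers.discard(cid)

sub = SharedSubscription()
sub.add_consumer("c1")
sub.add_consumer("c2")
for i in range(500):
    sub.publish(i, f"msg-{i}")

delivered = sub.dispatch("c1", 500)   # c1 takes all 500
for msg_id, _ in delivered[:400]:     # ...but acks only 400 of them
    sub.ack("c1", msg_id)

sub.disconnect("c1")                  # c1 crashes; connection drops
redelivered = sub.dispatch("c2", 500) # c2 receives the 100 unacked
print(len(redelivered))               # -> 100
```

In real Pulsar the trigger is the broker detecting the dead TCP connection (or missed keep-alives) rather than an explicit `disconnect()` call; for a consumer that is alive but stuck, the client-side ack timeout setting serves a similar purpose by triggering redelivery.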
