2019-10-11 09:47:13 UTC - Ricardo Paiva: @Ricardo Paiva has joined the channel ---- 2019-10-11 11:06:26 UTC - Samuel Sun: Hi, one simple question: do we have any tool like cruise-control to reshard data in bookkeeper, or do we not need it at all? ---- 2019-10-11 12:25:15 UTC - Sijie Guo: You don’t need to do so in Pulsar. ---- 2019-10-11 12:25:32 UTC - Sijie Guo: Pulsar's multi-layer architecture is rebalance-free ---- 2019-10-11 12:26:59 UTC - Sijie Guo: No. The message will be redelivered on broker failures or ack timeout ---- 2019-10-11 12:30:09 UTC - Sijie Guo: Currently that is the expected behavior. There are issues for improving it. +1 : dba ---- 2019-10-11 12:30:43 UTC - Sijie Guo: You can use the key-based batch container to batch messages ---- 2019-10-11 12:46:04 UTC - Raman Gupta: I can't start my bookie due to a cookie mismatch. I have only one, as this is a prototype, so I can't use `bookieformat` as far as I understand it. What do I do? ---- 2019-10-11 13:03:24 UTC - Raman Gupta: I think I'm having this issue: <https://github.com/apache/pulsar/issues/4162> -- I'm running in k8s and `advertisedAddress` was different when the pod restarted. I've modified that to be the host name, and updated the cookie. But `instanceId` seems to be different every time I start. ---- 2019-10-11 13:08:27 UTC - Matteo Merli: The message can be redelivered to any other connected consumer ---- 2019-10-11 13:37:16 UTC - Raman Gupta: Is there a way to recover/restore the data in zookeeper from the on-disk data? ---- 2019-10-11 13:41:14 UTC - Retardust: <https://github.com/apache/pulsar/issues/5366> <https://github.com/apache/pulsar/issues/5367> ---- 2019-10-11 13:41:48 UTC - Retardust: please ask me to fix the issues if they're not clear enough) ---- 2019-10-11 14:17:38 UTC - Junli Antolovich: Hello everyone, I have been evaluating a Windows Service Bus replacement for on-premise applications. Our applications use Windows Service Bus for its classic pub-sub functionality, and we need something that is deployable on premise and secure, with high performance, reliability, low latency, batching, and low maintenance. I have kind of narrowed it down to 2 candidates: Apache Pulsar and RocketMQ, since they both seem to do what we need well. ---- 2019-10-11 14:19:29 UTC - Junli Antolovich: Would appreciate any feedback on the strengths and weaknesses you have experienced with either of the MOMs. ---- 2019-10-11 14:20:49 UTC - Junli Antolovich: Our applications are on the Windows platform and in C#. ---- 2019-10-11 14:32:44 UTC - David Kjerrumgaard: @Raman Gupta If you restart the ZK node it should replay the information from the transaction log to recover its state. ---- 2019-10-11 14:59:07 UTC - Raman Gupta: @David Kjerrumgaard In my ignorance I explicitly erased the metadata in ZK, (wrongly) thinking BK would automatically rebuild it from its on-disk data. So the issue isn't that the state in ZK needs recovery, it's that the state in ZK is wrong and needs to be rebuilt. ---- 2019-10-11 15:01:33 UTC - David Kjerrumgaard: @Raman Gupta I see. That is quite a bit different. So basically you have manually erased the ledger location information from ZK and want a way to rebuild it from BK? ---- 2019-10-11 15:01:40 UTC - Raman Gupta: Yes ---- 2019-10-11 15:02:19 UTC - David Kjerrumgaard: To be honest, Idk if that is possible.... ---- 2019-10-11 15:03:37 UTC - Raman Gupta: Ok. Good learning experience :slightly_smiling_face: Good thing this is a prototype and I have the data to rebuild the pulsar topics elsewhere.
+1 : David Kjerrumgaard ---- 2019-10-11 15:07:48 UTC - Raman Gupta: So what is the right way to update the "cookie" in bookkeeper without losing all my data? I need to set a new `advertisedAddress` value and update the `bookieHost` value in the cookie to match. ---- 2019-10-11 15:08:50 UTC - Raman Gupta: `shell updatecookie` seems like it should work? ---- 2019-10-11 15:11:57 UTC - dba: Hi @Sijie Guo Not sure what that is, I'll have to see how that is implemented. When implementing batching for DotPulsar I just need to know how to bundle messages with different partition_keys and/or ordering_keys. I would suspect that I should bundle based on them, but then it doesn't make sense that they can be set per SingleMessageMetadata (having a mix of them per batch). ---- 2019-10-11 15:16:31 UTC - Raman Gupta: Nope, that just keeps giving an error: `Invalid option value` ---- 2019-10-11 15:51:28 UTC - Raman Gupta: Ok this seems to work... boy this stuff is poorly documented:
1) Run `bin/bookkeeper shell cookie_delete 10.5.0.114:3181` where `10.5.0.114:3181` is the old `bookieHost` (which also appears to be the bookie "id") -- this deletes the cookie from ZK
2) Edit the cookie file manually to update `bookieHost` -- remember to update both the `journal/current/VERSION` and `ledgers/current/VERSION` files
3) Run `bin/bookkeeper shell cookie_create -cf data/bookkeeper/journal/current/VERSION pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181` where `pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181` is the new `bookieHost`, which creates the updated cookie in ZK heart : Poule ---- 2019-10-11 16:17:56 UTC - Karthik Ramasamy: @Raman Gupta - can you please put up a PR for this documentation? ---- 2019-10-11 16:21:22 UTC - Raman Gupta: I'm a little concerned that zookeeper is used by bookkeeper for persistent data such as ledger metadata that seemingly can't be recovered from the on-disk data. Zookeeper is meant to be used as a coordinator, not a database. ---- 2019-10-11 16:24:13 UTC - Matteo Merli: ZooKeeper (or any other similar system) is always used to store metadata — in a distributed system, the metadata is also part of the coordination. We don’t use it as a database, in that no “data” is stored there, only pointers to the data. thinking_face : Raman Gupta ---- 2019-10-11 16:29:31 UTC - Raman Gupta: Shouldn't the pointers get written to disk though, once the coordination in ZK is done? That way, if ZK is lost for some reason, the latest information about the data pointers can be loaded back into it and the cluster recovered. ---- 2019-10-11 16:31:10 UTC - Matteo Merli: The key problem is that the data has to be replicated in a consistent way. That’s what “consensus” gives you, be it in the form of ZooKeeper (or any other Paxos/Raft system) ---- 2019-10-11 16:31:17 UTC - Addison Higham: isn't that what a ZK snapshot effectively is? ---- 2019-10-11 16:31:47 UTC - Matteo Merli: Yes, each ZK node has a snapshot and a log of transactions which are synced to disk ---- 2019-10-11 16:32:08 UTC - Matteo Merli: to lose the data in ZK you need to lose/break N disks ---- 2019-10-11 16:32:35 UTC - Addison Higham: I am also using this: <https://github.com/mhausenblas/burry.sh> to snapshot ZK to a simple flat structure ---- 2019-10-11 16:33:09 UTC - Raman Gupta: The problem with a snapshot is you might be going back in time -- you don't know what bookie state any particular snapshot corresponds to.
The data has to be written to the bookie storage so that storage is consistent with the pointers. ---- 2019-10-11 16:33:27 UTC - Matteo Merli: the data is always: snapshot + log ---- 2019-10-11 16:33:38 UTC - Matteo Merli: the snapshot is not a backup ---- 2019-10-11 16:35:17 UTC - Addison Higham: I am curious what would be your ideal @Raman Gupta, would you suppose that each individual bookie could read its own ledgers and populate ZK? I think the problem with that would be that no single bookie could do that, since they have replicas that can drift, you need consensus to restore to ZK ---- 2019-10-11 16:39:18 UTC - Raman Gupta: Sure, there would have to be a recovery process to compare state, and re-reach consensus. I'm not saying this comes for free -- it's just weird to me that a storage system like bookkeeper actually needs two different physical data stores to operate and to remain in sync, and loss of either one is a disaster. +1 : Vladimir Shchur ---- 2019-10-11 16:44:36 UTC - Raman Gupta: @Addison Higham For burry, have you actually tried restoring a Pulsar cluster from it? What happens when ZK time travels to an earlier point in time than the ledger metadata in BK? ---- 2019-10-11 16:48:01 UTC - Addison Higham: I have used it for the configuration store (my global ZK), which has a much lower rate of data change. For the local ZK (where ledger metadata is stored) I would only use it in the worst case, where I lose ALL my ZK disks. What would happen in that case would be that I would lose the ability to access those ledgers, but could manually recreate the metadata needed. ---- 2019-10-11 16:50:10 UTC - Raman Gupta: > but could manually recreate the metadata needed
If you can do this, then what are we arguing about? :slightly_smiling_face: ---- 2019-10-11 16:51:05 UTC - Addison Higham: I agree it would be nice to have tooling to restore ledger metadata to ZK, but my point is that there is no fully automated way to do that. You have to decide which copy of the ledger is authoritative in the event of any disagreements, since you no longer have consensus ---- 2019-10-11 16:51:53 UTC - Raman Gupta: That's fine. I have only one bookie right now, so that's pretty easy :slightly_smiling_face: ---- 2019-10-11 16:52:02 UTC - Addison Higham: right now, AFAIK, that would literally be finding the files on disk and making manual requests to ZK ---- 2019-10-11 16:56:31 UTC - Raman Gupta: Hmm, I'm just spelunking through ZK and I still have refs to my old bookie id in there. Uh oh, that doesn't seem good. ---- 2019-10-11 16:59:13 UTC - Raman Gupta: Things still seem to be working though. ---- 2019-10-11 17:00:25 UTC - Srinadh Arepally: @Srinadh Arepally has joined the channel ---- 2019-10-11 17:22:52 UTC - Guillaume Compagnon: @Guillaume Compagnon has joined the channel ---- 2019-10-11 18:21:22 UTC - Sandeep Kotagiri: @David Kjerrumgaard It is certainly my corporate proxy getting in the way. I did all my testing using an instance set up in AWS directly. And everything worked as expected. ---- 2019-10-11 18:26:04 UTC - David Kjerrumgaard: Awesome.... Glad to see it wasn't an issue with Pulsar. Now you can focus on getting it to work with the proxy ---- 2019-10-11 18:34:32 UTC - Raman Gupta: This is related to the discussion above too: <https://github.com/apache/bookkeeper/issues/1193> -- without a way to have a consistent view of the data in bookkeeper as well as the metadata, how do we do backups?
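As an aside, here is a consolidated sketch of the cookie-update workaround Raman Gupta lists a few messages above. The bookie IDs, the `data/bookkeeper` paths, and the `cookie_delete`/`cookie_create -cf` invocations are taken directly from his messages and reflect his single-bookie Kubernetes prototype; it assumes the bookie is stopped while the cookie is edited, and it is a sketch of that specific workaround, not an officially documented procedure.
```
# 1) Delete the stale cookie from ZooKeeper, keyed by the OLD bookie id (host:port).
bin/bookkeeper shell cookie_delete 10.5.0.114:3181

# 2) Edit the on-disk cookie files so bookieHost matches the new advertisedAddress.
#    Both copies must be updated:
#      data/bookkeeper/journal/current/VERSION
#      data/bookkeeper/ledgers/current/VERSION

# 3) Re-create the cookie in ZooKeeper from the edited file, keyed by the NEW bookie id.
bin/bookkeeper shell cookie_create -cf data/bookkeeper/journal/current/VERSION \
    pulsar-bookkeeper-0.pulsar-bookkeeper.default.svc.cluster.local:3181
```
Keeping `advertisedAddress` stable afterwards (here, the StatefulSet pod DNS name rather than the pod IP) should prevent the same mismatch from recurring on the next restart.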
---- 2019-10-11 18:42:13 UTC - Matteo Merli: In a system where the data is continuously in motion, a better approach is to rely on consistent data replication, rather than backups.
If restoring from a backup takes hours, it might already be too late for many use cases anyway. ---- 2019-10-11 18:58:29 UTC - Addison Higham: I can say where we stand right now on our approach:
- keep as little data in BK as possible via S3 offloading. Storage offloading makes it so the amount of data we need to worry about being backed up is smaller, and we know how to deal with S3 data using bucket replication, etc. This means we just need to make sure we get ZK well backed up.
- geo-replication for critical data; this gives us not only another place where the data is, but also redundancy in case of an outage. For some use cases though, we expect it will just get parked there with long-term retention. In the event of needing to "restore" we could replay that data.
- disk snapshots (EBS) of BK and ZK as a last resort; if we are using these, it means we are probably going to be either losing some state or manually recovering, but between historical data living on S3 and geo-replication, we don't think it is likely to be needed and it's more just an insurance policy
We think that has us pretty well covered for the following:
- loss of individual nodes is handled well by built-in replication with ZK/BK, and with BK, this is only maybe a day of data, so restores should be quick
- loss of a majority of nodes can be handled either by restoring from disk snapshots, manual recovery via known good nodes (easy in the case of ZK, harder for BK), or by just rebuilding data across regions (for the relevant subset)
- loss of a whole region is handled by geo-replication (but we will only do it for a subset of data)
The things that are a bit harder:
- restoring accidental deletions of a topic/etc; however, this is *pretty* hard to do, as you can't delete a topic if there are connected consumers or producers. Obviously, we still want some answer for it. Since cleaning up ledgers is a lazy operation, if it is noticed soon enough, we may be able to just restore the data in ZK. For our critical data, we expect it to be geo-replicated, which means you can't delete the topics at all. We think that is good enough for us right now. ---- 2019-10-11 18:58:53 UTC - Addison Higham: for where we want to go in the future, here are some ideas, in thread ---- 2019-10-11 19:00:20 UTC - Addison Higham: What would be a really nice feature of Pulsar is the ability to inject code at some place where you know the metadata in ZK is consistent with BK, and then be able to take a stable snapshot ---- 2019-10-11 19:03:58 UTC - Addison Higham: I am not sure at all of the feasibility of this, but AFAICT, it seems like you could prevent brokers + BK from making any changes to zookeeper for a short period (perhaps via some lock/watch in ZK) where user code could run to kick off disk snapshots. Obviously it would be in some critical sections and would be tricky to implement... but it would be really nice to get consistent snapshots of BK and ZK for a full restore ---- 2019-10-11 19:04:29 UTC - Bowen Li: @Bowen Li has joined the channel ---- 2019-10-11 19:18:00 UTC - Can: @Can has joined the channel ---- 2019-10-11 19:20:29 UTC - Matteo Merli: > take a stable snapshot
The challenge here is that it could be 10s or 100s of TBs ---- 2019-10-11 19:20:46 UTC - Matteo Merli: and take days to weeks to take/restore ---- 2019-10-11 19:21:40 UTC - Raman Gupta: Snapshots are quick. They don't need to copy all the data. ---- 2019-10-11 19:22:18 UTC - Raman Gupta: Yeah backups are for fat-finger or DR cases, not for normal situations e.g.
bad disks, machines, etc... and it's not just "easy" cases like deleting a topic through the REST API, but for cases like doing `metaformat` by mistake (or because of bad assumptions like I made, coupled with the lack of docs on these commands). ---- 2019-10-11 19:33:40 UTC - Raman Gupta: If you're gonna go as far as having safepoints, you might as well write the safepoint metadata to the BK disk as well. That way you can back up / snapshot just the BK disk, and restore ZK from that. Which goes right back to my initial point. ---- 2019-10-11 19:34:57 UTC - Matteo Merli: I think the only reasonably feasible approach is to just keep the data for X amount of time. That will give the opportunity to roll back the metadata to any point between `now` and `now - X`. ---- 2019-10-11 19:35:29 UTC - Matteo Merli: e.g.: ledger data is not deleted from bookies for 24h after the metadata is deleted ---- 2019-10-11 19:59:02 UTC - Naby: Hi all. I have a question that I hope someone can help me with. If I pass a regex for the list of topics to subscribe to, how can I know which topic the next message is from? I am using the Python client (pulsar-client 2.4.1). Thanks. ---- 2019-10-11 20:15:06 UTC - Matteo Merli: Use `Message.getTopicName()` ---- 2019-10-11 20:15:35 UTC - Matteo Merli: oh, that was Java… ---- 2019-10-11 20:16:23 UTC - Matteo Merli: In Python: `msg.topic_name()` ---- 2019-10-11 20:30:58 UTC - Naby: Thanks. I just found it and tested it. It worked. ---- 2019-10-11 20:31:49 UTC - Naby: I was going through this when I found it: <https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar_test.py> ---- 2019-10-11 20:32:08 UTC - Naby: Thanks anyway ---- 2019-10-11 20:41:18 UTC - Addison Higham: @Matteo Merli yeah, I am not referring to a full disk copy; in most cases, like ZFS, etc, I can make the snapshot and then copy the bytes from that point in time ---- 2019-10-11 20:41:31 UTC - Addison Higham: so usually something that takes a few seconds ---- 2019-10-11 20:57:32 UTC - Matteo Merli: though there the data+metadata are together, in a single node, so it's easier to get a consistent view ---- 2019-10-11 21:09:49 UTC - Tahir: @Tahir has joined the channel ---- 2019-10-11 21:46:17 UTC - Kendall Magesh-Davis: Has anyone got authentication/authorization working when deploying with the helm chart? ---- 2019-10-11 21:47:48 UTC - Jerry Peng: That is what quite a few people are using ---- 2019-10-11 21:47:59 UTC - Jerry Peng: Including us ---- 2019-10-11 21:55:40 UTC - Sandeep Kotagiri: @David Kjerrumgaard I started the exercise of tiered storage offloading to test whether pulsar-sql (which is based on Presto) can query data from tiered storage. So far, my experiments are leading me to believe that data from tiered storage cannot be queried by pulsar-sql. My understanding from other posts is that pulsar-sql should be able to read data from tiered storage. Can you please give me any insights on this? ---- 2019-10-11 21:58:46 UTC - David Kjerrumgaard: You should be able to query data from tiered storage as easily as you can data on the bookies. Provided you have sufficient permissions, connectivity, etc. What type of errors are you seeing? ---- 2019-10-11 22:00:10 UTC - Sandeep Kotagiri: Well not errors. In standalone mode I am losing some data for some reason. It has got to do with the retention settings. ---- 2019-10-11 22:00:31 UTC - Sandeep Kotagiri: And the counts that I am getting off of SQL are less than what I get from consuming the messages.
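A minimal Python sketch of what Naby and Matteo Merli discuss above: subscribing with a topic regex and using `msg.topic_name()` to see which topic each message came from. The service URL, topic pattern, and subscription name are placeholders, and this assumes the pulsar-client Python API behavior described in the thread (2.4.x).
```
import re
import pulsar

# Assumed local broker URL; adjust for your cluster.
client = pulsar.Client('pulsar://localhost:6650')

# Subscribing with a compiled regex makes all matching topics feed one consumer.
consumer = client.subscribe(
    re.compile('persistent://public/default/sensor-.*'),
    subscription_name='my-regex-sub')

msg = consumer.receive()
# topic_name() reports which matched topic this particular message came from.
print(msg.topic_name(), msg.data())
consumer.acknowledge(msg)

client.close()
```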
---- 2019-10-11 22:01:07 UTC - David Kjerrumgaard: Is the topic data static or are messages being added to the topic? ---- 2019-10-11 22:01:24 UTC - Sandeep Kotagiri: In a way, I am still fiddling around with standalone mode. So it could be that I am getting some of these settings incorrect to start with. I will start deploying broker, bookie and zookeeper in the regular mode. ---- 2019-10-11 22:01:49 UTC - Sandeep Kotagiri: I am trying to add about 200000 messages in JSON format to the topic without any consumption. ---- 2019-10-11 22:02:11 UTC - Sandeep Kotagiri: I just subscribe and keep adding data until the topics hit about 200000 in count. ---- 2019-10-11 22:02:39 UTC - David Kjerrumgaard: Do you stop after you hit 200K? ---- 2019-10-11 22:02:46 UTC - Sandeep Kotagiri: Yes. ---- 2019-10-11 22:04:23 UTC - Sandeep Kotagiri: I started producing messages and the producer stats from the clients match what I am also maintaining as a counter. But by the time I verify the topic internal stats, the numbers are way below. ---- 2019-10-11 22:04:23 UTC - David Kjerrumgaard: ok. Then the count skew isn't due to that. The Presto SQL engine goes against the ledgers at a certain point in time vs. the consumers. So there is usually some lag in the SQL ---- 2019-10-11 22:04:51 UTC - Sandeep Kotagiri: e.g. `entriesAddedCounter` is only about 5000 after adding about 200K messages. ---- 2019-10-11 22:05:02 UTC - David Kjerrumgaard: You can adjust your retention policy to allow more data. The default is 10GB I believe ---- 2019-10-11 22:05:40 UTC - Sandeep Kotagiri: Ok. I thought that the default was set to zero in standalone. Let me increase this number a bit. ---- 2019-10-11 22:06:19 UTC - David Kjerrumgaard: Yea, maybe it is different in standalone ---- 2019-10-11 22:06:58 UTC - Sandeep Kotagiri: I had good success verifying lots of details on a Kubernetes deployment within my corporate network. However, because of the proxy settings, I am trying to run a standalone EC2 instance. ---- 2019-10-11 22:07:01 UTC - Sandeep Kotagiri: :slightly_smiling_face: ---- 2019-10-11 22:07:24 UTC - Sandeep Kotagiri: @David Kjerrumgaard, thank you very kindly for all the help. Appreciate it. +1 : David Kjerrumgaard ---- 2019-10-11 22:23:08 UTC - Kendall Magesh-Davis: how did you configure the JWT?
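An aside on the retention discussion just above, before the thread moves on to JWT configuration: a sketch of how the namespace retention policy David Kjerrumgaard mentions is usually adjusted with `pulsar-admin`. The `public/default` namespace and the 10G/3d limits are example values, and the actual standalone defaults are worth checking in your own `standalone.conf`/`broker.conf` rather than taken from this exchange.
```
# Inspect the current retention policy for a namespace (placeholder namespace).
bin/pulsar-admin namespaces get-retention public/default

# Retain up to 10 GB or 3 days of acknowledged data per topic in the namespace,
# so older/offloaded ledgers stay queryable by pulsar-sql instead of being trimmed.
bin/pulsar-admin namespaces set-retention public/default --size 10G --time 3d
```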
---- 2019-10-11 22:25:34 UTC - Jerry Peng: <https://pulsar.apache.org/docs/en/security-token-admin/> ---- 2019-10-11 22:25:38 UTC - Jerry Peng: just following the docs ---- 2019-10-11 22:25:47 UTC - Kendall Magesh-Davis: I tried following that doc… ---- 2019-10-11 22:26:48 UTC - Kendall Magesh-Davis: `bin/pulsar tokens create-secret-key --output /opt/my-secret.key --base64`
```
bin/pulsar tokens create --secret-key /opt/my-secret.key -s admin
Exception in thread "main" io.jsonwebtoken.io.DecodingException: Illegal base64 character: '-'
    at io.jsonwebtoken.io.Base64.ctoi(Base64.java:206)
    at io.jsonwebtoken.io.Base64.decodeFast(Base64.java:255)
    at io.jsonwebtoken.io.Base64Decoder.decode(Base64Decoder.java:21)
    at io.jsonwebtoken.io.Base64Decoder.decode(Base64Decoder.java:8)
    at io.jsonwebtoken.io.ExceptionPropagatingDecoder.decode(ExceptionPropagatingDecoder.java:21)
    at org.apache.pulsar.broker.authentication.utils.AuthTokenUtils.readKeyFromUrl(AuthTokenUtils.java:115)
    at org.apache.pulsar.utils.auth.tokens.TokensCliUtils$CommandCreateToken.run(TokensCliUtils.java:149)
    at org.apache.pulsar.utils.auth.tokens.TokensCliUtils.main(TokensCliUtils.java:319)
```
---- 2019-10-11 22:31:54 UTC - Matteo Merli: it should work with `file:///opt/my-secret.key` ---- 2019-10-11 22:32:11 UTC - Matteo Merli: `bin/pulsar tokens create --secret-key file:///opt/my-secret.key -s admin` ---- 2019-10-11 22:32:44 UTC - Matteo Merli: without `file://`, the string gets parsed as if the key were passed directly on the CLI in base64 ---- 2019-10-11 22:32:57 UTC - Kendall Magesh-Davis: I think I’ve been reading too many docs… I see that now. I passed right over that ---- 2019-10-11 22:33:09 UTC - Kendall Magesh-Davis: :face_palm: ---- 2019-10-11 22:38:57 UTC - Kendall Magesh-Davis: ayy, it works now. Thanks @Matteo Merli and @Jerry Peng :pray: ---- 2019-10-11 22:39:42 UTC - Jerry Peng: :+1: ---- 2019-10-12 01:14:06 UTC - Jianfeng Qiao: Ok, thanks. ---- 2019-10-12 01:14:37 UTC - Jianfeng Qiao: Got it, thanks. ----
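To close the loop on the token exchange above, here is the sequence that ends up working; the only change from the failing attempt is the `file://` prefix Matteo Merli points out. The key path and the `admin` subject are just the example values used in the thread.
```
# Generate a base64-encoded secret key on disk.
bin/pulsar tokens create-secret-key --output /opt/my-secret.key --base64

# Create a token for the "admin" role. Note the file:// prefix: without it the
# argument is parsed as if it were the base64 key material itself, which is what
# produced the "Illegal base64 character: '-'" error above.
bin/pulsar tokens create --secret-key file:///opt/my-secret.key -s admin
```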
