2020-10-15 09:11:33 UTC - xiaolong.ran: I filed an issue <https://github.com/apache/pulsar/issues/8268> to track this. +1 : Emil ---- 2020-10-15 09:20:58 UTC - Rong: @Rong has joined the channel ---- 2020-10-15 13:23:25 UTC - Sankararao Routhu: Hi! Is there any way we can set up a ZooKeeper ensemble across two data centers (regions) that can sustain the failure of one data center, with synchronous replication enabled in Pulsar? ---- 2020-10-15 14:00:59 UTC - Penghui Li: Could you please provide more details, such as which subscription type you are using? ---- 2020-10-15 14:11:50 UTC - Mehran prs: @Mehran prs has joined the channel ---- 2020-10-15 14:29:15 UTC - Mehran prs: Hi everyone, we are a small team (3 back-end developers) and our project is not big (15 microservices); each microservice has about 30 topics, and our project's architecture is event-driven, so we need a streaming platform like Pulsar or Kafka. I have a question: is Pulsar a good fit for our project? Is it not too big for us (cost and management on our k8s)? ---- 2020-10-15 14:44:04 UTC - ckdarby: A video I did a while back about why we picked Pulsar instead of kafka, <https://www.youtube.com/watch?v=jLruEmh3ve0&list=PLf-67McbxkT6iduMWoUsh3iHaZQl3jgvq&t=122s|here>.
It depends, and context is important. Do the homework to find the answer or convince your employer to hire someone to give the answer. +1 : Lari Hotari ---- 2020-10-15 15:10:39 UTC - Mehran prs: Thanks ---- 2020-10-15 20:14:38 UTC - Marcio Martins: Is there an easy way to clean up offloaded ledgers from deleted topics? I have 4TB of data from topics I just deleted lying on S3, because I don't know which ledgers belong to these deleted topics or topics still in use... ---- 2020-10-15 21:11:30 UTC - Evan Furman: I’m trying to determine what the common denominator is in terms of authentication for the brokers, zookeeper, and bookies — do all support pkcs12? I’d prefer to use a single method rather than multiples. It seems keystore would be the easiest way to achieve that, as long as all of them support pkcs12 ---- 2020-10-15 21:32:36 UTC - Kenan Dalley: Is there a way to get a Reader to start reading messages from a specific published timestamp? Is there an api that can get a message at a published timestamp that would then allow a Reader to use its MessageId to read forward from there? I want to start in the middle of a topic, based on a certain time, and read forward from there rather than reading from Earliest, because there could be millions to billions of records in between. ---- 2020-10-15 21:34:58 UTC - ckdarby: Pretty sure you just use the reader to pass in a messageId. Track the timestamp to messageId on your side somewhere ---- 2020-10-15 21:36:31 UTC - Kenan Dalley: Not really possible. I'm trying to look at data that's already in the topic and those MessageIds are not being captured. ---- 2020-10-15 21:37:20 UTC - ckdarby: Re-read the topic and track the messageIds to timestamps, and for new messages coming in also track the messageId to the timestamp? ---- 2020-10-15 21:38:11 UTC - ckdarby: If you don't care about the timestamp and you have the message id, you can just provide that to the reader ---- 2020-10-15 21:38:20 UTC - ckdarby: But if you want to go by specific timestamps, you need to track those to message ids ---- 2020-10-15 21:39:27 UTC - Kenan Dalley: This is meant to be an ad hoc utility and not something that runs 24x7. Also, we're not capturing the messageIds anywhere. This is something that I'd be able to do in Kafka (repositioning based on timestamp) and was looking to see if it was possible in Pulsar. I'm starting to think the answer is "No". ---- 2020-10-15 21:40:47 UTC - ckdarby: Do you know which KIP introduced that? ---- 2020-10-15 21:41:37 UTC - Kenan Dalley: It was introduced back in Kafka v1.0 or v1.1. ---- 2020-10-15 21:42:55 UTC - ckdarby: I know around then they added the timestamp as part of the actual message itself, but I don't recall seeking based off a timestamp ---- 2020-10-15 21:47:11 UTC - Kenan Dalley: You can get the offsets per partition based on timestamp (Consumer.offsetsForTimes) and then reset based on the offsets. So, it's slightly indirect, but it's still possible. ---- 2020-10-15 21:54:39 UTC - ckdarby: For anyone else curious, this is where <https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index|Kafka implements it>. ---- 2020-10-15 21:55:10 UTC - ckdarby: @Kenan Dalley what is the use case, out of curiosity? You just want to replay specific hours or something? ---- 2020-10-15 21:58:54 UTC - ckdarby: Usually I'd just use the PrestoSQL connector with Pulsar, but you'll still scan and won't seek to a time-based index like Kafka.
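A minimal sketch of the timestamp-based repositioning Kenan Dalley describes above (Kafka's `Consumer.offsetsForTimes` followed by `seek`); the broker address, group id, topic name, and timestamp are placeholder values:
```java
import java.time.Duration;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "adhoc-replay");               // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        long startTimestamp = 1602720000000L; // epoch millis to start reading from (example value)

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Ask the broker for the earliest offset in each partition whose
            // message timestamp is >= startTimestamp.
            Map<TopicPartition, Long> query = consumer.partitionsFor("my-topic").stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toMap(tp -> tp, tp -> startTimestamp));
            Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);

            // Reposition each partition to the resolved offset and read forward.
            consumer.assign(offsets.keySet());
            offsets.forEach((tp, oat) -> {
                if (oat != null) {   // null means no message at or after the timestamp
                    consumer.seek(tp, oat.offset());
                }
            });
            consumer.poll(Duration.ofSeconds(5));
        }
    }
}
```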
---- 2020-10-15 22:06:39 UTC - Addison Higham: @Kenan Dalley you can scroll based on time, one second ---- 2020-10-15 22:11:59 UTC - Addison Higham: <https://pulsar.apache.org/api/client/2.6.0-SNAPSHOT/org/apache/pulsar/client/api/ReaderBuilder.html#startMessageFromRollbackDuration-long-java.util.concurrent.TimeUnit-> <- when you build a reader, you can pass in a time offset ---- 2020-10-15 22:12:43 UTC - Addison Higham: you do have to do the math based on the current time, but that will create a new reader from that position ---- 2020-10-15 23:22:23 UTC - Curtis Cook: Has anyone connected Pulsar with Snowflake? I know there's a Kafka connector, but wasn't sure what the lift would be for directly piping Pulsar ---- 2020-10-15 23:50:25 UTC - Addison Higham: I don't imagine it would be difficult, but it's just a question of where that connector runs ---- 2020-10-16 01:47:53 UTC - Curtis Cook: I think they basically have a kafka listener, not 100% sure on this, need to do more research ---- 2020-10-16 03:13:45 UTC - Rattanjot Singh: Can a closed ledger have the bookie id of a decommissioned bookie? ---- 2020-10-16 03:15:49 UTC - Addison Higham: I believe so, but if autorecovery is running it should only be temporary, as it will replicate the ledger to a new bookie and then update the metadata ---- 2020-10-16 03:21:51 UTC - Rattanjot Singh: Is there a way we can check if autorecovery is doing that? ---- 2020-10-16 03:26:13 UTC - Addison Higham: you can look at the autorecovery logs, you can also run the bookie tool for underreplicated ledgers ---- 2020-10-16 03:26:25 UTC - Addison Higham: also... did you figure out your TLS issue? apologies for losing track of that ---- 2020-10-16 03:27:32 UTC - Rattanjot Singh: yes! For TLS, we have to add the ZooKeeper client TLS settings in bkenv.sh like in pulsar_env.sh. ---- 2020-10-16 03:31:46 UTC - Addison Higham: interesting... ---- 2020-10-16 03:45:47 UTC - Rattanjot Singh: I see this log: `Ledger replicated successfully. ledger id is: 161`. But when I get the ledger metadata, I see it still has the ensemble of the decommissioned bookie ---- 2020-10-16 04:21:44 UTC - Addison Higham: `bookkeeper shell listunderreplicated` will tell you if you have any ledgers that are not fully replicated ---- 2020-10-16 04:59:55 UTC - Rattanjot Singh: It shows as empty. But I don't understand why the ledger still shows the decommissioned bookie in the ensemble ---- 2020-10-16 05:59:31 UTC - Sankararao Routhu: Hi! Is there any way we can set up a ZooKeeper ensemble across two data centers (regions) without compromising on availability when one data center goes down? ---- 2020-10-16 06:17:44 UTC - Rattanjot Singh: When I take down/decommission a bookie, the updated ensemble adds a bookie from the other region to the ensemble. ```org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: 10.80.116.41:3181 in ensemble: [10.80.116.220:3181, 10.80.116.41:3181, 10.80.117.15:3181] is not adhering to placement policy and chose 10.80.124.22:3181. excludedBookies [10.80.116.41:3181] and quarantinedBookies []``` bookkeeperClientRegionawarePolicyEnabled=false bookkeeperClientRackawarePolicyEnabled=true ---- 2020-10-16 06:25:14 UTC - Sijie Guo: A regular consumer will consume from brokers, and brokers read data from tiered storage. ---- 2020-10-16 08:06:34 UTC - Johannes Wienke: @Johannes Wienke has joined the channel ---- 2020-10-16 08:09:53 UTC - Johannes Wienke: Hi, we're currently evaluating Pulsar.
Regarding schema management: when we use a topic to publish event data, ordering is important, so I'd usually like to publish all of these events on a single topic. However, different event types for a single kind of resource might have different content, which makes designing a single Avro type quite hard. Is there a recommended approach for handling this in Pulsar? I've only seen that the Confluent schema registry has dedicated support for such a case (<https://www.confluent.io/blog/put-several-event-types-kafka-topic/>). ----
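A minimal sketch of the approach Addison Higham points to above: building a Reader with `startMessageFromRollbackDuration`, which starts it a given duration back from the current time. The service URL, topic, and target timestamp are placeholder values, and the rollback duration has to be computed relative to "now", as noted in the discussion:
```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class ReadFromTimestamp {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        // Target wall-clock start time (example value), converted into a
        // rollback duration relative to the current time.
        long startTimestampMillis = 1602720000000L;
        long rollbackMillis = System.currentTimeMillis() - startTimestampMillis;

        Reader<byte[]> reader = client.newReader(Schema.BYTES)
                .topic("persistent://public/default/my-topic")   // placeholder topic
                .startMessageFromRollbackDuration(rollbackMillis, TimeUnit.MILLISECONDS)
                .create();

        // Read forward from the rolled-back position.
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }

        reader.close();
        client.close();
    }
}
```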

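For the multi-event-type question above, one possible workaround (a sketch of an envelope pattern, not a built-in Pulsar equivalent of the Confluent feature linked above) is to wrap the different event payloads in a single Avro-mapped class and publish that on the one ordered topic; all class, field, and topic names here are illustrative:
```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class EnvelopeExample {
    // One envelope type carried by the topic; exactly one payload field is
    // populated per event, the others stay null.
    public static class ResourceEvent {
        public String eventType;          // e.g. "created", "renamed"
        public ResourceCreated created;   // set only when eventType == "created"
        public ResourceRenamed renamed;   // set only when eventType == "renamed"
    }

    public static class ResourceCreated { public String id; public String name; }
    public static class ResourceRenamed { public String id; public String newName; }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        Producer<ResourceEvent> producer = client
                .newProducer(Schema.AVRO(ResourceEvent.class))
                .topic("persistent://public/default/resource-events")   // placeholder topic
                .create();

        ResourceEvent event = new ResourceEvent();
        event.eventType = "created";
        event.created = new ResourceCreated();
        event.created.id = "42";
        event.created.name = "example";

        producer.send(event);

        producer.close();
        client.close();
    }
}
```
The trade-off is that ordering across event types is preserved on one topic, at the cost of an envelope schema that has to evolve whenever a new event type is added.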