2020-10-15 09:11:33 UTC - xiaolong.ran: I filed an issue <https://github.com/apache/pulsar/issues/8268> to track this. +1 : Emil ---- 2020-10-15 09:20:58 UTC - Rong: @Rong has joined the channel ---- 2020-10-15 13:23:25 UTC - Sankararao Routhu: Hi! Is there any way we can set up a ZooKeeper ensemble across two data centers (regions) that can sustain the failure of one data center, with synchronous replication enabled in Pulsar? ---- 2020-10-15 14:00:59 UTC - Penghui Li: Could you please provide more details, such as which subscription type you are using? ---- 2020-10-15 14:11:50 UTC - Mehran prs: @Mehran prs has joined the channel ---- 2020-10-15 14:29:15 UTC - Mehran prs: Hi everyone, we are a small team (3 back-end developers) and our project is not big (15 microservices); each microservice has about 30 topics, and our project's architecture is event-driven, so we need a streaming platform like Pulsar or Kafka. I have a question: is Pulsar a good fit for our project? Is it not too big for us (cost and management on our k8s)? ---- 2020-10-15 14:44:04 UTC - ckdarby: A video I did a while back about why we picked Pulsar instead of kafka, <https://www.youtube.com/watch?v=jLruEmh3ve0&list=PLf-67McbxkT6iduMWoUsh3iHaZQl3jgvq&t=122s|here>.
It depends, and context is important. Do the homework to find the answer or convince your employer to hire someone to give the answer. +1 : Lari Hotari ---- 2020-10-15 15:10:39 UTC - Mehran prs: Thanks ---- 2020-10-15 20:14:38 UTC - Marcio Martins: Is there an easy way to clean up offloaded ledgers from deleted topics? I have 4TB of data from topics I just deleted lying on S3, because I don't know which ledgers belong to these deleted topics or topics still in use... ---- 2020-10-15 21:11:30 UTC - Evan Furman: I’m trying to determine what the common denominator is in terms of authentication for the brokers, zookeeper, and bookies — do all support pkcs12? I’d prefer to use a single method rather than multiples. It seems keystore would be the easiest way to achieve that, as long as all of them support pkcs12 ---- 2020-10-15 21:32:36 UTC - Kenan Dalley: Is there a way to get a Reader to start reading messages from a specific published timestamp? Is there an api that can get a message at a published timestamp that would then allow a Reader to use its MessageId to read forward from there? I want to start in the middle of a topic, based on a certain time, and read forward from there rather than reading from Earliest, because there could be millions to billions of records in between. ---- 2020-10-15 21:34:58 UTC - ckdarby: Pretty sure you just use the reader to pass in a messageId. Track the timestamp to messageId on your side somewhere ---- 2020-10-15 21:36:31 UTC - Kenan Dalley: Not really possible. I'm trying to look at data that's already in the topic and those MessageIds are not being captured. ---- 2020-10-15 21:37:20 UTC - ckdarby: Re-read the topic and track the messageIds to timestamps, and for new messages coming in also track the messageId to the timestamp? ---- 2020-10-15 21:38:11 UTC - ckdarby: If you don't care about the timestamp and you have the message id, you can just provide that to the reader ---- 2020-10-15 21:38:20 UTC - ckdarby: But if you want to go by specific timestamps, you need to track those to message ids ---- 2020-10-15 21:39:27 UTC - Kenan Dalley: This is meant to be an ad hoc utility and not something that runs 24x7. Also, we're not capturing the messageIds anywhere. This is something that I'd be able to do in Kafka (repositioning based on timestamp) and was looking to see if it was possible in Pulsar. I'm starting to think the answer is "No". ---- 2020-10-15 21:40:47 UTC - ckdarby: Do you know which KIP introduced that? ---- 2020-10-15 21:41:37 UTC - Kenan Dalley: It was introduced back in Kafka v1.0 or v1.1. ---- 2020-10-15 21:42:55 UTC - ckdarby: I know around then they added the timestamp as part of the actual message itself, but I don't recall seeking based off a timestamp ---- 2020-10-15 21:47:11 UTC - Kenan Dalley: You can get the offsets per partition based on timestamp (Consumer.offsetsForTimes) and then reset based on the offsets. So, it's slightly indirect, but it's still possible. ---- 2020-10-15 21:54:39 UTC - ckdarby: For anyone else curious, this is where <https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index|Kafka implements it>. ---- 2020-10-15 21:55:10 UTC - ckdarby: @Kenan Dalley what is the use case, out of curiosity? You just want to replay specific hours or something? ---- 2020-10-15 21:58:54 UTC - ckdarby: Usually I'd just use the PrestoSQL connector with Pulsar, but you'll still scan and won't seek to a time-based index like Kafka.
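A minimal sketch of the timestamp-based repositioning Kenan Dalley describes above (Kafka's `Consumer.offsetsForTimes` followed by `seek`); the broker address, group id, topic name, and timestamp are placeholder values:
```java
import java.time.Duration;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "adhoc-replay");               // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        long startTimestamp = 1602720000000L; // epoch millis to start reading from (example value)

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Ask the broker for the earliest offset in each partition whose
            // message timestamp is >= startTimestamp.
            Map<TopicPartition, Long> query = consumer.partitionsFor("my-topic").stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toMap(tp -> tp, tp -> startTimestamp));
            Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);

            // Reposition each partition to the resolved offset and read forward.
            consumer.assign(offsets.keySet());
            offsets.forEach((tp, oat) -> {
                if (oat != null) {   // null means no message at or after the timestamp
                    consumer.seek(tp, oat.offset());
                }
            });
            consumer.poll(Duration.ofSeconds(5));
        }
    }
}
```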
---- 2020-10-15 22:06:39 UTC - Addison Higham: @Kenan Dalley you can scroll based on time, one second ---- 2020-10-15 22:11:59 UTC - Addison Higham: <https://pulsar.apache.org/api/client/2.6.0-SNAPSHOT/org/apache/pulsar/client/api/ReaderBuilder.html#startMessageFromRollbackDuration-long-java.util.concurrent.TimeUnit-> <- when you build a reader, you can pass in a time offset ---- 2020-10-15 22:12:43 UTC - Addison Higham: you do have to do the math based on the current time, but that will create a new reader from that position ---- 2020-10-15 23:22:23 UTC - Curtis Cook: Has anyone connected Pulsar with Snowflake? I know there's a Kafka connector, but wasn't sure what the lift would be for directly piping Pulsar ---- 2020-10-15 23:50:25 UTC - Addison Higham: I don't imagine it would be difficult, but it's just a question of where that connector runs ---- 2020-10-16 01:47:53 UTC - Curtis Cook: I think they basically have a kafka listener, not 100% sure on this, need to do more research ---- 2020-10-16 03:13:45 UTC - Rattanjot Singh: Can a closed ledger have the bookie id of a decommissioned bookie? ---- 2020-10-16 03:15:49 UTC - Addison Higham: I believe so, but if autorecovery is running it should only be temporary, as it will replicate the ledger to a new bookie and then update the metadata ---- 2020-10-16 03:21:51 UTC - Rattanjot Singh: Is there a way we can check if autorecovery is doing that? ---- 2020-10-16 03:26:13 UTC - Addison Higham: you can look at the autorecovery logs, you can also run the bookie tool for underreplicated ledgers ---- 2020-10-16 03:26:25 UTC - Addison Higham: also... did you figure out your TLS issue? apologies for losing track of that ---- 2020-10-16 03:27:32 UTC - Rattanjot Singh: yes! For TLS, we have to add the ZooKeeper client TLS settings in bkenv.sh like in pulsar_env.sh. ---- 2020-10-16 03:31:46 UTC - Addison Higham: interesting... ---- 2020-10-16 03:45:47 UTC - Rattanjot Singh: I see this log: `Ledger replicated successfully. ledger id is: 161`. But when I get the ledger metadata, I see it still has the ensemble of the decommissioned bookie ---- 2020-10-16 04:21:44 UTC - Addison Higham: `bookkeeper shell listunderreplicated` will tell you if you have any ledgers that are not fully replicated ---- 2020-10-16 04:59:55 UTC - Rattanjot Singh: It shows as empty. But I don't understand why the ledger still shows the decommissioned bookie in the ensemble ---- 2020-10-16 05:59:31 UTC - Sankararao Routhu: Hi! Is there any way we can set up a ZooKeeper ensemble across two data centers (regions) without compromising on availability when one data center goes down? ---- 2020-10-16 06:17:44 UTC - Rattanjot Singh: When I take down/decommission a bookie, the updated ensemble adds a bookie from the other region to the ensemble. ```org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: 10.80.116.41:3181 in ensemble: [10.80.116.220:3181, 10.80.116.41:3181, 10.80.117.15:3181] is not adhering to placement policy and chose 10.80.124.22:3181. excludedBookies [10.80.116.41:3181] and quarantinedBookies []``` bookkeeperClientRegionawarePolicyEnabled=false bookkeeperClientRackawarePolicyEnabled=true ---- 2020-10-16 06:25:14 UTC - Sijie Guo: A regular consumer will consume from brokers, and brokers read data from tiered storage. ---- 2020-10-16 08:06:34 UTC - Johannes Wienke: @Johannes Wienke has joined the channel ---- 2020-10-16 08:09:53 UTC - Johannes Wienke: Hi, we're currently evaluating Pulsar.
Regarding schema management: when we use a topic to publish event data, ordering is important, so I'd usually like to publish all of these events on a single topic. However, different event types for a single kind of resource might have different content, which makes designing a single Avro type quite hard. Is there a recommended approach for handling this in Pulsar? I've only seen that the Confluent schema registry has dedicated support for such a case (<https://www.confluent.io/blog/put-several-event-types-kafka-topic/>). ----
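A minimal sketch of the approach Addison Higham points to above: building a Reader with `startMessageFromRollbackDuration`, which starts it a given duration back from the current time. The service URL, topic, and target timestamp are placeholder values, and the rollback duration has to be computed relative to "now", as noted in the discussion:
```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class ReadFromTimestamp {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        // Target wall-clock start time (example value), converted into a
        // rollback duration relative to the current time.
        long startTimestampMillis = 1602720000000L;
        long rollbackMillis = System.currentTimeMillis() - startTimestampMillis;

        Reader<byte[]> reader = client.newReader(Schema.BYTES)
                .topic("persistent://public/default/my-topic")   // placeholder topic
                .startMessageFromRollbackDuration(rollbackMillis, TimeUnit.MILLISECONDS)
                .create();

        // Read forward from the rolled-back position.
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }

        reader.close();
        client.close();
    }
}
```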

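For the multi-event-type question above, one possible workaround (a sketch of an envelope pattern, not a built-in Pulsar equivalent of the Confluent feature linked above) is to wrap the different event payloads in a single Avro-mapped class and publish that on the one ordered topic; all class, field, and topic names here are illustrative:
```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class EnvelopeExample {
    // One envelope type carried by the topic; exactly one payload field is
    // populated per event, the others stay null.
    public static class ResourceEvent {
        public String eventType;          // e.g. "created", "renamed"
        public ResourceCreated created;   // set only when eventType == "created"
        public ResourceRenamed renamed;   // set only when eventType == "renamed"
    }

    public static class ResourceCreated { public String id; public String name; }
    public static class ResourceRenamed { public String id; public String newName; }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        Producer<ResourceEvent> producer = client
                .newProducer(Schema.AVRO(ResourceEvent.class))
                .topic("persistent://public/default/resource-events")   // placeholder topic
                .create();

        ResourceEvent event = new ResourceEvent();
        event.eventType = "created";
        event.created = new ResourceCreated();
        event.created.id = "42";
        event.created.name = "example";

        producer.send(event);

        producer.close();
        client.close();
    }
}
```
The trade-off is that ordering across event types is preserved on one topic, at the cost of an envelope schema that has to evolve whenever a new event type is added.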