Slack digest for #general - 2019-10-04

Apache Pulsar Slack Fri, 04 Oct 2019 02:13:58 -0700

2019-10-03 12:02:10 UTC - Julien Lechalupé: Hello, I would like to know if 
Pulsar is able to support several million topics (3M to 4M in my use case). 
I found this link: 
<https://github.com/apache/pulsar/wiki/PIP-8:-Pulsar-beyond-1M-topics>
Is this approach still up to date?
----
2019-10-03 14:12:19 UTC - Raman Gupta: Would the best way to upgrade from a 
standalone Pulsar to a multi-node cluster be to configure synchronous 
geo-replication, wait for everything to be caught up, switch over the clients 
to the new cluster, and then finally shut down the standalone cluster?
----
2019-10-03 16:38:55 UTC - BigSam: @Ali Ahmed Thanks for your reply. My first 
try with the Java client on Android failed; it seems the Java client is not pure 
Java. It depends on some native libraries that are not compatible with Android. 
I will give it a second try and look into the details.
----
2019-10-03 16:43:34 UTC - Joshua Dunham: @Joshua Dunham has joined the channel
----
2019-10-03 16:44:50 UTC - Joshua Dunham: Hi Everyone
----
2019-10-03 16:45:28 UTC - Jon Bock: Welcome @Joshua Dunham!
----
2019-10-03 16:45:52 UTC - Joshua Dunham: I have a question on the schema 
registry component -- The docs say more info for integrations coming soon but 
I'm wondering if these include a hook into Apache Atlas.
----
2019-10-03 16:46:21 UTC - Raman Gupta: It's probably the native epoll stuff for 
Netty
----
2019-10-03 16:47:18 UTC - Ali Ahmed: @BigSam if you have a stack trace, please 
share it. I am not aware of any required native dependencies. There should be a 
fallback.
----
2019-10-03 16:48:31 UTC - Joshua Dunham: I have a standalone cluster running 
and it's been great so far. One concern is that many existing functions that 
the Apache foundation has 'solved' are being re-done in Pulsar. (Pulsar 
functions == Whisk, Pulsar Schema == Atlas).
----
2019-10-03 16:49:10 UTC - Matteo Merli: @BigSam there are a few native libraries 
in addition to Netty, though they are all used in a way that falls back to a 
pure Java implementation
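The fall-back behavior described above is a common pattern: attempt to load a native library and, if loading fails (as it can on Android), switch to a pure-Java code path. The sketch below assumes a hypothetical library name and uses `java.util.zip.CRC32` as a stand-in for the work being accelerated; it is not Pulsar's or Netty's actual loader code.

```java
import java.util.zip.CRC32;

final class NativeOrJava {
    /** Returns true only if the native library loads on this platform. */
    static boolean tryLoadNative(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Missing or incompatible binary (e.g. on Android): do not fail, fall back.
            return false;
        }
    }

    /** Pure-Java path, always available; CRC32 stands in for the accelerated work. */
    static long javaChecksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Picks an implementation once, at startup. */
    static String chooseImplementation(String libName) {
        return tryLoadNative(libName) ? "native" : "pure-java";
    }

    public static void main(String[] args) {
        // "hypothetical-native-crc" is an illustrative name, not a real Pulsar artifact.
        System.out.println(chooseImplementation("hypothetical-native-crc"));
    }
}
```

Choosing the implementation once, rather than on every call, keeps the failed load attempt off the hot path.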
----
2019-10-03 16:51:51 UTC - Matteo Merli: The reason for that was to have a 
self-contained and well-integrated implementation.

For example, OpenWhisk would require a number of other supporting systems 
(CouchDB and so on).

For schema registry, we wanted it to be an integral component of Pulsar, to 
enforce the schema at the broker level, without introducing any additional 
system dependencies.
----
2019-10-03 16:52:39 UTC - Matteo Merli: That said, we’re always open to working 
on integration / interoperability with these external systems
----
2019-10-03 17:19:40 UTC - Luke Lu: IMO, the broker should focus on pushing 
opaque messages efficiently and correctly (e.g. by taking advantage of an 
end-to-end message checksum/digest). We already had to deal with the embedded 
WebSocket proxy causing production issues. Schema resolution could be better 
done in the client or a separate proxy.
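A minimal sketch of the end-to-end digest idea from the message above: the producer attaches a checksum computed over the opaque payload, the broker only forwards bytes, and the consumer recomputes and compares. It assumes Java 9+ for `java.util.zip.CRC32C`; the algorithm choice and the class and method names are illustrative, not part of any Pulsar API.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

final class EndToEndDigest {
    /** Producer side: digest over the raw payload bytes. */
    static long digest(byte[] payload) {
        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Consumer side: recompute and compare; the broker never inspects the payload. */
    static boolean verify(byte[] payload, long expectedDigest) {
        return digest(payload) == expectedDigest;
    }

    public static void main(String[] args) {
        byte[] payload = "order-42".getBytes(StandardCharsets.UTF_8);
        long sent = digest(payload);               // travels alongside the message
        payload[0] ^= 0x01;                        // simulate a single-bit corruption in transit
        System.out.println(verify(payload, sent)); // false: corruption detected
    }
}
```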
----
2019-10-03 17:29:14 UTC - Matteo Merli: The broker is not doing ser/deser, 
though; it only validates the schema definition when a producer session is 
established. There’s no perf penalty in using the schema
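To illustrate the point above, here is a toy model (not Pulsar's broker code) in which the schema definition is checked once, when a producer session is created, and each subsequent send only handles opaque bytes. Exact string equality stands in for Pulsar's real compatibility checks, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ToyBroker {
    private final Map<String, String> schemaByTopic = new HashMap<>();
    private int schemaValidations = 0;

    /** Schema is validated here, once per producer session. */
    ProducerSession connect(String topic, String schemaDefinition) {
        schemaValidations++;
        String existing = schemaByTopic.putIfAbsent(topic, schemaDefinition);
        if (existing != null && !existing.equals(schemaDefinition)) {
            throw new IllegalArgumentException("incompatible schema for " + topic);
        }
        return new ProducerSession();
    }

    int schemaValidations() {
        return schemaValidations;
    }

    /** Per-message path: no deserialization, just opaque bytes. */
    static final class ProducerSession {
        private long bytesStored = 0;

        void send(byte[] payload) {
            bytesStored += payload.length;
        }

        long bytesStored() {
            return bytesStored;
        }
    }
}
```

However many messages a session sends, `schemaValidations()` grows only with the number of sessions opened.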
----
2019-10-03 17:35:49 UTC - Devin G. Bost: What causes a `no space left on 
device` error in a function?
----
2019-10-03 17:38:55 UTC - Luke Lu: My point is that the more features you put 
in the broker, the more bugs that can potentially affect the rest of the broker 
via unanticipated error paths. E.g., the Pulsar client in the embedded 
WebSocket proxy has resource leaks on consumer subscription failures: 
<https://github.com/apache/pulsar/issues/5200>
----
2019-10-03 17:43:25 UTC - Joshua Dunham: I agree that bringing the logic / 
machinery to the data makes sense for performance. Atlas (for instance) is a 
very full-featured registry that already works with similar, overlapping 
technologies (Avro for me).
----
2019-10-03 17:44:18 UTC - Joshua Dunham: Regarding my question above (Functions 
vs. Schema Registry), the schema registry is a good candidate to control 
externally and just sync changes back and forth. Performance should not be 
impacted that much.
----
2019-10-03 17:44:55 UTC - Joshua Dunham: Whisk is a different story, though. I 
would not argue that it's built to keep pace with the volume of messages that 
Pulsar can ingest.
----
2019-10-03 17:45:30 UTC - Joshua Dunham: Having a connector for both would 
still be beneficial for the folks who could use the extra functionality both 
provide.
----
2019-10-03 17:46:35 UTC - Joshua Dunham: Both can integrate w/ Kafka currently, 
and I remember there was a GitHub issue about making an endpoint that spoke 
Kafka (and more) to speed up adoption.
----
2019-10-03 17:46:37 UTC - Joshua Dunham: Is this a thing?
----
2019-10-03 17:47:05 UTC - Joshua Dunham: Snapping in Pulsar to these systems 
would be game changing (for me at least).
----
2019-10-03 18:05:18 UTC - Poule: Talking about Whisk, I'd love to have an 
API gateway in Pulsar <https://github.com/apache/pulsar/issues/4249>
heart_eyes : Poule, Andrey Popelo
----
2019-10-03 18:27:01 UTC - Chris Bartholomew: @Raman Gupta I am not sure this is 
the best way, but if you used geo-replication for this, you would also need to 
use replicated subscriptions to synchronize the subscription state between the 
standalone and multi-node cluster. I would use async replication since it is 
easier to configure and probably good enough since you are coordinating the 
switchover.
----
2019-10-03 18:33:56 UTC - Anubhav Jain: @Anubhav Jain has joined the channel
----
2019-10-03 19:01:10 UTC - Raman Gupta: Is it OK for multiple `Consumer` 
instances in one process to have the same consumer name? I want the name to 
reflect the Kubernetes pod name, but I have multiple consumers in each 
container.
----
2019-10-03 19:02:01 UTC - Raman Gupta: Thanks @Chris Bartholomew. Do you have 
suggestions for better approaches?
----
2019-10-03 19:10:17 UTC - Jerry Peng: @Raman Gupta yes
ok_hand : Raman Gupta
----
2019-10-03 19:12:39 UTC - Chris Bartholomew: I think it might be easier to set 
up a single node cluster using a copy of the files from the standalone cluster 
and then expand from a single node to multi-node. This would require an outage 
on the standalone cluster while you are transferring its files to the 
single-node cluster.
+1 : Raman Gupta
----
2019-10-03 19:39:45 UTC - Addison Higham: am I recalling correctly that when 
using storage offloading, any segments moved to S3 won't be cleaned up even 
after the retention period passes?
----
2019-10-03 19:41:31 UTC - Addison Higham: along with that, from what I can 
tell, there isn't an option to enable offloading by default for newly created 
namespaces. Seems like it should be straightforward to add
----
2019-10-03 20:19:56 UTC - V.V.S: @V.V.S has joined the channel
----
2019-10-03 20:22:17 UTC - V.V.S: Hi all, just a small query about this 
statement: "Pulsar also, however, supports non-persistent topics, which are 
topics on which messages are never persisted to disk and live only in memory. 
When using non-persistent delivery, killing a Pulsar broker or disconnecting a 
subscriber to a topic means that all in-transit messages are lost on that 
(non-persistent) topic, meaning that clients may see message loss." Is there a 
way I can transfer the data from one broker to another before I gracefully 
shut one down?
----
2019-10-03 20:22:51 UTC - gangadhar.chinnireddy: @gangadhar.chinnireddy has 
joined the channel
----
2019-10-03 20:28:06 UTC - Matteo Merli: Not currently. That would be 
technically challenging to achieve, and in any case it wouldn’t be able to 
cover broker failures
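Because there is no broker-to-broker handoff for non-persistent topics, one practical step before a planned broker shutdown is to know which topics are at risk. Topic names carry their domain as a prefix (`persistent://` or `non-persistent://`), so a minimal sketch, assuming topic names are available as strings and treating a name without a scheme as persistent (Pulsar's default), can flag them. The helper names are hypothetical, and this only identifies topics; it does not move any data.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TopicDomains {
    enum Domain { PERSISTENT, NON_PERSISTENT }

    static Domain domainOf(String topic) {
        int sep = topic.indexOf("://");
        if (sep < 0) {
            return Domain.PERSISTENT; // short names default to persistent topics
        }
        String scheme = topic.substring(0, sep);
        switch (scheme) {
            case "persistent":
                return Domain.PERSISTENT;
            case "non-persistent":
                return Domain.NON_PERSISTENT;
            default:
                throw new IllegalArgumentException("unknown topic domain: " + scheme);
        }
    }

    /** Topics whose in-flight messages are lost if their broker is stopped. */
    static List<String> atRiskOnShutdown(List<String> topics) {
        return topics.stream()
                .filter(t -> domainOf(t) == Domain.NON_PERSISTENT)
                .collect(Collectors.toList());
    }
}
```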
----
2019-10-03 20:29:23 UTC - Oleg Kozlov: @Oleg Kozlov has joined the channel
----
2019-10-03 20:44:45 UTC - GC: @GC has joined the channel
----
2019-10-03 21:07:49 UTC - Oleg Kozlov: Hello all, sorry for double-posting with 
the dev-websocket channel... Quick question: is it possible to set the deliverAt 
or deliverAfterSeconds configuration properties on messages from a WebSocket 
producer?
----
2019-10-03 21:08:29 UTC - Oleg Kozlov: basically - can I produce delayed / 
scheduled messages via WebSocket API?
----
2019-10-03 21:11:25 UTC - Matteo Merli: We haven’t exposed these settings yet 
outside the Java API
----
2019-10-03 21:12:00 UTC - Oleg Kozlov: Are there plans to do that? And also, 
are they available via protobuf?
----
2019-10-03 21:12:38 UTC - Matteo Merli: Yes, it’s a simple additional property 
that has to be set in the message protobuf metadata.
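Since the delay ends up as a single timestamp in the message metadata, a relative delay reduces to simple arithmetic. The Java client exposes both forms on its message builder (`deliverAfter(long, TimeUnit)` and `deliverAt(long)`); the minimal sketch below, with hypothetical helper names and assuming epoch-millisecond timestamps, shows only the conversion and the check a dispatcher would make.

```java
import java.util.concurrent.TimeUnit;

final class DelayedDelivery {
    /** Converts a relative delay into an absolute epoch-millis delivery time. */
    static long deliverAt(long nowMillis, long delay, TimeUnit unit) {
        if (delay < 0) {
            throw new IllegalArgumentException("delay must not be negative");
        }
        return nowMillis + unit.toMillis(delay);
    }

    /** A message becomes eligible for dispatch once the clock reaches deliverAt. */
    static boolean isDeliverable(long deliverAtMillis, long nowMillis) {
        return nowMillis >= deliverAtMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long at = deliverAt(now, 30, TimeUnit.SECONDS);
        System.out.println(isDeliverable(at, now)); // false until 30 s have passed
    }
}
```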
----
2019-10-03 21:14:02 UTC - Oleg Kozlov: got it. Basically, we have an Erlang 
app and are looking at Pulsar as a replacement for our current message broker, 
so the only two options for connecting Erlang -> Pulsar are: 1) the WebSocket 
API, 2) implement a client using protobuf
----
2019-10-03 21:14:05 UTC - Oleg Kozlov: is that correct?
----
2019-10-03 21:14:39 UTC - Matteo Merli: 3. Wrap the C++ client library from Erlang
----
2019-10-03 21:15:38 UTC - Oleg Kozlov: hm, ok, that's interesting, we'll look 
into that
----
2019-10-03 21:16:05 UTC - Oleg Kozlov: but so far WebSockets seem to be the 
easiest option. Would it be possible to add support for exposing deliverAt via 
WebSockets?
----
2019-10-03 21:16:27 UTC - Matteo Merli: yes, it’s very easy to add it
----
2019-10-03 21:17:04 UTC - Oleg Kozlov: seems like the change would be in 
org.apache.pulsar.websocket.ProducerHandler?
----
2019-10-03 21:17:30 UTC - Matteo Merli: correct
----
2019-10-03 21:18:24 UTC - Matteo Merli: and the docs are at: 
`site2/docs/client-libraries-websocket.md`
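For reference, a WebSocket producer sends each message as a JSON object whose `payload` field is Base64-encoded, with an optional `context` that the proxy echoes back in its acknowledgment. The minimal sketch below builds such a message with the JDK only. The `deliverAt` field is hypothetical: it stands for the property discussed above that `ProducerHandler` would need to read and copy into the message metadata, and it is not part of the WebSocket API at the time of this conversation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class WsProducerMessage {
    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds one producer frame; "deliverAt" is a HYPOTHETICAL field, not in the API. */
    static String build(byte[] payload, String context, long deliverAtMillis) {
        return "{\"payload\":" + quote(Base64.getEncoder().encodeToString(payload))
                + ",\"context\":" + quote(context)
                + ",\"deliverAt\":" + deliverAtMillis + "}";
    }

    public static void main(String[] args) {
        byte[] body = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(build(body, "1", 1234L));
        // {"payload":"aGk=","context":"1","deliverAt":1234}
    }
}
```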
----
2019-10-03 21:19:37 UTC - Oleg Kozlov: got it, thank you :slightly_smiling_face:
----
2019-10-03 22:11:05 UTC - Luke Lu: It appears that much of the data plane work 
(esp. the ManagedLedger stuff) currently in the Pulsar broker could be delegated 
to DistributedLog: <https://bookkeeper.apache.org/distributedlog/> Can I assume 
that Pulsar will eventually adopt the DistributedLog core API and essentially 
become a read/write proxy for DistributedLog?
----
2019-10-04 02:14:07 UTC - Ali Ahmed: @Luke Lu no, DLog is a legacy API
----
2019-10-04 02:24:05 UTC - Luke Lu: So the current dlog api is deprecated? Will 
ManagedLedger (appears already in bookkeeper package) and friends be absorbed 
into bookkeeper?
----
2019-10-04 02:30:09 UTC - Ali Ahmed: The DLog API can be considered deprecated; 
ManagedLedger will stay as is.
ok_hand : Luke Lu
----
2019-10-04 04:17:55 UTC - Matteo Merli: I wouldn’t say that. Managed ledger and 
DLog are 2 libraries that were created for the same purpose and have a very big 
overlap in functionality, though there are a few differences. The differences 
are not big, but they still require careful thinking to be able to synthesize 
them into a single API that could support systems using the 2 libraries.

Some time back, we had thought of merging the 2 libraries into 1 which would 
have a superset of the features. The main challenges for that are:
 1. Time. It would be quite a huge task to complete
 2. Ensure metadata compatibility and path for live migrations
 3. Opportunity cost. We decided, for now, to use that time to build 
features/improvements/etc.. that are more directly useful to users.
----
2019-10-04 04:42:04 UTC - Luke Lu: Thanks for the pragmatic and historical 
perspectives! Makes sense.
----
2019-10-04 04:46:16 UTC - Luke Lu: It’s a pity that much of the logic is 
duplicated…
----
2019-10-04 04:51:59 UTC - Matteo Merli: Yes, the reason is that the 2 libs were 
created in parallel as closed source at Yahoo and Twitter
----
