2019-02-19 10:41:22 UTC - Sébastien de Melo: @Sébastien de Melo has joined the channel ---- 2019-02-19 10:56:59 UTC - Maarten Tielemans: Hi guys, reading some of the docs on Pulsar functions. Cool feature and good work on the feature/docs so far! One thing which isn't 100% clear to me, is it possible to run Pulsar functions with parallelism 1 and have a failover? A guarantee of exactly once ordered handling ---- 2019-02-19 11:01:53 UTC - Jacob O'Farrell: @Ali Ahmed You were right - it was having issues connecting to the broker http endpoint. It seems this isn't enabled by default, but Pulsar SQL requires this ---- 2019-02-19 11:34:05 UTC - Sijie Guo: > One thing which isn’t 100% clear to me, is it possible to run Pulsar functions with parallelism 1 and have a failover?
I am not sure if I understand your question clearly, but I will try to answer from two points: 1) you are able to run a parallelism of 1. the pulsar functions runtime will handle failover for you if the machine running the function crashes. 2) you can also schedule a parallelism of 2 and use a failover subscription. that means you have 2 instances, one is actually invoking the function, while the other one is standing by. if the first one crashes, the standby one will take over the consumption. in any case, pulsar supports exactly-once on idempotent functions. hope this answers your question. ---- 2019-02-19 12:51:26 UTC - lingchen: @lingchen has joined the channel ---- 2019-02-19 13:21:59 UTC - Christophe Bornet: Do you think it would be possible to provide region-aware read-only brokers so that consumers from a region can read from the same region? Something like this KIP for Kafka: <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica> ---- 2019-02-19 13:35:56 UTC - Laurent Chriqui: Hi, when I try to compile the c++ client on mac os 10.14.3, I get the following error when running the final make command. ---- 2019-02-19 13:59:32 UTC - Marc Le Labourier: Same ---- 2019-02-19 14:55:04 UTC - Sébastien de Melo: Me too ---- 2019-02-19 15:02:59 UTC - Jacob O'Farrell: Sorry for the onslaught of questions - I'm currently trying to query my data via PulsarSQL, however I'm running into difficulty with the Schema(s). I've managed to upload the schema successfully (I will do my best to contribute some docs that help people on their way here), however when I run the presto query, I am getting `Topic {TOPIC REMOVED BY ME} does not have a valid schema`. Whilst I understand that this is most likely due to user error, is there any way I can get more insight into this error/where it has gone wrong? Any help in guiding me in the right direction would be appreciated ---- 2019-02-19 15:39:28 UTC - Matteo Merli: Did you install boost-python? ---- 2019-02-19 15:40:25 UTC - Matteo Merli: Actually, the easiest way to get the C++ lib on Mac is to just do `brew install libpulsar` ---- 2019-02-19 15:40:38 UTC - Matteo Merli: No need to compile it ---- 2019-02-19 15:44:51 UTC - Laurent Chriqui: I installed boost-python3 ---- 2019-02-19 15:45:14 UTC - Laurent Chriqui: @Matteo Merli I wanted to compile the unreleased version 2.3.0 ---- 2019-02-19 16:05:22 UTC - Jacob O'Farrell: If anyone has any example data with matching schemas that would be super helpful! It should greatly help me wrap my head around the formatting it expects. My current understanding is that Pulsar is expecting the schema to be found within the double quotes on L3 of this example file <https://github.com/apache/pulsar/blob/master/conf/schema_example.conf#L3> ---- 2019-02-19 16:19:54 UTC - Jacob O'Farrell: FWIW I think I've greatly confused myself by trying to deserialise the `schema` from the admin cli for the generator_test topic that you create in one of the guides ---- 2019-02-19 16:27:30 UTC - David Kjerrumgaard: @Christophe Bornet The typical deployment configuration for geo-replication within Pulsar relies upon setting up two separate BookKeeper clusters (one in each region). Because of this, the Pulsar brokers will always read from the Pulsar broker in the same region. Is that what you are looking to achieve? ---- 2019-02-19 16:35:08 UTC - Christophe Bornet: Not really.
The multi-cluster geo-replication has some drawbacks: it is async, so there's no delivery guarantee, and subscriptions are local to the clusters. What I am looking for is mainly "global" subscriptions, which are natural with a single cluster spread over multiple regions but have the drawback of high utilization of the expensive inter-region network. So to limit this use of the network, I was wondering if it could be possible to read from "read-only" brokers in the same region as the client. ---- 2019-02-19 16:48:28 UTC - Christophe Bornet: We have something working which uses namespace isolation and a region-aware placement policy on a single spread cluster to ensure local consumption by clients (clients of DC1 read/write on brokers of DC1, brokers of DC1 read in priority from bookies of DC1). Consumers of DC1 only get messages from producers of DC1 (it's kind of active/passive). But for other use-cases we'd like to have active/active, where consumers from DC1 get messages produced by DC2 ---- 2019-02-19 16:48:41 UTC - Dmitry Sh: @Dmitry Sh has joined the channel ---- 2019-02-19 16:49:19 UTC - Christophe Bornet: Do you think the replication can be configured from a namespace of a cluster to another namespace of the same cluster? I guess this would solve the issue. ---- 2019-02-19 16:56:42 UTC - David Kjerrumgaard: Thanks for the explanation of what you want to achieve. It seems like you have made some progress on your own in this area. As for the namespace replication within the same cluster, that sounds plausible, but it would require some additional coding. Does it have to be at the namespace level or can it be per-topic to start? I think it would be safe to start small and build up. ---- 2019-02-19 17:11:39 UTC - Christophe Bornet: Well, we have indeed worked a lot on this subject. What we are trying to achieve is an active/active cluster with the possibility of failing over the consumers to the other region. Currently that's not possible because subscriptions are not replicated. So we are looking for other solutions. Note that Kafka has the same problem, so there is a clear competitive advantage in solving this issue that many are facing. What would be your recommendation? Are you planning to solve this issue (e.g. with replication of subscriptions in geo-replication)? ---- 2019-02-19 17:15:18 UTC - Matteo Merli: Oh, got it. Do you also have python 3 installed on macOS? ---- 2019-02-19 17:15:49 UTC - Matteo Merli: Seems like it might be picking up py2 and boost-python3 ---- 2019-02-19 17:25:30 UTC - David Kjerrumgaard: The ability to have an active/active deployment for Pulsar is definitely something we are looking to achieve, as it would give us a clear advantage over Kafka. We would also be happy to accept any contributions in this area, including a PIP detailing the design and features you are looking for. ---- 2019-02-19 17:43:38 UTC - Matteo Merli: Uhm, ok, I'm getting the same error now. Could be related to the fact that I just updated the Xcode version and might have pulled in a new version of clang ---- 2019-02-19 17:44:03 UTC - Matteo Merli: Seems python is not being passed in the linking phase ---- 2019-02-19 17:45:45 UTC - Matteo Merli: Which, I believe, was not required before, since the _pulsar.so is a plugin to be loaded from within python itself.
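To make Sijie's earlier point about running two instances with a failover subscription more concrete, here is a minimal sketch using the Pulsar Java client. The service URL, topic, and subscription name are placeholders, not anything taken from this conversation; the subscription behaviour is the same "one active, one standby" pattern Sijie describes for a function scheduled with a parallelism of 2 on a failover subscription.
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class FailoverConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL -- adjust to your own deployment.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // With a Failover subscription only one consumer is active at a time;
        // a second copy of this program stays on standby and takes over, in
        // order, if the active one crashes or disconnects.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")   // placeholder topic
                .subscriptionName("failover-sub")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // If processing is idempotent, redeliveries after a failover still
            // give effectively exactly-once results, as noted above.
            consumer.acknowledge(msg);
        }
    }
}
```
Running two copies of this process gives the active/standby behaviour described above; ordering is preserved because only the active consumer receives messages.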
---- 2019-02-19 17:45:57 UTC - Matteo Merli: Anyway, working on a fix ---- 2019-02-19 17:47:11 UTC - Matteo Merli: In the meantime, if you don't care about the python wrapper, you can do `cmake -DBUILD_PYTHON_WRAPPER=OFF` ---- 2019-02-19 17:58:20 UTC - Matteo Merli: Created <https://github.com/apache/pulsar/pull/3626> ---- 2019-02-19 18:13:12 UTC - Joe Francis: An Active-Active cluster + global subscription is hard to achieve, because there are no ordering guarantees on writes across clusters. To replicate subscriptions there needs to be a stream cursor that will apply on both clusters, which means such a sub has to track the write streams from each cluster individually. ---- 2019-02-19 18:22:36 UTC - Laurent Chriqui: Thank you! Yes, I have both python3 and python2. ---- 2019-02-19 18:25:42 UTC - David Kjerrumgaard: That would require an approach similar to what is outlined in this paper: <http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf> ---- 2019-02-19 18:26:15 UTC - David Kjerrumgaard: doable, but definitely a lot of effort. ---- 2019-02-19 18:34:15 UTC - Marc Le Labourier: `cmake -DBUILD_PYTHON_WRAPPER=OFF .` seems to be working on my side. Thanks for the temporary fix and the PR. ---- 2019-02-19 19:38:13 UTC - Facundo Rodriguez: Hello everyone! I'm using Pulsar as pub/sub but I'm getting 500s and I don't know what's going on. This is the trace ---- 2019-02-19 22:55:47 UTC - Yadi Yang: @Yadi Yang has joined the channel ---- 2019-02-19 23:10:55 UTC - Yadi Yang: I got this error when I tried to install the c++ library on `debian:stretch` in a docker image, following the readme in the pulsar c++ repo. ```$ cd /usr/src/gmock/ $ cmake . CMake Error at /usr/src/googletest/CMakeLists.txt:13 (add_subdirectory): add_subdirectory not given a binary directory but the given source directory "/usr/src/gmock" is not a subdirectory of "/usr/src/googletest". When specifying an out-of-tree source a binary directory must be explicitly specified. CMake Error at CMakeLists.txt:56 (config_compiler_and_linker): Unknown CMake command "config_compiler_and_linker". -- Configuring incomplete, errors occurred! See also "/usr/src/gmock/CMakeFiles/CMakeOutput.log".``` Anyone have any thoughts? Thanks in advance :sweat_smile: ---- 2019-02-19 23:11:16 UTC - Yadi Yang: I also tried this but no luck: <https://github.com/apache/pulsar/pull/1957> ---- 2019-02-19 23:40:40 UTC - Matteo Merli: @Yadi Yang You can also try to use the Deb packages already pre-built at <https://pulsar.apache.org/docs/en/client-libraries-cpp/#deb> Alternatively, you can build it by skipping the test code with `cmake -DBUILD_TESTS=OFF .` ---- 2019-02-20 01:24:55 UTC - Sijie Guo: Can you explain when you encountered this issue? ---- 2019-02-20 03:03:52 UTC - Vincent Ngan: I think I have asked something similar before, but I really want to make sure I understand it correctly: if I set both a namespace's retention size and time to -1, is it true that all the messages in that namespace will never be removed from the system regardless of whether they have been acknowledged or not? The reason I ask is that I want to use Pulsar as a persistence mechanism for an in-memory database solution. I need to fully understand the persistence and durability behaviours of Pulsar messages.
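For reference, the "-1 time and -1 size" retention setting Vincent is asking about looks roughly like this through the Java admin client. This is only a sketch: the admin URL and namespace are placeholders, and the same policy can also be set with the `pulsar-admin namespaces set-retention` command.
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class InfiniteRetention {
    public static void main(String[] args) throws Exception {
        // Placeholder admin URL and namespace -- adjust to your own cluster.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // RetentionPolicies(retentionTimeInMinutes, retentionSizeInMB):
        // -1 for both means messages in the namespace are retained
        // indefinitely, even after every subscription has acknowledged them.
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(-1, -1));

        admin.close();
    }
}
```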
---- 2019-02-20 03:04:14 UTC - Matteo Merli: That is correct ---- 2019-02-20 03:05:10 UTC - Jianfeng Qiao: @Jianfeng Qiao has joined the channel ---- 2019-02-20 03:07:13 UTC - Vincent Ngan: So, supposing I have sent a lot of messages to a topic and they have all been acknowledged by a consumer with a subscription. Then, a year later, I can create another consumer with a different subscription to read all the messages back? ---- 2019-02-20 03:08:10 UTC - Matteo Merli: Yes, provided you have enough disk space :slightly_smiling_face: or that you use tiered storage to offload to cloud storage ---- 2019-02-20 03:08:56 UTC - Vincent Ngan: Yes, I will make sure I have enough disk space. ---- 2019-02-20 03:09:53 UTC - Matteo Merli: Also, you can keep adding BookKeeper storage nodes dynamically, without any rebalancing of data ---- 2019-02-20 03:10:31 UTC - Vincent Ngan: Great! Thanks for your answer. ---- 2019-02-20 09:01:09 UTC - bossbaby: Hello all, why does the bookie still report the following even though I have deleted all topics? ``` 09:00:22.191 [LedgerDirsMonitorThread] WARN org.apache.bookkeeper.util.DiskChecker - Space left on device data/bookkeeper/ledgers/current : 1853960192, Used space fraction: 0.9282315 < WarnThreshold 0.95. ``` ----
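Going back to Jacob's Pulsar SQL question from earlier: one common way for a topic to end up with a schema that Presto can read is simply to produce with a typed schema, which registers it on the topic automatically. Below is a small sketch with the Java client; the POJO, topic, and broker URL are made up for illustration and are not taken from Jacob's setup.
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class SchemaProducer {
    // Hypothetical record type; Pulsar derives an Avro schema from the POJO fields.
    public static class Reading {
        public String sensorId;
        public double value;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Creating a producer with Schema.AVRO registers the schema on the
        // topic, so Pulsar SQL can map the record fields to columns instead
        // of reporting "does not have a valid schema".
        Producer<Reading> producer = client.newProducer(Schema.AVRO(Reading.class))
                .topic("persistent://public/default/readings")   // placeholder topic
                .create();

        Reading r = new Reading();
        r.sensorId = "sensor-1";
        r.value = 23.5;
        r.timestamp = System.currentTimeMillis();
        producer.send(r);

        producer.close();
        client.close();
    }
}
```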
