2019-02-19 10:41:22 UTC - Sébastien de Melo: @Sébastien de Melo has joined the channel ---- 2019-02-19 10:56:59 UTC - Maarten Tielemans: Hi guys, reading some of the docs on Pulsar functions. Cool feature and good work on the feature/docs so far! One thing which isn't 100% clear to me, is it possible to run Pulsar functions with parallelism 1 and have a failover? A guarantee of exactly once ordered handling ---- 2019-02-19 11:01:53 UTC - Jacob O'Farrell: @Ali Ahmed You were right - it was having issues connecting to the broker http endpoint. It seems this isn't enabled by default, but Pulsar SQL requires this ---- 2019-02-19 11:34:05 UTC - Sijie Guo: > One thing which isn’t 100% clear to me, is it possible to run Pulsar functions with parallelism 1 and have a failover?
I am not sure if I understand your question clearly, but I will try to answer from two points: 1) you are able to run a parallelism of 1. the pulsar functions runtime will handle failover for you if the machine running the function crashes. 2) you can also schedule a parallelism of 2 and use a failover subscription. that means you have 2 instances, one is actually invoking the function, while the other one is standing by. if the first one crashes, the standby one will take over the consumption. in any case, pulsar supports exactly-once on idempotent functions. hope this answers your question. ---- 2019-02-19 12:51:26 UTC - lingchen: @lingchen has joined the channel ---- 2019-02-19 13:21:59 UTC - Christophe Bornet: Do you think it would be possible to provide region-aware read-only brokers so that consumers from a region can read from the same region? Something like this KIP for Kafka: <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica> ---- 2019-02-19 13:35:56 UTC - Laurent Chriqui: Hi, when I try to compile the c++ client on mac os 10.14.3, I get the following error when running the final make command. ---- 2019-02-19 13:59:32 UTC - Marc Le Labourier: Same ---- 2019-02-19 14:55:04 UTC - Sébastien de Melo: Me too ---- 2019-02-19 15:02:59 UTC - Jacob O'Farrell: Sorry for the onslaught of questions - I'm currently trying to query my data via PulsarSQL, however I'm running into difficulty with the Schema(s). I've managed to upload the schema successfully (I will do my best to contribute some docs that help people on their way here), however when I run the presto query, I am getting `Topic {TOPIC REMOVED BY ME} does not have a valid schema`. Whilst I understand that this is most likely due to user error, is there any way I can get more insight into this error/where it has gone wrong? Any help in guiding me in the right direction would be appreciated ---- 2019-02-19 15:39:28 UTC - Matteo Merli: Did you install boost-python? ---- 2019-02-19 15:40:25 UTC - Matteo Merli: Actually, the easiest way to get the C++ lib on Mac is to just do `brew install libpulsar` ---- 2019-02-19 15:40:38 UTC - Matteo Merli: No need to compile it ---- 2019-02-19 15:44:51 UTC - Laurent Chriqui: I installed boost-python3 ---- 2019-02-19 15:45:14 UTC - Laurent Chriqui: @Matteo Merli I wanted to compile the unreleased version 2.3.0 ---- 2019-02-19 16:05:22 UTC - Jacob O'Farrell: If anyone has any example data with matching schemas that would be super helpful! It should greatly help me wrap my head around the formatting it expects. My current understanding is that Pulsar is expecting the schema to be found within the double quotes on L3 of this example file <https://github.com/apache/pulsar/blob/master/conf/schema_example.conf#L3> ---- 2019-02-19 16:19:54 UTC - Jacob O'Farrell: FWIW I think I've greatly confused myself by trying to deserialise the `schema` from the admin cli for the generator_test topic that you create in one of the guides ---- 2019-02-19 16:27:30 UTC - David Kjerrumgaard: @Christophe Bornet The typical deployment configuration for geo-replication within Pulsar relies upon setting up two separate BookKeeper clusters (one in each region). Because of this, the Pulsar brokers will always read from the Pulsar broker in the same region. Is that what you are looking to achieve? ---- 2019-02-19 16:35:08 UTC - Christophe Bornet: Not really.
The multi-cluster geo-replication has some drawbacks: it is async, so there's no delivery guarantee, and subscriptions are local to the clusters. What I am looking for is mainly "global" subscriptions, which are natural with a single cluster spread over multiple regions but have the drawback of high utilization of the expensive inter-region network. So to limit this use of the network, I was wondering if it could be possible to read from "read-only" brokers in the same region as the client. ---- 2019-02-19 16:48:28 UTC - Christophe Bornet: We have something working which uses namespace isolation and a region-aware placement policy on a single spread cluster to ensure local consumption by clients (clients of DC1 read/write on brokers of DC1, brokers of DC1 read in priority from bookies of DC1). Consumers of DC1 only get messages from producers of DC1 (it's kind of active/passive). But for other use-cases we'd like to have active/active, where consumers from DC1 get messages produced by DC2 ---- 2019-02-19 16:48:41 UTC - Dmitry Sh: @Dmitry Sh has joined the channel ---- 2019-02-19 16:49:19 UTC - Christophe Bornet: Do you think the replication can be configured from a namespace of a cluster to another namespace of the same cluster? I guess this would solve the issue. ---- 2019-02-19 16:56:42 UTC - David Kjerrumgaard: Thanks for the explanation of what you want to achieve. It seems like you have made some progress on your own in this area. As for the namespace replication within the same cluster, that sounds plausible, but it would require some additional coding. Does it have to be at the namespace level or can it be per-topic to start? I think it would be safe to start small and build up. ---- 2019-02-19 17:11:39 UTC - Christophe Bornet: Well, we have indeed worked a lot on this subject. What we are trying to achieve is an active/active cluster with the possibility of failing over the consumers to the other region. Currently that's not possible because subscriptions are not replicated. So we are looking for other solutions. Note that Kafka has the same problem, so there is a clear competitive advantage in solving this issue that many are facing. What would be your recommendation? Are you planning to solve this issue (e.g. with replication of subscriptions in geo-replication)? ---- 2019-02-19 17:15:18 UTC - Matteo Merli: Oh, got it. Do you also have python 3 installed on macOS? ---- 2019-02-19 17:15:49 UTC - Matteo Merli: Seems like it might be picking up py2 and boost-python3 ---- 2019-02-19 17:25:30 UTC - David Kjerrumgaard: The ability to have an active/active deployment for Pulsar is definitely something we are looking to achieve, as it would give us a clear advantage over Kafka. We would also be happy to accept any contributions in this area, including a PIP detailing the design and features you are looking for. ---- 2019-02-19 17:43:38 UTC - Matteo Merli: Uhm, ok, I'm getting the same error now. Could be related to the fact that I just updated the Xcode version and might have pulled in a new version of clang ---- 2019-02-19 17:44:03 UTC - Matteo Merli: Seems python is not being passed in the linking phase ---- 2019-02-19 17:45:45 UTC - Matteo Merli: Which, I believe, was not required before, since the _pulsar.so is a plugin to be loaded from within python itself.
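To make Sijie's earlier point about running two instances with a failover subscription more concrete, here is a minimal sketch using the Pulsar Java client. The service URL, topic, and subscription name are placeholders, not anything taken from this conversation; the subscription behaviour is the same "one active, one standby" pattern Sijie describes for a function scheduled with a parallelism of 2 on a failover subscription.
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class FailoverConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL -- adjust to your own deployment.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // With a Failover subscription only one consumer is active at a time;
        // a second copy of this program stays on standby and takes over, in
        // order, if the active one crashes or disconnects.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")   // placeholder topic
                .subscriptionName("failover-sub")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // If processing is idempotent, redeliveries after a failover still
            // give effectively exactly-once results, as noted above.
            consumer.acknowledge(msg);
        }
    }
}
```
Running two copies of this process gives the active/standby behaviour described above; ordering is preserved because only the active consumer receives messages.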
---- 2019-02-19 17:45:57 UTC - Matteo Merli: Anyway, working on a fix ---- 2019-02-19 17:47:11 UTC - Matteo Merli: In the meantime, if you don't care about the python wrapper, you can do `cmake -DBUILD_PYTHON_WRAPPER=OFF` ---- 2019-02-19 17:58:20 UTC - Matteo Merli: Created <https://github.com/apache/pulsar/pull/3626> ---- 2019-02-19 18:13:12 UTC - Joe Francis: An Active-Active cluster + global subscription is hard to achieve, because there are no ordering guarantees on writes across clusters. To replicate subscriptions there needs to be a stream cursor that will apply on both clusters, which means such a sub has to track the write streams from each cluster individually. ---- 2019-02-19 18:22:36 UTC - Laurent Chriqui: Thank you! Yes, I have both python3 and python2. ---- 2019-02-19 18:25:42 UTC - David Kjerrumgaard: That would require an approach similar to what is outlined in this paper: <http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf> ---- 2019-02-19 18:26:15 UTC - David Kjerrumgaard: doable, but definitely a lot of effort. ---- 2019-02-19 18:34:15 UTC - Marc Le Labourier: `cmake -DBUILD_PYTHON_WRAPPER=OFF .` seems to be working on my side. Thanks for the temporary fix and the PR. ---- 2019-02-19 19:38:13 UTC - Facundo Rodriguez: Hello everyone! I'm using Pulsar as pub/sub but I'm getting 500s and I don't know what's going on. This is the trace ---- 2019-02-19 22:55:47 UTC - Yadi Yang: @Yadi Yang has joined the channel ---- 2019-02-19 23:10:55 UTC - Yadi Yang: I got this error when I tried to install the c++ library on `debian:stretch` in a docker image, following the readme in the pulsar c++ repo. ```$ cd /usr/src/gmock/ $ cmake . CMake Error at /usr/src/googletest/CMakeLists.txt:13 (add_subdirectory): add_subdirectory not given a binary directory but the given source directory "/usr/src/gmock" is not a subdirectory of "/usr/src/googletest". When specifying an out-of-tree source a binary directory must be explicitly specified. CMake Error at CMakeLists.txt:56 (config_compiler_and_linker): Unknown CMake command "config_compiler_and_linker". -- Configuring incomplete, errors occurred! See also "/usr/src/gmock/CMakeFiles/CMakeOutput.log".``` Anyone have any thoughts? Thanks in advance :sweat_smile: ---- 2019-02-19 23:11:16 UTC - Yadi Yang: I also tried this but no luck: <https://github.com/apache/pulsar/pull/1957> ---- 2019-02-19 23:40:40 UTC - Matteo Merli: @Yadi Yang You can also try to use the Deb packages already pre-built at <https://pulsar.apache.org/docs/en/client-libraries-cpp/#deb> Alternatively, you can build it by skipping the test code with `cmake -DBUILD_TESTS=OFF .` ---- 2019-02-20 01:24:55 UTC - Sijie Guo: Can you explain when you encountered this issue? ---- 2019-02-20 03:03:52 UTC - Vincent Ngan: I think I have asked something similar before, but I really want to make sure I understand it correctly: if I set both a namespace's retention size and time to -1, is it true that all the messages in that namespace will never be removed from the system regardless of whether they have been acknowledged or not? The reason I ask is that I want to use Pulsar as a persistence mechanism for an in-memory database solution. I need to fully understand the persistence and durability behaviours of Pulsar messages.
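For reference, the "-1 time and -1 size" retention setting Vincent is asking about looks roughly like this through the Java admin client. This is only a sketch: the admin URL and namespace are placeholders, and the same policy can also be set with the `pulsar-admin namespaces set-retention` command.
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class InfiniteRetention {
    public static void main(String[] args) throws Exception {
        // Placeholder admin URL and namespace -- adjust to your own cluster.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // RetentionPolicies(retentionTimeInMinutes, retentionSizeInMB):
        // -1 for both means messages in the namespace are retained
        // indefinitely, even after every subscription has acknowledged them.
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(-1, -1));

        admin.close();
    }
}
```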
---- 2019-02-20 03:04:14 UTC - Matteo Merli: That is correct ---- 2019-02-20 03:05:10 UTC - Jianfeng Qiao: @Jianfeng Qiao has joined the channel ---- 2019-02-20 03:07:13 UTC - Vincent Ngan: So, supposing I have sent a lot of messages to a topic and they have all been acknowledged by a consumer with a subscription. Then, a year later, I can create another consumer with a different subscription to read all the messages back? ---- 2019-02-20 03:08:10 UTC - Matteo Merli: Yes, provided you have enough disk space :slightly_smiling_face: or that you use tiered storage to offload to cloud storage ---- 2019-02-20 03:08:56 UTC - Vincent Ngan: Yes, I will make sure I have enough disk space. ---- 2019-02-20 03:09:53 UTC - Matteo Merli: Also, you can keep adding BookKeeper storage nodes dynamically, without any rebalancing of data ---- 2019-02-20 03:10:31 UTC - Vincent Ngan: Great! Thanks for your answer. ---- 2019-02-20 09:01:09 UTC - bossbaby: Hello all, why does the bookie still report the following even though I have deleted all topics? ``` 09:00:22.191 [LedgerDirsMonitorThread] WARN org.apache.bookkeeper.util.DiskChecker - Space left on device data/bookkeeper/ledgers/current : 1853960192, Used space fraction: 0.9282315 < WarnThreshold 0.95. ``` ----
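Going back to Jacob's Pulsar SQL question from earlier: one common way for a topic to end up with a schema that Presto can read is simply to produce with a typed schema, which registers it on the topic automatically. Below is a small sketch with the Java client; the POJO, topic, and broker URL are made up for illustration and are not taken from Jacob's setup.
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class SchemaProducer {
    // Hypothetical record type; Pulsar derives an Avro schema from the POJO fields.
    public static class Reading {
        public String sensorId;
        public double value;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Creating a producer with Schema.AVRO registers the schema on the
        // topic, so Pulsar SQL can map the record fields to columns instead
        // of reporting "does not have a valid schema".
        Producer<Reading> producer = client.newProducer(Schema.AVRO(Reading.class))
                .topic("persistent://public/default/readings")   // placeholder topic
                .create();

        Reading r = new Reading();
        r.sensorId = "sensor-1";
        r.value = 23.5;
        r.timestamp = System.currentTimeMillis();
        producer.send(r);

        producer.close();
        client.close();
    }
}
```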
