2020-04-15 09:29:16 UTC - Frederic Tausch: @Frederic Tausch has joined the channel ---- 2020-04-15 09:32:49 UTC - Maxi: @Maxi has joined the channel ---- 2020-04-15 10:29:12 UTC - Hiroyuki Yamada: @Sijie Guo I checked the ICDE paper briefly and got the answer to the 2nd question. I’m still not sure about the 1st one. (sorry for pushing)
• Any benefit for having E > Qw ?
• Is E=2, Qw=2, Qa=2 a good configuration for production ?
• (related to the above) The prod-ready helm chart is using E=3, Qw=3, Qa=2 with 3 brokers, so it can’t tolerate even 1 node failure, but is it OK? ---- 2020-04-15 10:56:16 UTC - Tom Weinberg: @Tom Weinberg has joined the channel ---- 2020-04-15 12:13:38 UTC - Ebere Abanonu: @Sijie Guo I have a Schema with an Array field, but looking at Presto's stats all other fields are listed except the array field. Are arrays supported in the Pulsar connector? Additionally, the topic is created, but when using pulsar-admin to list topics it does not get listed ---- 2020-04-15 12:50:56 UTC - Penghui Li: Array fields and map fields are not supported yet. If you have not created any subscription on the topic, the topic will be automatically deleted once there are no active producers. You can set a retention policy for the namespace or disable the inactive topic auto-deletion. ---- 2020-04-15 13:35:15 UTC - Aravindhan: <#C5Z4T36F7|general> I am using a Kubernetes deployment and running the source connector in the broker. In order to run the source connector, I am copying the required files into the container every time and running the source connector create command. Is this the recommended approach, or can the Pulsar image be rebuilt to include the required files? ---- 2020-04-15 15:33:27 UTC - Rahul: Does Pulsar support refreshing the auth token without impacting producing/consuming? As per my understanding, Pulsar only validates the token during lookup. Once the required broker is found, it doesn't do any more validation after that. Please correct me if I'm wrong ---- 2020-04-15 15:45:22 UTC - Matteo Merli: Yes, it's added in master (for 2.6) ---- 2020-04-15 15:45:57 UTC - Matteo Merli: If the tokens have a TTL set, the broker will force the client to re-validate the connection once the TTL expires ---- 2020-04-15 15:47:08 UTC - Raman Gupta: That's -^ my use case as well, and I suspect the use case of most people looking for this capability. +1 : Xavier Levaux ---- 2020-04-15 15:59:41 UTC - Rahul: @Matteo Merli Is the TTL method available in 2.5.0? ---- 2020-04-15 16:00:36 UTC - Matteo Merli: No, in 2.5.0 the token is only validated at the beginning of the session. Once the client is connected, it will stay connected ---- 2020-04-15 16:01:03 UTC - Rahul: @Matteo Merli Thanks for the info ---- 2020-04-15 16:57:50 UTC - JG: Is it possible to do reactive programming with Pulsar? With frameworks such as Vert.x and RxJava? ---- 2020-04-15 17:22:06 UTC - Evan Furman: Just checking back in here. Let us know when we’re ready to test the new PR. Thanks! ---- 2020-04-15 18:16:03 UTC - David Kjerrumgaard: Pulsar Functions process events immediately upon receiving them from a Pulsar topic. Would you classify that as reactive? ---- 2020-04-15 20:51:27 UTC - rwaweber: Hey all! Deployment-centric question: is there any reason there is mention of _small_ disks here wrt BookKeeper? Are larger disks discouraged?
<https://pulsar.apache.org/docs/en/deploy-bare-metal/#bookies-and-brokers> ---- 2020-04-15 20:51:53 UTC - rwaweber: Also, in that same document, it appears that the bookies are running on the same host as the Pulsar broker. Is that a common pattern? I feel like, over time, it would become difficult to scale out a cluster configured like this, right? ---- 2020-04-15 20:59:04 UTC - David Kjerrumgaard: @rwaweber Brokers and bookies were designed to be decoupled and should run on different machines if you plan on scaling out the cluster. For smaller "dev" type clusters they are typically co-located, primarily because it is hard to requisition 6+ machines for a small POC type project. HTH. +1 : rwaweber thanks : rwaweber ---- 2020-04-15 21:01:24 UTC - rwaweber: Thanks David! That certainly helped! ---- 2020-04-15 21:02:17 UTC - David Kjerrumgaard: @rwaweber I believe you are referring to the phrase "Small and fast <https://en.wikipedia.org/wiki/Solid-state_drive|solid-state drives> (SSDs) or <https://en.wikipedia.org/wiki/Hard_disk_drive|hard disk drives> (HDDs) with a <https://en.wikipedia.org/wiki/RAID|RAID> controller and a battery-backed write cache (for BookKeeper bookies)". I believe that this statement merely acknowledges the fact that SSDs are inherently smaller than traditional HDDs. The larger point is that the disks you choose should be resilient to failures at the hardware level. +1 : rwaweber ---- 2020-04-15 21:03:44 UTC - David Kjerrumgaard: That being said, to answer your initial question: larger disks are not discouraged, but slow ones are. cool : rwaweber ---- 2020-04-15 21:05:21 UTC - rwaweber: Ahhhh gotcha, so the distinction isn’t made between SSDs and spinning drives with RAID; both ought to be resilient to failures at the hardware level, e.g. SSDs _with_ RAID or spinning drives _with_ RAID. Makes sense! +1 : David Kjerrumgaard ---- 2020-04-15 22:54:25 UTC - JG: Hey guys, anyone got an idea how to do event streaming/sourcing with Pulsar? What is the topic strategy and how to replay events? ---- 2020-04-15 23:01:36 UTC - David Kjerrumgaard: @JG Could you elaborate a bit on the use case or what problem you are trying to solve with this approach? ---- 2020-04-16 00:15:14 UTC - JG: Before, I had PostgreSQL as event storage and MongoDB as the query database (a CQRS system); I could replay all events to "rebuild" the query databases ---- 2020-04-16 00:15:58 UTC - JG: Could I remove my PostgreSQL event storage and use Pulsar's BookKeeper as event storage? ---- 2020-04-16 00:23:47 UTC - JG: I suppose I should use the Pulsar I/O MongoDB connector for the sink? ---- 2020-04-16 00:39:56 UTC - Sijie Guo: Are you going to do point lookups in the event storage? ---- 2020-04-16 01:30:59 UTC - Tolulope Awode: Hi @Sijie Guo ---- 2020-04-16 01:31:08 UTC - Tolulope Awode: Thanks for the other time ---- 2020-04-16 01:44:46 UTC - Penghui Li: <https://github.com/apache/pulsar/pull/6647> is merged and 2.5.1 includes this PR. If you want to test it earlier, you can build the master branch, or you can wait for the 2.5.1 release (the 2.5.1 RC is also out; you can download the RC version to verify). Note: if you test with non-batched messages, it's better to increase the `dispatcherMaxReadBatchSize` in broker.conf +1 : Hiroyuki Yamada ---- 2020-04-16 01:53:03 UTC - Jared Mackey: Using a non-persistent topic, how do I ack messages? The message IDs are the default values of 0.
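(On Hiroyuki's E/Qw/Qa question above: E is the number of bookies a ledger's entries are spread across, Qw is how many copies of each entry are written, and Qa is how many acks are needed before a write is confirmed. E > Qw buys write striping across more bookies for throughput, not extra durability; and with E=3, Qw=3 on exactly 3 bookies, one bookie failure stalls new writes because a full ensemble can no longer be formed, even though entries already acknowledged at Qa=2 remain safe. A minimal sketch of setting these per namespace through the Java admin API; the service URL and values are placeholders, not a recommendation:)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetNamespacePersistence {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();
        // E=3, Qw=3, Qa=2: each entry is written to 3 bookies and the write is
        // acknowledged once 2 of them respond. The last argument (0.0) disables
        // mark-delete rate limiting.
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(3, 3, 2, 0.0));
        admin.close();
    }
}
```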
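(On the token-refresh thread above: even before the broker-side TTL re-validation that Matteo mentions for 2.6, the Java client can be given a token `Supplier`, so every (re)authentication picks up a freshly issued token. A sketch, assuming the token is rotated on disk at a placeholder path:)

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.PulsarClient;

public class TokenSupplierClient {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                // The supplier is re-invoked whenever the client authenticates,
                // so a token rotated on disk is picked up without a restart.
                .authentication(AuthenticationFactory.token(() -> {
                    try {
                        return new String(Files.readAllBytes(
                                Paths.get("/etc/pulsar/token.jwt"))).trim(); // placeholder path
                    } catch (java.io.IOException e) {
                        throw new RuntimeException(e);
                    }
                }))
                .build();
        // ... build producers/consumers as usual ...
        client.close();
    }
}
```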
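(On JG's replay question: one way to treat a Pulsar topic as an event store is to keep acknowledged data with a namespace retention policy (e.g. `admin.namespaces().setRetention(ns, new RetentionPolicies(-1, -1))` for infinite retention, as Penghui suggests above) and rebuild the query side with a `Reader` starting from `MessageId.earliest`, since a Reader tracks no subscription state and can replay the stream any number of times. A minimal sketch; the topic name is hypothetical:)

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class RebuildQuerySide {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        // Start at the very first retained event and read forward.
        Reader<byte[]> reader = client.newReader(Schema.BYTES)
                .topic("persistent://public/default/account-events") // hypothetical topic
                .startMessageId(MessageId.earliest)
                .create();
        while (reader.hasMessageAvailable()) {
            Message<byte[]> event = reader.readNext();
            // Re-apply the event to MongoDB (or any projection) here.
            System.out.println(event.getMessageId() + " -> " + new String(event.getData()));
        }
        reader.close();
        client.close();
    }
}
```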
---- 2020-04-16 01:54:50 UTC - Jared Mackey: For context: I have been writing my own client library and referencing the Go client lib. I don’t see them handling non-persistent messages differently. ---- 2020-04-16 02:01:46 UTC - Jared Mackey: Since I’m not able to ack, after I have hit my flow permit count I stop receiving new messages. ---- 2020-04-16 02:53:50 UTC - Jared Mackey: Ok, so it looks like the Go client is dynamically calling flow based on some work-queue depth. Our client has only been calling it at startup and was expecting Pulsar to provide the back pressure when our workers get tied up. Looks like this style isn’t compatible with non-persistent topics, since the acks do not work. ---- 2020-04-16 04:06:42 UTC - Jared Mackey: I see the error of my ways here; looks like I need to continuously send flow permits in either case. ---- 2020-04-16 05:49:16 UTC - Adelina Brask: @Adelina Brask has joined the channel ---- 2020-04-16 06:21:29 UTC - Adelina Brask: Hi guys, I have an issue I can't seem to fix, and I need all the help possible. I can't seem to deploy both a source (Netty or generator) and a sink (Elastic) at the same time in cluster mode. I have 3 brokers + bookies and a 3-node ZooKeeper cluster. What I have tried is:
1. running both Netty/Generator and Elastic in localrun - Success
2. running only the sink/source in cluster mode and the other in localrun - Success
3. running both in cluster mode - whichever is started first works fine, but the second fails. Elastic fails with an Unavailable I/O Exception, and Netty is running but fails to bind port 10999
The broker, bookie & function logs are clean.... ---- 2020-04-16 06:37:02 UTC - Adelina Brask: Another issue I see is when I query a table in Presto: `"Query 20200416_063428_00001_7znzp failed: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss"`. I am not sure if the issues are related, but I can't seem to find the reason in the logs.
In broker logs I find:
```
06:28:29.052 [function-timer-thread-58-1] ERROR org.apache.pulsar.functions.runtime.process.ProcessRuntime - Extracted Process death exception
java.lang.RuntimeException:
	at org.apache.pulsar.functions.runtime.process.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:383) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at org.apache.pulsar.functions.runtime.process.ProcessRuntime.isAlive(ProcessRuntime.java:370) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
06:28:29.053 [function-timer-thread-58-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/elastic-java.lang.RuntimeException: Function Container is dead with exception.. restarting
```
---- 2020-04-16 08:52:59 UTC - Sijie Guo: If you are using Presto SQL, you need to make sure the Presto worker is able to connect to ZooKeeper and BookKeeper. "ConnectionLossException" usually means that the worker is not able to connect to ZooKeeper. --- How did you deploy the cluster? ---- 2020-04-16 08:54:43 UTC - Sijie Guo: Is the output topic of the source connector the input topic for the sink connector? ---- 2020-04-16 08:55:21 UTC - Sijie Guo: An Unavailable I/O exception usually indicates that the connector failed to start. ---- 2020-04-16 08:55:46 UTC - Sijie Guo: Did you check the log file under `logs/<tenant>/<namespace>/<function-name>.log`? ---- 2020-04-16 08:56:43 UTC - Adelina Brask: Hi Sijie. Yes, it is. Both start well in local mode (localrun), but one of them fails in cluster mode (whichever I start second). Yes, I checked the logs, but the file is empty, as the connector fails before running (and before making a log file) ---- 2020-04-16 08:56:43 UTC - Sijie Guo: If you can provide the broker log and function logs, that would be super helpful. ---- 2020-04-16 08:57:59 UTC - Adelina Brask: That makes so much sense... I am working now to start Presto in cluster mode and connect to ZooKeeper. :slightly_smiling_face: Thanks for the tip. ok_hand : Sijie Guo ---- 2020-04-16 08:58:50 UTC - Sijie Guo: Do you happen to have the commands that you used for running those connectors in the 3 steps you described? ---- 2020-04-16 08:59:06 UTC - Sijie Guo: I am guessing it is a problem related to a mismatched schema. ----
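(One way to dig into a connector that dies before writing its own log file, as Adelina describes, is the sink status endpoint, which surfaces per-instance running state, restart counts, and the most recent error. A sketch against the 2.5 Java admin API, using the `public/default/elastic` sink named in the log above; the admin URL is a placeholder. The CLI equivalent is `pulsar-admin sinks status`:)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.SinkStatus;

public class CheckSinkStatus {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();
        // Fetch per-instance status for the failing sink from the log above.
        SinkStatus status = admin.sinks().getSinkStatus("public", "default", "elastic");
        status.getInstances().forEach(instance ->
                System.out.println("instance " + instance.getInstanceId()
                        + " running=" + instance.getStatus().isRunning()
                        + " restarts=" + instance.getStatus().getNumRestarts()
                        + " error=" + instance.getStatus().getError()));
        admin.close();
    }
}
```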
