2020-04-15 09:29:16 UTC - Frederic Tausch: @Frederic Tausch has joined the channel ---- 2020-04-15 09:32:49 UTC - Maxi: @Maxi has joined the channel ---- 2020-04-15 10:29:12 UTC - Hiroyuki Yamada: @Sijie Guo I checked the ICDE paper briefly and got the answer to the 2nd question. I’m still not sure about the 1st one. (sorry for pushing)
• Any benefit for having E > Qw ?
• Is E=2, Qw=2, Qa=2 a good configuration for production ?
• (related to the above) The prod-ready helm chart is using E=3, Qw=3, Qa=2 with 3 brokers, so it can’t tolerate even 1 node failure, but is it OK? ---- 2020-04-15 10:56:16 UTC - Tom Weinberg: @Tom Weinberg has joined the channel ---- 2020-04-15 12:13:38 UTC - Ebere Abanonu: @Sijie Guo I have a Schema with an Array field, but looking at Presto's stats all other fields are listed except the array field. Are arrays supported in the Pulsar connector? Additionally, the topic is created, but when using pulsar-admin to list topics it does not get listed ---- 2020-04-15 12:50:56 UTC - Penghui Li: Array fields and map fields are not supported yet. If you have not created any subscription on the topic, the topic will be automatically deleted once there are no active producers. You can set a retention policy for the namespace or disable the inactive topic auto-deletion. ---- 2020-04-15 13:35:15 UTC - Aravindhan: <#C5Z4T36F7|general> I am using a Kubernetes deployment and running the source connector in the broker. In order to run the source connector, I am copying the required files into the container every time and running the source connector create command. Is this the recommended approach, or can the Pulsar image be rebuilt to include the required files? ---- 2020-04-15 15:33:27 UTC - Rahul: Does Pulsar support refreshing the auth token without impacting producing/consuming? As per my understanding, Pulsar only validates the token during lookup. Once the required broker is found, it doesn't do any more validation after that. Please correct me if I'm wrong ---- 2020-04-15 15:45:22 UTC - Matteo Merli: Yes, it's added in master (for 2.6) ---- 2020-04-15 15:45:57 UTC - Matteo Merli: If the tokens have a TTL set, the broker will force the client to re-validate the connection once the TTL expires ---- 2020-04-15 15:47:08 UTC - Raman Gupta: That's -^ my use case as well, and I suspect the use case of most people looking for this capability. +1 : Xavier Levaux ---- 2020-04-15 15:59:41 UTC - Rahul: @Matteo Merli Is the TTL method available in 2.5.0? ---- 2020-04-15 16:00:36 UTC - Matteo Merli: No, in 2.5.0 the token is only validated at the beginning of the session. Once the client is connected, it will stay connected ---- 2020-04-15 16:01:03 UTC - Rahul: @Matteo Merli Thanks for the info ---- 2020-04-15 16:57:50 UTC - JG: Is it possible to do reactive programming with Pulsar? With frameworks such as Vert.x and RxJava? ---- 2020-04-15 17:22:06 UTC - Evan Furman: Just checking back in here. Let us know when we’re ready to test the new PR. Thanks! ---- 2020-04-15 18:16:03 UTC - David Kjerrumgaard: Pulsar Functions process events immediately upon receiving them from a Pulsar topic. Would you classify that as reactive? ---- 2020-04-15 20:51:27 UTC - rwaweber: Hey all! Deployment-centric question: is there any reason there is mention of _small_ disks here wrt BookKeeper? Are larger disks discouraged?
<https://pulsar.apache.org/docs/en/deploy-bare-metal/#bookies-and-brokers> ---- 2020-04-15 20:51:53 UTC - rwaweber: Also, in that same document, it appears that the bookies are running on the same host as the Pulsar broker. Is that a common pattern? I feel like, over time, it would become difficult to scale out a cluster configured like this, right? ---- 2020-04-15 20:59:04 UTC - David Kjerrumgaard: @rwaweber Brokers and bookies were designed to be decoupled and should run on different machines if you plan on scaling out the cluster. For smaller "dev" type clusters they are typically co-located, primarily because it is hard to requisition 6+ machines for a small POC type project. HTH. +1 : rwaweber thanks : rwaweber ---- 2020-04-15 21:01:24 UTC - rwaweber: Thanks David! That certainly helped! ---- 2020-04-15 21:02:17 UTC - David Kjerrumgaard: @rwaweber I believe you are referring to the phrase "Small and fast <https://en.wikipedia.org/wiki/Solid-state_drive|solid-state drives> (SSDs) or <https://en.wikipedia.org/wiki/Hard_disk_drive|hard disk drives> (HDDs) with a <https://en.wikipedia.org/wiki/RAID|RAID> controller and a battery-backed write cache (for BookKeeper bookies)". I believe that this statement merely acknowledges the fact that SSDs are inherently smaller than traditional HDDs. The larger point is that the disks you choose should be resilient to failures at the hardware level. +1 : rwaweber ---- 2020-04-15 21:03:44 UTC - David Kjerrumgaard: That being said, to answer your initial question: larger disks are not discouraged, but slow ones are. cool : rwaweber ---- 2020-04-15 21:05:21 UTC - rwaweber: Ahhhh gotcha, so the distinction isn’t made between SSDs and spinning drives with RAID; both ought to be resilient to failures at the hardware level, e.g. SSDs _with_ RAID or spinning drives _with_ RAID. Makes sense! +1 : David Kjerrumgaard ---- 2020-04-15 22:54:25 UTC - JG: Hey guys, anyone got an idea how to do event streaming/sourcing with Pulsar? What is the topic strategy and how to replay events? ---- 2020-04-15 23:01:36 UTC - David Kjerrumgaard: @JG Could you elaborate a bit on the use case or what problem you are trying to solve with this approach? ---- 2020-04-16 00:15:14 UTC - JG: Before, I had PostgreSQL as event storage and MongoDB as the query database (a CQRS system); I could replay all events to "rebuild" the query databases ---- 2020-04-16 00:15:58 UTC - JG: Could I remove my PostgreSQL event storage and use Pulsar's BookKeeper as event storage? ---- 2020-04-16 00:23:47 UTC - JG: I suppose I should use the Pulsar I/O MongoDB connector for the sink? ---- 2020-04-16 00:39:56 UTC - Sijie Guo: Are you going to do point lookups in the event storage? ---- 2020-04-16 01:30:59 UTC - Tolulope Awode: Hi @Sijie Guo ---- 2020-04-16 01:31:08 UTC - Tolulope Awode: Thanks for the other time ---- 2020-04-16 01:44:46 UTC - Penghui Li: <https://github.com/apache/pulsar/pull/6647> is merged and 2.5.1 includes this PR. If you want to test it earlier, you can build the master branch, or you can wait for the 2.5.1 release (the 2.5.1 RC is also out; you can download the RC version to verify). Note: if you test with non-batched messages, it's better to increase the `dispatcherMaxReadBatchSize` in broker.conf +1 : Hiroyuki Yamada ---- 2020-04-16 01:53:03 UTC - Jared Mackey: Using a non-persistent topic, how do I ack messages? The message IDs are the default values of 0.
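(On Hiroyuki's E/Qw/Qa question above: E is the number of bookies a ledger's entries are spread across, Qw is how many copies of each entry are written, and Qa is how many acks are needed before a write is confirmed. E > Qw buys write striping across more bookies for throughput, not extra durability; and with E=3, Qw=3 on exactly 3 bookies, one bookie failure stalls new writes because a full ensemble can no longer be formed, even though entries already acknowledged at Qa=2 remain safe. A minimal sketch of setting these per namespace through the Java admin API; the service URL and values are placeholders, not a recommendation:)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetNamespacePersistence {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();
        // E=3, Qw=3, Qa=2: each entry is written to 3 bookies and the write is
        // acknowledged once 2 of them respond. The last argument (0.0) disables
        // mark-delete rate limiting.
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(3, 3, 2, 0.0));
        admin.close();
    }
}
```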
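(On the token-refresh thread above: even before the broker-side TTL re-validation that Matteo mentions for 2.6, the Java client can be given a token `Supplier`, so every (re)authentication picks up a freshly issued token. A sketch, assuming the token is rotated on disk at a placeholder path:)

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.PulsarClient;

public class TokenSupplierClient {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                // The supplier is re-invoked whenever the client authenticates,
                // so a token rotated on disk is picked up without a restart.
                .authentication(AuthenticationFactory.token(() -> {
                    try {
                        return new String(Files.readAllBytes(
                                Paths.get("/etc/pulsar/token.jwt"))).trim(); // placeholder path
                    } catch (java.io.IOException e) {
                        throw new RuntimeException(e);
                    }
                }))
                .build();
        // ... build producers/consumers as usual ...
        client.close();
    }
}
```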
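(On JG's replay question: one way to treat a Pulsar topic as an event store is to keep acknowledged data with a namespace retention policy (e.g. `admin.namespaces().setRetention(ns, new RetentionPolicies(-1, -1))` for infinite retention, as Penghui suggests above) and rebuild the query side with a `Reader` starting from `MessageId.earliest`, since a Reader tracks no subscription state and can replay the stream any number of times. A minimal sketch; the topic name is hypothetical:)

```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class RebuildQuerySide {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        // Start at the very first retained event and read forward.
        Reader<byte[]> reader = client.newReader(Schema.BYTES)
                .topic("persistent://public/default/account-events") // hypothetical topic
                .startMessageId(MessageId.earliest)
                .create();
        while (reader.hasMessageAvailable()) {
            Message<byte[]> event = reader.readNext();
            // Re-apply the event to MongoDB (or any projection) here.
            System.out.println(event.getMessageId() + " -> " + new String(event.getData()));
        }
        reader.close();
        client.close();
    }
}
```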
---- 2020-04-16 01:54:50 UTC - Jared Mackey: For context: I have been writing my own client library and referencing the Go client lib. I don’t see them handling non-persistent messages differently. ---- 2020-04-16 02:01:46 UTC - Jared Mackey: Since I’m not able to ack, after I have hit my flow permit count I stop receiving new messages. ---- 2020-04-16 02:53:50 UTC - Jared Mackey: Ok, so it looks like the Go client is dynamically calling flow based on some work-queue depth. Our client has only been calling it at startup and was expecting Pulsar to provide the back pressure when our workers get tied up. Looks like this style isn’t compatible with non-persistent topics, since the acks do not work. ---- 2020-04-16 04:06:42 UTC - Jared Mackey: I see the error of my ways here; looks like I need to continuously send flow permits in either case. ---- 2020-04-16 05:49:16 UTC - Adelina Brask: @Adelina Brask has joined the channel ---- 2020-04-16 06:21:29 UTC - Adelina Brask: Hi guys, I have an issue I can't seem to fix, and I need all the help possible. I can't seem to deploy both a source (Netty or generator) and a sink (Elastic) at the same time in cluster mode. I have 3 brokers + bookies and a 3-node ZooKeeper cluster. What I have tried is:
1. running both Netty/Generator and Elastic in localrun - Success
2. running only the sink/source in cluster mode and the other in localrun - Success
3. running both in cluster mode - whichever is started first works fine, but the second fails. Elastic fails with an Unavailable I/O Exception, and Netty is running but fails to bind port 10999
The broker, bookie & function logs are clean.... ---- 2020-04-16 06:37:02 UTC - Adelina Brask: Another issue I see is when I query a table in Presto: `"Query 20200416_063428_00001_7znzp failed: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss"`. I am not sure if the issues are related, but I can't seem to find the reason in the logs.
In broker logs I find:
```
06:28:29.052 [function-timer-thread-58-1] ERROR org.apache.pulsar.functions.runtime.process.ProcessRuntime - Extracted Process death exception
java.lang.RuntimeException:
	at org.apache.pulsar.functions.runtime.process.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:383) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at org.apache.pulsar.functions.runtime.process.ProcessRuntime.isAlive(ProcessRuntime.java:370) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at org.apache.pulsar.functions.runtime.RuntimeSpawner.lambda$start$0(RuntimeSpawner.java:88) ~[org.apache.pulsar-pulsar-functions-runtime-2.5.0.jar:2.5.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
06:28:29.053 [function-timer-thread-58-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - public/default/elastic-java.lang.RuntimeException: Function Container is dead with exception.. restarting
```
---- 2020-04-16 08:52:59 UTC - Sijie Guo: If you are using Presto SQL, you need to make sure the Presto worker is able to connect to ZooKeeper and BookKeeper. "ConnectionLossException" usually means that the worker is not able to connect to ZooKeeper. --- How did you deploy the cluster? ---- 2020-04-16 08:54:43 UTC - Sijie Guo: Is the output topic of the source connector the input topic for the sink connector? ---- 2020-04-16 08:55:21 UTC - Sijie Guo: An Unavailable I/O exception usually indicates that the connector failed to start. ---- 2020-04-16 08:55:46 UTC - Sijie Guo: Did you check the log file under `logs/<tenant>/<namespace>/<function-name>.log`? ---- 2020-04-16 08:56:43 UTC - Adelina Brask: Hi Sijie. Yes, it is. Both start well in local mode (localrun), but one of them fails in cluster mode (whichever I start second). Yes, I checked the logs, but the file is empty, as the connector fails before running (and before making a log file) ---- 2020-04-16 08:56:43 UTC - Sijie Guo: If you can provide the broker log and function logs, that would be super helpful. ---- 2020-04-16 08:57:59 UTC - Adelina Brask: That makes so much sense... I am working now to start Presto in cluster mode and connect to ZooKeeper. :slightly_smiling_face: Thanks for the tip. ok_hand : Sijie Guo ---- 2020-04-16 08:58:50 UTC - Sijie Guo: Do you happen to have the commands that you used for running those connectors in the 3 steps you described? ---- 2020-04-16 08:59:06 UTC - Sijie Guo: I am guessing it is a problem related to a mismatched schema. ----
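(One way to dig into a connector that dies before writing its own log file, as Adelina describes, is the sink status endpoint, which surfaces per-instance running state, restart counts, and the most recent error. A sketch against the 2.5 Java admin API, using the `public/default/elastic` sink named in the log above; the admin URL is a placeholder. The CLI equivalent is `pulsar-admin sinks status`:)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.SinkStatus;

public class CheckSinkStatus {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();
        // Fetch per-instance status for the failing sink from the log above.
        SinkStatus status = admin.sinks().getSinkStatus("public", "default", "elastic");
        status.getInstances().forEach(instance ->
                System.out.println("instance " + instance.getInstanceId()
                        + " running=" + instance.getStatus().isRunning()
                        + " restarts=" + instance.getStatus().getNumRestarts()
                        + " error=" + instance.getStatus().getError()));
        admin.close();
    }
}
```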
