2020-01-27 10:55:16 UTC - Rohit Sharma: @Rohit Sharma has joined the channel
----
2020-01-27 11:44:38 UTC - Kevin Huber: Good afternoon, is there any tutorial for setting up a Pulsar cluster with Docker? The only Pulsar Docker image I found is a standalone version.
----
2020-01-27 11:51:13 UTC - Eugen: @Kevin Huber Maybe one of the options on the "Deploying Pulsar on Kubernetes" page is for you: <https://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-01-27 11:59:18 UTC - Eugen: I'd like to add a feature matrix to the Pulsar docs, because for newbies it can be hard to remember which features work together and which do not - at least for me it is. Examples: Readers cannot be used with partitioned topics, and cumulative acknowledgement does not work for key_shared subscriptions. A first draft, with only some cells filled in:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                  pers topic  non-pers topic  part topic  reader  dedupe  regex sub  cumulat ack  excl sub  failover sub  shared sub  key_shared sub
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pers topic     ↘  n/a         ✓               ✓           ✓       ✓       ✓          ✓            ✓         ✓             ✓           ✓
non-pers topic ↘              ✓
part topic     ↘                                          ✗
reader         ↘
dedupe         ↘
regex sub      ↘
cumulat ack    ↘                                                                                  ✓         ✓             ✓           ✗
excl sub       ↘
failover sub   ↘
shared sub     ↘
key_shared sub ↘
```
In the above we already see a problem - we will need to split this feature matrix into multiple smaller ones, as this one is already too wide even though it does not contain all Pulsar features. Unless I hear objections, I'll create a PR to the docs for this in the next couple of days.
----
2020-01-27 13:26:51 UTC - Yoav Cohen: @Yoav Cohen has joined the channel
----
2020-01-27 14:46:13 UTC - Matt Mitchell: Hi. I have an application that behaves much like a "web crawler": clients produce URL messages, and servers consume and then validate them. If a URL is valid, the server pushes it to a separate "fetch" topic, which clients consume from; they download the URL and send the content back to the servers. Once finished, I'd like all messages to be persisted, so that when a "job" starts up again, the previously fetched pages can be re-published to the topic that clients consume from, and the process repeats. My questions are:
1. Assuming that long-term storage would need to be enabled, how can messages from storage be re-published on subsequent job runs?
2. Is it possible to use Pulsar's SQL feature to select certain messages to be re-published for subsequent job runs?
3. To prevent fetching the same URLs more than once, Pulsar's de-duplication can help with avoiding re-persisting duplicate messages, but does that also mean consumers will not even see duplicates?
4. Is it possible to update a message's value without re-publishing to a topic? For example, once a URL is fetched and a document is received, the message should be finalized with a few other fields/values, then saved. Is this a matter of directly talking to BookKeeper?
----
2020-01-27 15:59:42 UTC - Nouvelle: The command `bin/pulsar-admin brokers get-runtime-config` generates an error:
```
Expected a command, got get-runtime-config
Exception in thread "main" com.beust.jcommander.ParameterException: Asking description for unknown command: null
. . .
```
...and the usage doc returned from `bin/pulsar-admin brokers` does not list `get-runtime-config` as an option.
----
2020-01-27 16:02:32 UTC - Nouvelle: This installation was deployed to k8s via the Helm chart.
----
2020-01-27 16:51:07 UTC - John Pradeep: @John Pradeep has joined the channel
----
2020-01-27 17:00:02 UTC - Addison Higham: @Eugen readers *can* be used on partitioned topics, you just have to manually create a reader for each underlying topic partition. So instead of reading `my-partitioned-topic`, you create and manage a reader for each `my-partitioned-topic-partition-<n>`.
----
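A minimal sketch of that per-partition reader approach, assuming the Java client; the service URL and topic name are illustrative. `getPartitionsForTopic` returns the names of the underlying `-partition-<n>` topics, so one `Reader` is created per partition:
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class PartitionedTopicReaders {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // adjust for your cluster
                .build();

        // A partitioned topic is really N internal topics named
        // <topic>-partition-<n>; getPartitionsForTopic lists them.
        String topic = "persistent://public/default/my-partitioned-topic";
        List<Reader<byte[]>> readers = new ArrayList<>();
        for (String partition : client.getPartitionsForTopic(topic).get()) {
            readers.add(client.newReader()
                    .topic(partition)
                    .startMessageId(MessageId.earliest)
                    .create());
        }

        // Drain each partition in turn; a real application would poll the
        // readers concurrently, and ordering only holds within a partition.
        for (Reader<byte[]> reader : readers) {
            while (reader.hasMessageAvailable()) {
                Message<byte[]> msg = reader.readNext();
                System.out.printf("%s -> %s%n", msg.getTopicName(), new String(msg.getValue()));
            }
        }

        for (Reader<byte[]> reader : readers) {
            reader.close();
        }
        client.close();
    }
}
```
----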
2020-01-27 17:32:00 UTC - Guilherme Perinazzo: I have a use case where I need to distribute a notification to multiple topics, but I don't know the topics beforehand. Right now I'm creating multiple producers, submitting the one message, and closing them. This feels wasteful - is there a way to create a producer that can produce to multiple topics?
----
2020-01-27 17:55:14 UTC - Sijie Guo: @Guilherme Perinazzo Creating a producer should be relatively cheap, since the channels are multiplexed and maintained at the client level. You can use a Guava cache (or another caching library) to cache the producers for topics.
----
2020-01-27 17:56:41 UTC - Sijie Guo: A feature matrix like this is great
----
2020-01-27 18:02:11 UTC - Addison Higham: we have a use case like this (not yet implemented, but we have designed around it) and that is mostly what we are thinking, with perhaps the inclusion of a TTL, so that if a producer isn't used in x period of time it gets closed
+1 : Sijie Guo
----
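A minimal sketch of that cached-producer pattern, assuming the Java client and Guava; the 10-minute TTL and the class name are illustrative:
```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ProducerCache implements AutoCloseable {
    private final LoadingCache<String, Producer<byte[]>> cache;

    public ProducerCache(PulsarClient client) {
        this.cache = CacheBuilder.newBuilder()
                // TTL: close producers that have not been used for 10 minutes.
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .removalListener((RemovalListener<String, Producer<byte[]>>) removal ->
                        removal.getValue().closeAsync())
                .build(new CacheLoader<String, Producer<byte[]>>() {
                    @Override
                    public Producer<byte[]> load(String topic) throws PulsarClientException {
                        return client.newProducer().topic(topic).create();
                    }
                });
    }

    public void send(String topic, byte[] payload) throws Exception {
        // Creates the producer on first use, reuses it afterwards.
        cache.get(topic).send(payload);
    }

    @Override
    public void close() {
        cache.invalidateAll(); // triggers the removal listener for each cached producer
    }
}
```
Note that Guava evicts lazily (during cache accesses), and an entry could in principle expire while a send on its producer is still in flight, so production code needs a little more care around the removal path.
----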
2020-01-27 18:22:39 UTC - Jerry Peng: Pulsar just needs 2 more stars to reach 5K stars on GitHub: <https://github.com/apache/pulsar/stargazers> Please star the project if you haven't already!!!
star2 : Roman Popenov, Ali Ahmed
sports_medal : Roman Popenov, Ali Ahmed
----
2020-01-27 18:23:48 UTC - ravi satya durga prasad Yenugula: Done
----
2020-01-27 18:23:55 UTC - ravi satya durga prasad Yenugula: 5K stars!
----
2020-01-27 18:24:18 UTC - ravi satya durga prasad Yenugula: Please provide a proper Docker setup document
heart : Pedro Cardoso
----
2020-01-27 18:36:47 UTC - Jerry Peng: :tada::tada::tada:
----
2020-01-27 18:48:30 UTC - Rob Long: @Rob Long has joined the channel
----
2020-01-27 19:19:44 UTC - Matt Mitchell: Playing around with Pulsar SQL / Presto and seeing this error when starting from within the official Docker container:
```
root@pulsar:/pulsar# ./bin/pulsar sql-worker run
/pulsar/conf/pulsar_env.sh: line 45: -Xms512m: command not found
ERROR: ld.so: object '/pulsar/lib/presto/bin/procname/Linux-x86_64/libprocname.so' from LD_PRELOAD cannot be preloaded (ELF file's phentsize not the expected size): ignored.
```
The process then continues and seems to start SQL. From another terminal, I run this (from the "getting started" article) and see another error:
```
presto> show schemas in pulsar;
Query 20200127_191045_00002_a3t5e failed: Failed to get schemas from pulsar: Cannot cast org.glassfish.jersey.inject.hk2.Hk2InjectionManagerFactory to org.glassfish.jersey.internal.inject.InjectionManagerFactory
```
...which then kills the SQL process. I'm guessing this is all related to not setting up the container correctly? Does anyone know if that's the case, and if so, what I should do to get it running via Docker?
----
2020-01-27 19:36:04 UTC - David Kjerrumgaard: @ravi satya durga prasad Yenugula Can you provide a little more detail on your request?
----
2020-01-27 19:36:24 UTC - David Kjerrumgaard: @Matt Mitchell Which Docker image and tag are you using?
----
2020-01-27 19:40:16 UTC - ravi satya durga prasad Yenugula: @David Kjerrumgaard, in the Docker repo the documentation section is blank
----
2020-01-27 19:41:36 UTC - ravi satya durga prasad Yenugula: and the other repos are the same, except the main one
----
2020-01-27 19:43:17 UTC - Matt Mitchell: @David Kjerrumgaard it's `apachepulsar/pulsar:latest`
----
2020-01-27 19:43:45 UTC - ravi satya durga prasad Yenugula: Even on the main image, please add steps to pull the image
----
2020-01-27 19:43:51 UTC - ravi satya durga prasad Yenugula: Thanks
----
2020-01-27 19:44:50 UTC - Matt Mitchell: 895e190fb267
----
2020-01-27 19:46:38 UTC - David Kjerrumgaard: Thanks
+1 : Matt Mitchell
----
2020-01-27 20:18:47 UTC - Eugen: Thanks for the feedback, I will prepare a PR for this.
----
2020-01-27 20:20:01 UTC - Eugen: @Addison Higham Again, good to know. I will add a footnote to the cell in question
----
2020-01-27 20:53:24 UTC - Guilherme Perinazzo: Does the heartbeat config for the Debezium PostgreSQL source work?
----
2020-01-27 21:14:35 UTC - Roman Popenov: Is there a command to check how many broker pods are up?
----
2020-01-27 21:44:07 UTC - Greg Methvin: FWIW, we are using the current implementation with several million scheduled messages at a time, though the delays are usually less than a day. It seems to be working fine for us now.
100 : Sijie Guo
----
2020-01-27 22:06:25 UTC - juraj: Where could one find the changelog for 2.5.0?
----
2020-01-27 22:06:47 UTC - Roman Popenov: <http://pulsar.apache.org/release-notes/#250-mdash-2019-12-06-a-id250a>
----
2020-01-27 22:08:41 UTC - juraj: thanks! couldn't find it on <https://github.com/apache/pulsar/releases>
+1 : Roman Popenov
----
2020-01-27 22:09:17 UTC - Roman Popenov: The link is also in the channel topic
----
2020-01-27 22:39:06 UTC - juraj: that's kinda easy to miss
----
2020-01-27 23:05:01 UTC - Roman Popenov:
```
root@pulsar-mini-roman-proxy-c54969665-qvpgn:/pulsar# bin/pulsar-admin broker-stats monitoring-metrics
HTTP 403 Forbidden

Reason: HTTP 403 Forbidden
```
I cannot see the metrics from a proxy?
----
2020-01-27 23:05:16 UTC - Roman Popenov: Is it possible for the proxy to only start up once the brokers have come up?
----
2020-01-27 23:07:07 UTC - Matt Mitchell: I _think_ I now understand that Pulsar's long-term storage is not the right fit, because it's basically segmented data streams. What's really needed is a database, maybe even backed by a Pulsar connector which reads from the database and streams into Pulsar at job creation time. Would be great to know if there are integrations like this out there in the wild.
----
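A minimal sketch of that idea - a re-publisher that runs at job creation time, selects previously fetched pages from a database, and streams them into Pulsar - assuming the Java client and plain JDBC. The connection string, table, column, and topic names are all hypothetical:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class JobStartRepublisher {
    public static void main(String[] args) throws Exception {
        // Assumes a JDBC driver (here PostgreSQL) is on the classpath.
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/crawler");
             PulsarClient client = PulsarClient.builder()
                     .serviceUrl("pulsar://localhost:6650")
                     .build();
             Producer<byte[]> producer = client.newProducer()
                     .topic("persistent://public/default/fetch")
                     .create()) {

            // Select the previously fetched pages the new job should revisit.
            try (Statement stmt = db.createStatement();
                 ResultSet rows = stmt.executeQuery("SELECT url FROM fetched_pages WHERE job_id = 42")) {
                while (rows.next()) {
                    String url = rows.getString("url");
                    producer.newMessage()
                            .key(url) // keyed by URL, e.g. so topic compaction could apply
                            .value(url.getBytes())
                            .send();
                }
            }
        }
    }
}
```
----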
2020-01-27 23:13:21 UTC - juraj: try to kill the proxy process in the pod to force it to restart
----
2020-01-27 23:14:21 UTC - Roman Popenov: No, I am trying to run the proxy when all the brokers have started
----
2020-01-27 23:14:59 UTC - Roman Popenov: Naively, I thought that something like this could run:
```
bin/pulsar-admin broker-stats monitoring-metrics -i | grep "brk_ml_count" | grep -o -P '.{1}$'
```
----
2020-01-27 23:15:16 UTC - Roman Popenov: but it does not have permissions to connect
----
2020-01-27 23:15:23 UTC - Roman Popenov: broker pods and bastion pods can
----
2020-01-27 23:15:32 UTC - Roman Popenov: bookie and proxy cannot
----
2020-01-27 23:17:16 UTC - juraj: there is an issue with proxy init, the restart helps
----
2020-01-27 23:17:22 UTC - juraj: are cumulative acknowledgements supposed to work with a multi-topic consumer?
----
2020-01-27 23:17:46 UTC - Roman Popenov: I know, I am trying to start the proxy when all the brokers are already running
----
2020-01-27 23:18:01 UTC - juraj: I'm not sure if that fixes the issue, idk
----
2020-01-27 23:20:47 UTC - Roman Popenov: But I would need to connect using pulsar-admin to check how many broker pods are running, and it will get me a 403
----
2020-01-27 23:22:40 UTC - Addison Higham: @Roman Popenov this should be fixed in 2.5.0 and perhaps 2.4.2? <https://github.com/apache/pulsar/pull/4921>
----
2020-01-27 23:22:44 UTC - Addison Higham: what version are you on?
----
2020-01-27 23:23:26 UTC - Addison Higham: oh, perhaps I am not understanding properly - I thought you were hitting the `/metrics` endpoint for a proxy
----
2020-01-27 23:23:29 UTC - Addison Higham: not sure on that one
----
2020-01-27 23:23:59 UTC - Roman Popenov: I think I am hitting: <https://github.com/apache/pulsar/issues/5994>
----
2020-01-27 23:25:24 UTC - juraj: ^I reported that
----
2020-01-27 23:27:50 UTC - Ali Hamidi: @Ali Hamidi has joined the channel
----
2020-01-28 01:26:55 UTC - Roman Popenov: Yes, I can confirm that if the proxy comes up after the brokers appear in nslookup, the proxy doesn't have this issue
----
2020-01-28 01:33:33 UTC - Roman Popenov: I have posted a temporary work-around in the issue
----
2020-01-28 01:55:04 UTC - Sijie Guo: yes. you can add an init container to do a broker health check.
----
2020-01-28 01:55:26 UTC - Sijie Guo: you need to ack for individual partitions.
----
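A minimal sketch of acking individual partitions from a multi-topic consumer, assuming the Java client; the topic and subscription names are illustrative. The idea is that a cumulative ack never spans partitions, so you track the latest message seen per underlying partition and ack each one separately:
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class PerPartitionCumulativeAck {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topics(Arrays.asList("topic-a", "topic-b"))
                .subscriptionName("my-sub")
                // Cumulative ack requires an Exclusive or Failover subscription.
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        // Track the latest message per underlying partition, then ack
        // cumulatively per partition.
        Map<String, Message<byte[]>> latestPerPartition = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            Message<byte[]> msg = consumer.receive();
            // On a multi-topic consumer, getTopicName() returns the concrete
            // partition the message came from.
            latestPerPartition.put(msg.getTopicName(), msg);
        }
        for (Message<byte[]> last : latestPerPartition.values()) {
            consumer.acknowledgeCumulative(last);
        }

        consumer.close();
        client.close();
    }
}
```
----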
2020-01-28 01:56:12 UTC - Roman Popenov: I've done a work-around in <https://github.com/apache/pulsar/issues/5994>. Not exactly sure if that is the best approach, but it does seem to solve the issue of having to restart the proxies manually
----
2020-01-28 01:57:35 UTC - Roman Popenov: Is there a better way to do the broker health check? I noticed that the proxy needs the broker pods running before it can start. If it starts before the brokers, it cannot run the pulsar-admin commands, and pulsar-admin returns a 403
----
2020-01-28 01:57:44 UTC - Roman Popenov: So it's a bit of a chicken-and-egg problem…
----
2020-01-28 01:58:40 UTC - Sijie Guo: this approach should be the right approach, although you can change `-ge {{ .Values.broker.replicaCount }}` to `-ge 1`
----
2020-01-28 01:58:49 UTC - Sijie Guo: you just need one broker to be available
+1 : Roman Popenov
----
2020-01-28 01:59:12 UTC - Roman Popenov: Will do, thanks!
----
2020-01-28 01:59:43 UTC - Sijie Guo: thanks
----
2020-01-28 02:05:35 UTC - Sijie Guo: @ravi satya durga prasad Yenugula the documentation about the Pulsar Docker image is all on the Pulsar website. We don't spread the documentation across different places, as it is really hard for us to maintain and keep it consistent. If you want to run Pulsar in Docker as standalone, check out the documentation at <http://pulsar.apache.org/docs/en/standalone-docker/>. If you want to run Pulsar in Docker in a distributed mode, k8s is recommended; see the documentation at <http://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-01-28 02:30:13 UTC - Sijie Guo: @David Kjerrumgaard @Matt Mitchell the version of Presto that Pulsar currently uses doesn't run well on OpenJDK. There is already an issue filed for that: <https://github.com/apache/pulsar/issues/5370> We just need to upgrade the Presto version, but that requires some code changes, since there are changes in Presto we need to adjust to. If anyone is interested in helping with that issue, contributions are welcome.
----
2020-01-28 02:41:27 UTC - Eugen: Does message deduplication work for partitioned topics? If so, since deduplication is handled in brokers, which are only aware of the sequence IDs they have seen, the producer would have to maintain and send sequence IDs per partition as well. I can't find any documentation on this.
----
2020-01-28 02:59:06 UTC - Sijie Guo: it is only maintained per partition.
----
2020-01-28 03:09:21 UTC - Eugen: I see. Actually, thinking about it again, the producers do *not* need to maintain and use sequence IDs per partition for this to work - because gaps are OK! (This is different from the "redundant producer" use case I'm discussing over in the dev channel, where the producer would have to be aware of partitions)
----
2020-01-28 03:10:54 UTC - Eugen: But that assumes that the partitioner is deterministic - does this hold for RoundRobin? Will the same sequence ID get sent to the same partition?
----
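For illustration, a minimal sketch of how explicit sequence IDs look on the producer side of this discussion, assuming the Java client, a stable producer name, and deduplication enabled on the namespace (e.g. via `pulsar-admin namespaces set-deduplication public/default --enable`); the names are illustrative:
```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DedupProducer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Deduplication is tracked per (producer name, partition): the broker
        // remembers the highest sequence ID it has seen from this producer and
        // drops replays. A stable producerName lets the broker recognize the
        // producer across reconnects and restarts.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/my-partitioned-topic")
                .producerName("crawler-publisher-1")
                .create();

        for (long seq = 0; seq < 10; seq++) {
            // A single monotonically increasing counter; on a partitioned topic
            // the IDs arriving at any one partition will have gaps, which is
            // fine - dedup only requires "greater than the last ID seen".
            producer.newMessage()
                    .sequenceId(seq)
                    .value(("url-" + seq).getBytes())
                    .send();
        }

        producer.close();
        client.close();
    }
}
```
----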
2020-01-28 03:30:51 UTC - ravi satya durga prasad Yenugula: removed `$`
----
2020-01-28 04:49:29 UTC - Pradeesh: @Pradeesh has joined the channel
----
2020-01-28 05:03:10 UTC - Pradeesh: @Sijie Guo running into an error getting a bookie running
```
04:55:13.795 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
3.951: Total time for which application threads were stopped: 0.0000476 seconds, Stopping threads took: 0.0000265 seconds
3.952: Total time for which application threads were stopped: 0.0000762 seconds, Stopping threads took: 0.0000143 seconds
3.952: Total time for which application threads were stopped: 0.0000559 seconds, Stopping threads took: 0.0000227 seconds
04:55:13.799 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: localhost/127.0.0.1:2181: Connection refused
4.953: Total time for which application threads were stopped: 0.0001144 seconds, Stopping threads took: 0.0000211 seconds
04:55:14.902 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
04:55:14.903 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: localhost/127.0.0.1:2181: Connection refused
5.953: Total time for which application threads were stopped: 0.0000819 seconds, Stopping threads took: 0.0000233 seconds
```
----
2020-01-28 05:03:46 UTC - Pradeesh: for some reason the bookie tries to connect to localhost:2181
----
2020-01-28 05:04:17 UTC - Pradeesh: I have this in my configmap
```
PULSAR_PREFIX_zkServers: zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
```
----
2020-01-28 05:05:42 UTC - sindhushree: @sindhushree has joined the channel
----
2020-01-28 05:19:30 UTC - Pradeesh: looks like the file generated inside the container, `/pulsar/conf/bookkeeper.conf`, uses all defaults
----
2020-01-28 05:20:03 UTC - Pradeesh: even though this looks good
----
2020-01-28 05:20:09 UTC - Pradeesh:
```
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
[conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"
[conf/pulsar_env.sh] Adding config dbStorage_readAheadCacheMaxSizeMb = 1024
[conf/pulsar_env.sh] Adding config dbStorage_rocksDB_blockCacheSize = 4294967296
[conf/pulsar_env.sh] Adding config dbStorage_writeCacheMaxSizeMb = 1024
[conf/pulsar_env.sh] Adding config journalMaxSizeMB = 2048
[conf/pulsar_env.sh] Adding config statsProviderClass = org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
[conf/pulsar_env.sh] Adding config useHostNameAsBookieID = true
[conf/pulsar_env.sh] Adding config zkServers = zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
[conf/bkenv.sh] Adding config dbStorage_readAheadCacheMaxSizeMb = 1024
[conf/bkenv.sh] Adding config dbStorage_rocksDB_blockCacheSize = 4294967296
[conf/bkenv.sh] Adding config dbStorage_writeCacheMaxSizeMb = 1024
[conf/bkenv.sh] Adding config journalMaxSizeMB = 2048
[conf/bkenv.sh] Adding config statsProviderClass = org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
[conf/bkenv.sh] Adding config useHostNameAsBookieID = true
[conf/bkenv.sh] Adding config zkServers = zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
[conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"``` ----
