2020-01-27 10:55:16 UTC - Rohit Sharma: @Rohit Sharma has joined the channel
----
2020-01-27 11:44:38 UTC - Kevin Huber: Good afternoon, is there any tutorial for setting up a Pulsar cluster with Docker? The only Pulsar Docker image I found is a standalone version.
----
2020-01-27 11:51:13 UTC - Eugen: @Kevin Huber Maybe one of the options on the "Deploying Pulsar on Kubernetes" page is for you: <https://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-01-27 11:59:18 UTC - Eugen: I'd like to add a feature matrix to the Pulsar docs, because for newbies it can be hard to remember which features work together and which do not - at least for me it is. Examples: Readers cannot be used with partitioned topics, and cumulative acknowledgement does not work for key_shared subscriptions. A first draft, with only some cells filled in:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                  pers topic  non-pers topic  part topic  reader  dedupe  regex sub  cumulat ack  excl sub  failover sub  shared sub  key_shared sub
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pers topic     ↘  n/a         ✓               ✓           ✓       ✓       ✓          ✓            ✓         ✓             ✓           ✓
non-pers topic ↘              ✓
part topic     ↘                                          ✗
reader         ↘
dedupe         ↘
regex sub      ↘
cumulat ack    ↘                                                                                  ✓         ✓             ✓           ✗
excl sub       ↘
failover sub   ↘
shared sub     ↘
key_shared sub ↘
```
In the above we already see a problem - we will need to split this feature matrix into multiple smaller ones, as this one is already too wide even though it does not contain all Pulsar features. Unless I hear objections, I'll create a PR to the docs for this in the next couple of days.
----
2020-01-27 13:26:51 UTC - Yoav Cohen: @Yoav Cohen has joined the channel
----
2020-01-27 14:46:13 UTC - Matt Mitchell: Hi. I have an application that behaves much like a "web crawler": clients produce URL messages, and servers consume and then validate them. If a URL is valid, the server pushes it to a separate "fetch" topic, which clients consume from; they download the URL and send the content back to the servers. Once finished, I'd like all messages to be persisted, so that when a "job" starts up again, the previously fetched pages can be re-published to the topic that clients consume from, and the process repeats. My questions are:
1. Assuming that long-term storage would need to be enabled, how can messages from storage be re-published on subsequent job runs?
2. Is it possible to use Pulsar's SQL feature to select certain messages to be re-published for subsequent job runs?
3. To prevent fetching the same URLs more than once, Pulsar's de-duplication can help with avoiding re-persisting duplicate messages, but does that also mean consumers will not even see duplicates?
4. Is it possible to update a message's value without re-publishing to a topic? For example, once a URL is fetched and a document is received, the message should be finalized with a few other fields/values, then saved. Is this a matter of directly talking to BookKeeper?
----
2020-01-27 15:59:42 UTC - Nouvelle: The command `bin/pulsar-admin brokers get-runtime-config` generates an error:
```
Expected a command, got get-runtime-config
Exception in thread "main" com.beust.jcommander.ParameterException: Asking description for unknown command: null
. . .
```
...and the usage doc returned from `bin/pulsar-admin brokers` does not list `get-runtime-config` as an option.
----
2020-01-27 16:02:32 UTC - Nouvelle: This installation was deployed to k8s via the Helm chart.
----
2020-01-27 16:51:07 UTC - John Pradeep: @John Pradeep has joined the channel
----
2020-01-27 17:00:02 UTC - Addison Higham: @Eugen readers *can* be used on partitioned topics, you just have to manually create a reader for each underlying topic partition. So instead of reading `my-partitioned-topic`, you create and manage a reader for each `my-partitioned-topic-partition-<n>`.
----
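A minimal sketch of that per-partition reader approach, assuming the Java client; the service URL and topic name are illustrative. `getPartitionsForTopic` returns the names of the underlying `-partition-<n>` topics, so one `Reader` is created per partition:
```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class PartitionedTopicReaders {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // adjust for your cluster
                .build();

        // A partitioned topic is really N internal topics named
        // <topic>-partition-<n>; getPartitionsForTopic lists them.
        String topic = "persistent://public/default/my-partitioned-topic";
        List<Reader<byte[]>> readers = new ArrayList<>();
        for (String partition : client.getPartitionsForTopic(topic).get()) {
            readers.add(client.newReader()
                    .topic(partition)
                    .startMessageId(MessageId.earliest)
                    .create());
        }

        // Drain each partition in turn; a real application would poll the
        // readers concurrently, and ordering only holds within a partition.
        for (Reader<byte[]> reader : readers) {
            while (reader.hasMessageAvailable()) {
                Message<byte[]> msg = reader.readNext();
                System.out.printf("%s -> %s%n", msg.getTopicName(), new String(msg.getValue()));
            }
        }

        for (Reader<byte[]> reader : readers) {
            reader.close();
        }
        client.close();
    }
}
```
----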
2020-01-27 17:32:00 UTC - Guilherme Perinazzo: I have a use case where I need to distribute a notification to multiple topics, but I don't know the topics beforehand. Right now I'm creating multiple producers, submitting the one message, and closing them. This feels wasteful - is there a way to create a producer that can produce to multiple topics?
----
2020-01-27 17:55:14 UTC - Sijie Guo: @Guilherme Perinazzo Creating a producer should be relatively cheap, since the channels are multiplexed and maintained at the client level. You can use a Guava cache (or another caching library) to cache the producers for topics.
----
2020-01-27 17:56:41 UTC - Sijie Guo: A feature matrix like this is great
----
2020-01-27 18:02:11 UTC - Addison Higham: we have a use case like this (not yet implemented, but we have designed around it) and that is mostly what we are thinking, with perhaps the inclusion of a TTL, so that if a producer isn't used in x period of time it gets closed
+1 : Sijie Guo
----
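A minimal sketch of that cached-producer pattern, assuming the Java client and Guava; the 10-minute TTL and the class name are illustrative:
```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ProducerCache implements AutoCloseable {
    private final LoadingCache<String, Producer<byte[]>> cache;

    public ProducerCache(PulsarClient client) {
        this.cache = CacheBuilder.newBuilder()
                // TTL: close producers that have not been used for 10 minutes.
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .removalListener((RemovalListener<String, Producer<byte[]>>) removal ->
                        removal.getValue().closeAsync())
                .build(new CacheLoader<String, Producer<byte[]>>() {
                    @Override
                    public Producer<byte[]> load(String topic) throws PulsarClientException {
                        return client.newProducer().topic(topic).create();
                    }
                });
    }

    public void send(String topic, byte[] payload) throws Exception {
        // Creates the producer on first use, reuses it afterwards.
        cache.get(topic).send(payload);
    }

    @Override
    public void close() {
        cache.invalidateAll(); // triggers the removal listener for each cached producer
    }
}
```
Note that Guava evicts lazily (during cache accesses), and an entry could in principle expire while a send on its producer is still in flight, so production code needs a little more care around the removal path.
----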
2020-01-27 18:22:39 UTC - Jerry Peng: Pulsar just needs 2 more stars to reach 5K stars on GitHub: <https://github.com/apache/pulsar/stargazers> Please star the project if you haven't already!!!
star2 : Roman Popenov, Ali Ahmed
sports_medal : Roman Popenov, Ali Ahmed
----
2020-01-27 18:23:48 UTC - ravi satya durga prasad Yenugula: Done
----
2020-01-27 18:23:55 UTC - ravi satya durga prasad Yenugula: 5K stars!
----
2020-01-27 18:24:18 UTC - ravi satya durga prasad Yenugula: Please provide a proper Docker setup document
heart : Pedro Cardoso
----
2020-01-27 18:36:47 UTC - Jerry Peng: :tada::tada::tada:
----
2020-01-27 18:48:30 UTC - Rob Long: @Rob Long has joined the channel
----
2020-01-27 19:19:44 UTC - Matt Mitchell: Playing around with Pulsar SQL / Presto and seeing this error when starting from within the official Docker container:
```
root@pulsar:/pulsar# ./bin/pulsar sql-worker run
/pulsar/conf/pulsar_env.sh: line 45: -Xms512m: command not found
ERROR: ld.so: object '/pulsar/lib/presto/bin/procname/Linux-x86_64/libprocname.so' from LD_PRELOAD cannot be preloaded (ELF file's phentsize not the expected size): ignored.
```
The process then continues and seems to start SQL. From another terminal, I run this (from the "getting started" article) and see another error:
```
presto> show schemas in pulsar;
Query 20200127_191045_00002_a3t5e failed: Failed to get schemas from pulsar: Cannot cast org.glassfish.jersey.inject.hk2.Hk2InjectionManagerFactory to org.glassfish.jersey.internal.inject.InjectionManagerFactory
```
...which then kills the SQL process. I'm guessing this is all related to not setting up the container correctly? Does anyone know if that's the case, and if so, what I should do to get it running via Docker?
----
2020-01-27 19:36:04 UTC - David Kjerrumgaard: @ravi satya durga prasad Yenugula Can you provide a little more detail on your request?
----
2020-01-27 19:36:24 UTC - David Kjerrumgaard: @Matt Mitchell Which Docker image and tag are you using?
----
2020-01-27 19:40:16 UTC - ravi satya durga prasad Yenugula: @David Kjerrumgaard, in the Docker repo the documentation section is blank
----
2020-01-27 19:41:36 UTC - ravi satya durga prasad Yenugula: and the other repos are the same, except the main one
----
2020-01-27 19:43:17 UTC - Matt Mitchell: @David Kjerrumgaard it's `apachepulsar/pulsar:latest`
----
2020-01-27 19:43:45 UTC - ravi satya durga prasad Yenugula: Even on the main image, please add steps to pull the image
----
2020-01-27 19:43:51 UTC - ravi satya durga prasad Yenugula: Thanks
----
2020-01-27 19:44:50 UTC - Matt Mitchell: 895e190fb267
----
2020-01-27 19:46:38 UTC - David Kjerrumgaard: Thanks
+1 : Matt Mitchell
----
2020-01-27 20:18:47 UTC - Eugen: Thanks for the feedback, I will prepare a PR for this.
----
2020-01-27 20:20:01 UTC - Eugen: @Addison Higham Again, good to know. I will add a footnote to the cell in question
----
2020-01-27 20:53:24 UTC - Guilherme Perinazzo: Does the heartbeat config for the Debezium PostgreSQL source work?
----
2020-01-27 21:14:35 UTC - Roman Popenov: Is there a command to check how many broker pods are up?
----
2020-01-27 21:44:07 UTC - Greg Methvin: FWIW, we are using the current implementation with several million scheduled messages at a time, though the delays are usually less than a day. It seems to be working fine for us now.
100 : Sijie Guo
----
2020-01-27 22:06:25 UTC - juraj: Where could one find the changelog for 2.5.0?
----
2020-01-27 22:06:47 UTC - Roman Popenov: <http://pulsar.apache.org/release-notes/#250-mdash-2019-12-06-a-id250a>
----
2020-01-27 22:08:41 UTC - juraj: thanks! couldn't find it on <https://github.com/apache/pulsar/releases>
+1 : Roman Popenov
----
2020-01-27 22:09:17 UTC - Roman Popenov: The link is also in the channel topic
----
2020-01-27 22:39:06 UTC - juraj: that's kinda easy to miss
----
2020-01-27 23:05:01 UTC - Roman Popenov:
```
root@pulsar-mini-roman-proxy-c54969665-qvpgn:/pulsar# bin/pulsar-admin broker-stats monitoring-metrics
HTTP 403 Forbidden

Reason: HTTP 403 Forbidden
```
I cannot see the metrics from a proxy?
----
2020-01-27 23:05:16 UTC - Roman Popenov: Is it possible for the proxy to only start up once the brokers have come up?
----
2020-01-27 23:07:07 UTC - Matt Mitchell: I _think_ I now understand that Pulsar's long-term storage is not the right fit, because it's basically segmented data streams. What's really needed is a database, maybe even backed by a Pulsar connector which reads from the database and streams into Pulsar at job creation time. Would be great to know if there are integrations like this out there in the wild.
----
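A minimal sketch of that idea - a re-publisher that runs at job creation time, selects previously fetched pages from a database, and streams them into Pulsar - assuming the Java client and plain JDBC. The connection string, table, column, and topic names are all hypothetical:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class JobStartRepublisher {
    public static void main(String[] args) throws Exception {
        // Assumes a JDBC driver (here PostgreSQL) is on the classpath.
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/crawler");
             PulsarClient client = PulsarClient.builder()
                     .serviceUrl("pulsar://localhost:6650")
                     .build();
             Producer<byte[]> producer = client.newProducer()
                     .topic("persistent://public/default/fetch")
                     .create()) {

            // Select the previously fetched pages the new job should revisit.
            try (Statement stmt = db.createStatement();
                 ResultSet rows = stmt.executeQuery("SELECT url FROM fetched_pages WHERE job_id = 42")) {
                while (rows.next()) {
                    String url = rows.getString("url");
                    producer.newMessage()
                            .key(url) // keyed by URL, e.g. so topic compaction could apply
                            .value(url.getBytes())
                            .send();
                }
            }
        }
    }
}
```
----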
2020-01-27 23:13:21 UTC - juraj: try to kill the proxy process in the pod to force it to restart
----
2020-01-27 23:14:21 UTC - Roman Popenov: No, I am trying to run the proxy when all the brokers have started
----
2020-01-27 23:14:59 UTC - Roman Popenov: Naively, I thought that something like this could run:
```
bin/pulsar-admin broker-stats monitoring-metrics -i | grep "brk_ml_count" | grep -o -P '.{1}$'
```
----
2020-01-27 23:15:16 UTC - Roman Popenov: but it does not have permissions to connect
----
2020-01-27 23:15:23 UTC - Roman Popenov: broker pods and bastion pods can
----
2020-01-27 23:15:32 UTC - Roman Popenov: bookie and proxy cannot
----
2020-01-27 23:17:16 UTC - juraj: there is an issue with proxy init, the restart helps
----
2020-01-27 23:17:22 UTC - juraj: are cumulative acknowledgements supposed to work with a multi-topic consumer?
----
2020-01-27 23:17:46 UTC - Roman Popenov: I know, I am trying to start the proxy when all the brokers are already running
----
2020-01-27 23:18:01 UTC - juraj: I'm not sure if that fixes the issue, idk
----
2020-01-27 23:20:47 UTC - Roman Popenov: But I would need to connect using pulsar-admin to check how many broker pods are running, and it will get me a 403
----
2020-01-27 23:22:40 UTC - Addison Higham: @Roman Popenov this should be fixed in 2.5.0 and perhaps 2.4.2? <https://github.com/apache/pulsar/pull/4921>
----
2020-01-27 23:22:44 UTC - Addison Higham: what version are you on?
----
2020-01-27 23:23:26 UTC - Addison Higham: oh, perhaps I am not understanding properly - I thought you were hitting the `/metrics` endpoint for a proxy
----
2020-01-27 23:23:29 UTC - Addison Higham: not sure on that one
----
2020-01-27 23:23:59 UTC - Roman Popenov: I think I am hitting: <https://github.com/apache/pulsar/issues/5994>
----
2020-01-27 23:25:24 UTC - juraj: ^I reported that
----
2020-01-27 23:27:50 UTC - Ali Hamidi: @Ali Hamidi has joined the channel
----
2020-01-28 01:26:55 UTC - Roman Popenov: Yes, I can confirm that if the proxy comes up after the brokers appear in nslookup, the proxy doesn't have this issue
----
2020-01-28 01:33:33 UTC - Roman Popenov: I have posted a temporary work-around in the issue
----
2020-01-28 01:55:04 UTC - Sijie Guo: yes. you can add an init container to do a broker health check.
----
2020-01-28 01:55:26 UTC - Sijie Guo: you need to ack for individual partitions.
----
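A minimal sketch of acking individual partitions from a multi-topic consumer, assuming the Java client; the topic and subscription names are illustrative. The idea is that a cumulative ack never spans partitions, so you track the latest message seen per underlying partition and ack each one separately:
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class PerPartitionCumulativeAck {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topics(Arrays.asList("topic-a", "topic-b"))
                .subscriptionName("my-sub")
                // Cumulative ack requires an Exclusive or Failover subscription.
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        // Track the latest message per underlying partition, then ack
        // cumulatively per partition.
        Map<String, Message<byte[]>> latestPerPartition = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            Message<byte[]> msg = consumer.receive();
            // On a multi-topic consumer, getTopicName() returns the concrete
            // partition the message came from.
            latestPerPartition.put(msg.getTopicName(), msg);
        }
        for (Message<byte[]> last : latestPerPartition.values()) {
            consumer.acknowledgeCumulative(last);
        }

        consumer.close();
        client.close();
    }
}
```
----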
2020-01-28 01:56:12 UTC - Roman Popenov: I've done a work-around in <https://github.com/apache/pulsar/issues/5994>. Not exactly sure if that is the best approach, but it does seem to solve the issue of having to restart the proxies manually
----
2020-01-28 01:57:35 UTC - Roman Popenov: Is there a better way to do the broker health check? I noticed that the proxy needs the broker pods running before it can start. If it starts before the brokers, it cannot run the pulsar-admin commands, and pulsar-admin returns a 403
----
2020-01-28 01:57:44 UTC - Roman Popenov: So it's a bit of a chicken-and-egg problem…
----
2020-01-28 01:58:40 UTC - Sijie Guo: this approach should be the right approach, although you can change `-ge {{ .Values.broker.replicaCount }}` to `-ge 1`
----
2020-01-28 01:58:49 UTC - Sijie Guo: you just need one broker to be available
+1 : Roman Popenov
----
2020-01-28 01:59:12 UTC - Roman Popenov: Will do, thanks!
----
2020-01-28 01:59:43 UTC - Sijie Guo: thanks
----
2020-01-28 02:05:35 UTC - Sijie Guo: @ravi satya durga prasad Yenugula the documentation about the Pulsar Docker image is all on the Pulsar website. We don't spread the documentation across different places, as it is really hard for us to maintain and keep it consistent. If you want to run Pulsar in Docker as standalone, check out the documentation at <http://pulsar.apache.org/docs/en/standalone-docker/>. If you want to run Pulsar in Docker in a distributed mode, k8s is recommended; see the documentation at <http://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-01-28 02:30:13 UTC - Sijie Guo: @David Kjerrumgaard @Matt Mitchell the version of Presto that Pulsar currently uses doesn't run well on OpenJDK. There is already an issue filed for that: <https://github.com/apache/pulsar/issues/5370> We just need to upgrade the Presto version, but that requires some code changes, since there are changes in Presto we need to adjust to. If anyone is interested in helping with that issue, contributions are welcome.
----
2020-01-28 02:41:27 UTC - Eugen: Does message deduplication work for partitioned topics? If so, since deduplication is handled in brokers, which are only aware of the sequence IDs they have seen, the producer would have to maintain and send sequence IDs per partition as well. I can't find any documentation on this.
----
2020-01-28 02:59:06 UTC - Sijie Guo: it is only maintained per partition.
----
2020-01-28 03:09:21 UTC - Eugen: I see. Actually, thinking about it again, the producers do *not* need to maintain and use sequence IDs per partition for this to work - because gaps are OK! (This is different from the "redundant producer" use case I'm discussing over in the dev channel, where the producer would have to be aware of partitions)
----
2020-01-28 03:10:54 UTC - Eugen: But that assumes that the partitioner is deterministic - does this hold for RoundRobin? Will the same sequence ID get sent to the same partition?
----
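For illustration, a minimal sketch of how explicit sequence IDs look on the producer side of this discussion, assuming the Java client, a stable producer name, and deduplication enabled on the namespace (e.g. via `pulsar-admin namespaces set-deduplication public/default --enable`); the names are illustrative:
```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DedupProducer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Deduplication is tracked per (producer name, partition): the broker
        // remembers the highest sequence ID it has seen from this producer and
        // drops replays. A stable producerName lets the broker recognize the
        // producer across reconnects and restarts.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/my-partitioned-topic")
                .producerName("crawler-publisher-1")
                .create();

        for (long seq = 0; seq < 10; seq++) {
            // A single monotonically increasing counter; on a partitioned topic
            // the IDs arriving at any one partition will have gaps, which is
            // fine - dedup only requires "greater than the last ID seen".
            producer.newMessage()
                    .sequenceId(seq)
                    .value(("url-" + seq).getBytes())
                    .send();
        }

        producer.close();
        client.close();
    }
}
```
----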
2020-01-28 03:30:51 UTC - ravi satya durga prasad Yenugula: removed `$`
----
2020-01-28 04:49:29 UTC - Pradeesh: @Pradeesh has joined the channel
----
2020-01-28 05:03:10 UTC - Pradeesh: @Sijie Guo running into an error getting a bookie running
```
04:55:13.795 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
3.951: Total time for which application threads were stopped: 0.0000476 seconds, Stopping threads took: 0.0000265 seconds
3.952: Total time for which application threads were stopped: 0.0000762 seconds, Stopping threads took: 0.0000143 seconds
3.952: Total time for which application threads were stopped: 0.0000559 seconds, Stopping threads took: 0.0000227 seconds
04:55:13.799 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: localhost/127.0.0.1:2181: Connection refused
4.953: Total time for which application threads were stopped: 0.0001144 seconds, Stopping threads took: 0.0000211 seconds
04:55:14.902 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
04:55:14.903 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: localhost/127.0.0.1:2181: Connection refused
5.953: Total time for which application threads were stopped: 0.0000819 seconds, Stopping threads took: 0.0000233 seconds
```
----
2020-01-28 05:03:46 UTC - Pradeesh: for some reason the bookie tries to connect to localhost:2181
----
2020-01-28 05:04:17 UTC - Pradeesh: I have this in my configmap
```
PULSAR_PREFIX_zkServers: zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
```
----
2020-01-28 05:05:42 UTC - sindhushree: @sindhushree has joined the channel
----
2020-01-28 05:19:30 UTC - Pradeesh: looks like the file generated inside the container, `/pulsar/conf/bookkeeper.conf`, uses all defaults
----
2020-01-28 05:20:03 UTC - Pradeesh: even though this looks good
----
2020-01-28 05:20:09 UTC - Pradeesh:
```
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
[conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"
[conf/pulsar_env.sh] Adding config dbStorage_readAheadCacheMaxSizeMb = 1024
[conf/pulsar_env.sh] Adding config dbStorage_rocksDB_blockCacheSize = 4294967296
[conf/pulsar_env.sh] Adding config dbStorage_writeCacheMaxSizeMb = 1024
[conf/pulsar_env.sh] Adding config journalMaxSizeMB = 2048
[conf/pulsar_env.sh] Adding config statsProviderClass = org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
[conf/pulsar_env.sh] Adding config useHostNameAsBookieID = true
[conf/pulsar_env.sh] Adding config zkServers = zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
[conf/bkenv.sh] Adding config dbStorage_readAheadCacheMaxSizeMb = 1024
[conf/bkenv.sh] Adding config dbStorage_rocksDB_blockCacheSize = 4294967296
[conf/bkenv.sh] Adding config dbStorage_writeCacheMaxSizeMb = 1024
[conf/bkenv.sh] Adding config journalMaxSizeMB = 2048
[conf/bkenv.sh] Adding config statsProviderClass = org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
[conf/bkenv.sh] Adding config useHostNameAsBookieID = true
[conf/bkenv.sh] Adding config zkServers = zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper,zk-3.zookeeper,zk-4.zookeeper
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
[conf/pulsar_env.sh] Applying config PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"
[conf/pulsar_env.sh] Applying config PULSAR_GC = " -XX:+UseG1GC "
PULSAR_MEM = "-Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.linkCapacity=1024 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest -Xms28g -Xmx28g -XX:MaxDirectMemorySize=28g"``` ----
