2019-11-18 09:48:19 UTC - Fabien LD: @Fabien LD has joined the channel
----
2019-11-18 09:52:44 UTC - Pedro Cardoso: Hello, how are Pulsar functions health-checked? Is there a way to check if a given Pulsar function is degraded or down and, if so, to apply some event replay (recovery) logic?
----
2019-11-18 10:51:31 UTC - Alberto Refaldi: @Alberto Refaldi has joined the 
channel
----
2019-11-18 10:51:53 UTC - Pedro Cardoso: Just found that state is not enabled for the Pulsar functions. How can I enable it?
```
geImpl@2591bdd], failFunction=org.apache.pulsar.functions.source.PulsarSource$$Lambda$109/1729347078@400c6103, ackFunction=org.apache.pulsar.functions.source.PulsarSource$$Lambda$108/1713078304@45d976d8)
java.lang.IllegalStateException: State is not enabled.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[com.google.guava-guava-21.0.jar:?]
        at org.apache.pulsar.functions.instance.ContextImpl.ensureStateEnabled(ContextImpl.java:262) ~[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at org.apache.pulsar.functions.instance.ContextImpl.getCounter(ContextImpl.java:289) ~[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at RollingSum.process(RollingSum.java:32) ~[?:?]
        at RollingSum.process(RollingSum.java:24) ~[?:?]
        at org.apache.pulsar.functions.instance.JavaInstance.handleMessage(JavaInstance.java:63) ~[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:259) [org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
```
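For reference (based on the Pulsar state documentation, not on this thread): function state is backed by BookKeeper's table service, so two settings are involved. A minimal sketch, assuming default ports and a non-Kubernetes layout:
```
# conf/bookkeeper.conf -- run the stream storage (table) service inside each bookie
extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent

# conf/functions_worker.yml -- point the function workers at the table service
stateStorageServiceUrl: bk://127.0.0.1:4181
```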
----
2019-11-18 11:35:59 UTC - Pedro Cardoso: @here does anyone know how to enable the table service in a Kubernetes deployment?
----
2019-11-18 12:23:50 UTC - leonidv: Hi all! I'm interested in the message retention policy. As I read in the documentation (https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#retention-policies), Pulsar will by default:
>    immediately delete all messages that have been acknowledged on every subscription, and
>    persistently store all unacknowledged messages in a backlog.

As I understand it, this is the default retention policy in Apache Pulsar (defaultRetentionTimeInMinutes=0 and defaultRetentionSizeInMB=0).
I ran the following scenario:
1. Created a new producer on a new topic ABC and sent m1.
2. Created a new consumer (Cons1) with a subscription (S1) to ABC (SubscriptionInitialPosition.Earliest).
3. Cons1 read and acknowledged message m1.
4. Created a second consumer (Cons2) with a new subscription (S2) to ABC (SubscriptionInitialPosition.Earliest).

What I expected per the documentation: Cons2 does not receive m1, because Apache Pulsar "immediately delete[s] all messages that have been acknowledged on every subscription".
What actually happens: Cons2 receives m1.

What did I miss?
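A minimal Java sketch of this scenario, assuming a local broker at pulsar://localhost:6650 (the enum value in the Java client is SubscriptionInitialPosition.Earliest):
```java
import org.apache.pulsar.client.api.*;

public class RetentionScenario {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // 1. Create a producer on a fresh topic ABC and send m1.
        Producer<String> producer = client.newProducer(Schema.STRING).topic("ABC").create();
        producer.send("m1");

        // 2-3. Cons1 subscribes as S1 from the earliest position, reads and acks m1.
        Consumer<String> cons1 = client.newConsumer(Schema.STRING)
                .topic("ABC").subscriptionName("S1")
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();
        cons1.acknowledge(cons1.receive());

        // 4. Cons2 subscribes as a new subscription S2 -- and still receives m1,
        //    because the acked data lives in the still-active ledger (see the
        //    rollover explanation below).
        Consumer<String> cons2 = client.newConsumer(Schema.STRING)
                .topic("ABC").subscriptionName("S2")
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();
        System.out.println("Cons2 got: " + cons2.receive().getValue());

        client.close();
    }
}
```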
----
2019-11-18 12:54:19 UTC - Penghui Li: Data is not deleted immediately; it depends on the ledger rollover. Because the current ledger is still active, we can't delete it yet.
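In other words, acknowledged data is only reclaimed once its ledger has been closed (rolled over) and the periodic retention check runs. A sketch of the broker.conf settings that govern this, shown with what I believe are the 2.4.x defaults:
```
# A ledger is rolled over between these time bounds...
managedLedgerMinLedgerRolloverTimeMinutes=10
managedLedgerMaxLedgerRolloverTimeMinutes=240
# ...or once it holds this many entries
managedLedgerMaxEntriesPerLedger=50000
# How often the broker checks for fully-consumed data it can delete
retentionCheckIntervalInSeconds=120
```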
----
2019-11-18 13:16:07 UTC - leonidv: Where can I read about this?
----
2019-11-18 14:13:24 UTC - Rahul Vashishth: #general has anyone deployed Pulsar on a Kubernetes cluster? Please look into this issue: http://github.com/apache/pulsar/issues/5623
----
2019-11-18 14:14:14 UTC - Rahul Vashishth: I am unable to access the dashboard UI via the nginx ingress controller.
----
2019-11-18 15:28:59 UTC - Sunil Sattiraju: In Kafka I was using the groupByKey and aggregate APIs to collect the values of the same key into a list. Is there a similar concept in Pulsar?
----
2019-11-18 17:28:53 UTC - David Kjerrumgaard: @Sunil Sattiraju Look at the 
key-shared subscription type.
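A minimal sketch of a Key_Shared consumer, assuming an existing PulsarClient named client and a hypothetical topic; all messages carrying the same key are routed to the same consumer instance:
```java
import org.apache.pulsar.client.api.*;

Consumer<String> consumer = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/events")   // hypothetical topic name
        .subscriptionName("aggregators")
        .subscriptionType(SubscriptionType.Key_Shared) // available since Pulsar 2.4.0
        .subscribe();
```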
----
2019-11-18 18:39:05 UTC - Martin Kunev: Hello, I have an issue setting the allowed clusters for a tenant with Pulsar 2.4.0. The server reports that it has set the allowed clusters, but it is not actually setting them:

```
$ curl -i -X PUT 'http://example.com/admin/v2/namespaces/mytenant/myns' -H 'Content-Type: application/json' --data '{"replication_clusters": ["fra","nyc","sgp"]}'
HTTP/1.1 403 Forbidden
{"reason":"Cluster [fra] is not in the list of allowed clusters list for tenant [mytenant]"}

$ curl -i -X POST 'http://example.com/admin/v2/tenants/mytenant' -H 'Content-Type: application/json' --data '{"adminRoles": ["admin"], "allowed_clusters": ["fra","nyc","sgp"]}'
HTTP/1.1 204 No Content

$ pulsar-admin tenants get 'mytenant'
{
  "adminRoles" : [ "admin" ],
  "allowedClusters" : [ ]
}
```

What can be the cause of this and how can I fix it?
----
2019-11-18 18:41:52 UTC - rkandoi: @rkandoi has joined the channel
----
2019-11-18 22:24:43 UTC - Oleg Kozlov: Hi all, is there any way to control whether the Pulsar proxy uses OpenSSL or falls back to the JVM implementation when TLS is enabled? Based on the following section of the docs it should use OpenSSL; in our case we do have OpenSSL installed, but it doesn't seem like Pulsar is using it: https://pulsar.apache.org/docs/en/security-tls-transport/#tls-protocol-version-and-cipher
----
2019-11-18 22:37:12 UTC - Oleg Kozlov: Also, is there a way to verify whether Pulsar is using the OpenSSL or the JDK implementation?
----
2019-11-19 00:36:18 UTC - Penghui Li: There is no documentation describing ledger deletion yet. Thanks for the feedback; we will try to add it later.
----
2019-11-19 01:13:20 UTC - Sunil Sattiraju: Thanks for replying @David Kjerrumgaard. Is there any example code I can refer to?
Below is what I am currently doing in Kafka Streams.
----
2019-11-19 01:13:23 UTC - Sunil Sattiraju: ```val groupedByKeyStream = builder.stream[String, Map[String, Set[String]]](sampleTopicWithKey)
    .groupByKey
    .aggregate(initializer = Map[String, Set[String]]())(aggregator = (_, b: Map[String, Set[String]], agg: Map[String, Set[String]]) => agg ++ b)
    .toStream```
----
2019-11-19 01:15:07 UTC - Sunil Sattiraju: In Pulsar, do I need to store the aggregate value in the state store and retrieve & append to it when I see a new value for the key?
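A hypothetical sketch of that approach as a stateful Java function (state must be enabled on the worker, per the stack trace earlier in this thread; the key fallback and string encoding here are assumptions):
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class KeyAggregator implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        String key = context.getCurrentRecord().getKey().orElse("no-key");
        ByteBuffer current = context.getState(key); // null the first time a key is seen
        String aggregate = current == null
                ? input
                : StandardCharsets.UTF_8.decode(current) + "," + input;
        context.putState(key, ByteBuffer.wrap(aggregate.getBytes(StandardCharsets.UTF_8)));
        return null;
    }
}
```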
----
2019-11-19 02:10:41 UTC - tuteng: Please try using allowedClusters instead of allowed_clusters in the JSON data.
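i.e., camelCase, matching the field names that `pulsar-admin tenants get` printed above (an untested sketch of the corrected call):
```
$ curl -i -X POST 'http://example.com/admin/v2/tenants/mytenant' -H 'Content-Type: application/json' --data '{"adminRoles": ["admin"], "allowedClusters": ["fra","nyc","sgp"]}'
```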
----
2019-11-19 02:53:19 UTC - cequencer: @cequencer has joined the channel
----
2019-11-19 02:53:24 UTC - Penghui Li: Hi @leonidv, I have opened an issue to track the documentation for this problem: https://github.com/apache/pulsar/issues/5693. If you have any ideas, please let me know; commenting on the issue is preferred.
Thanks for your feedback.
----
2019-11-19 05:52:48 UTC - Sunil Sattiraju: I understand that with the key-shared subscription type all messages with the same key will be delivered to one consumer, but in my case I will not know the keys beforehand, so I need to aggregate all the values for the same key and send them downstream for further processing/evaluation.
----
2019-11-19 06:07:38 UTC - Matteo Merli: @Oleg Kozlov There's currently no way to configure it, though the TLS provider is printed when the broker/proxy starts:
----
2019-11-19 06:07:39 UTC - Matteo Merli: ```log.info("Started Pulsar Broker TLS service on {} - TLS provider: {}", listenChannelTls.localAddress(),
                        SslContext.defaultServerProvider());```
----
2019-11-19 06:08:20 UTC - Matteo Merli: Uhm, actually that's just in the broker code, not in the proxy.
----
2019-11-19 06:09:12 UTC - Matteo Merli: In any case, the logic is in Netty code: if OpenSSL is available, it will be used; otherwise it falls back to the JDK implementation.
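So one way to verify is to ask Netty directly from a JVM with the proxy's classpath; a small sketch (note that "OpenSSL is available" to Netty means the netty-tcnative bindings could be loaded, not merely that the system has openssl installed):
```java
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;

public class TlsProviderCheck {
    public static void main(String[] args) {
        // True only when the netty-tcnative OpenSSL bindings load successfully.
        System.out.println("OpenSSL available: " + OpenSsl.isAvailable());
        if (!OpenSsl.isAvailable()) {
            System.out.println("Cause: " + OpenSsl.unavailabilityCause());
        }
        // OPENSSL or JDK -- the same value the broker logs at startup.
        System.out.println("Default server provider: " + SslContext.defaultServerProvider());
    }
}
```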
----
2019-11-19 06:30:25 UTC - youzipi: @youzipi has joined the channel
----
2019-11-19 08:17:02 UTC - Logan B: The Yahoo! engineering blog mentions that migrating topics (I assume this includes broker failure) was on the order of 10s. Under "Future Improvements" it mentions that work was underway to reduce this to <1s. What's the current state? I haven't yet seen any other reference on the performance of broker failover in a production environment.
https://yahooeng.tumblr.com/post/150078336821/open-sourcing-pulsar-pub-sub-messaging-at-scale
----
2019-11-19 08:18:15 UTC - t3hnar: @t3hnar has joined the channel
----
2019-11-19 08:42:52 UTC - Fabien LD: Hi.
We have Pulsar deployed on Kubernetes with the default `apachepulsar/pulsar:2.4.1` docker image. That image comes with OpenJDK 8u212 (from the base docker image `openjdk:8-jdk-slim`), and this is causing us issues with BookKeeper.
Even though the BookKeeper heap stays within the configured bounds (7GB), the JVM process of the bookie instances uses more and more memory until it reaches the maximum allocated to the pod (10GB), at which point the Pulsar cluster starts to face issues and stops responding. Sometimes a little memory is released, but not much; it keeps growing, and it does not take long to fail and stop responding.

We see something similar with the local ZooKeeper (as opposed to the global one we have for geo-replication), but there some memory is released when it reaches the maximum.

I actually have several questions:

• Is this version of OpenJDK OK, or should we think about building our own image with a newer JVM (I see some support for Java 11 in Pulsar 2.3.0)?
• Is this some kind of leak in BookKeeper (like a leak of file descriptors with some caching in system memory; the docker image we use appears to ship version 4.9.2), and if so, what version should we try?
• Could this be "normal" off-heap memory consumption by the JVM because of the (I assume) high number of open file descriptors? If so, should we assign less to the JVM (for example 4GB instead of the current 7GB) to leave more for the system (for example 6GB instead of the current 3GB)? What are typical figures for memory? I could not find anything relevant.

For information, here is the configuration of our bookkeepers:

```  PULSAR_MEM: "\"-Dio.netty.leakDetectionLevel=disabled 
-Dio.netty.recycler.linkCapacity=1024 -Xms7g -Xmx7g 
-XX:MaxDirectMemorySize=1g\""
  PULSAR_GC: "\" -XX:+UseG1GC -XX:MaxGCPauseMillis=10 
-XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions 
-XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 
-XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC 
-XX:-ResizePLAB -XX:+ExitOnOutOfMemoryError -XX:+PerfDisableSharedMem 
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest \""
  dbStorage_writeCacheMaxSizeMb: "256" # Write cache size (direct memory)
  dbStorage_readAheadCacheMaxSizeMb: "256" # Read cache size (direct memory)
  dbStorage_rocksDB_blockCacheSize: "268435456"
  journalMaxSizeMB: "1024"
  zkServers: zk-0.zookeeper,zk-1.zookeeper,zk-2.zookeeper
  configurationStoreServers: 
global-zk-us-0.global-zookeeper,global-zk-us-1.global-zookeeper,global-zk-us-2.global-zookeeper
  statsProviderClass: 
org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider
  useHostNameAsBookieID: "true"```
----
2019-11-19 08:50:27 UTC - Fabien LD: Note: over the last ~12 hours we have been trying OpenJDK 8u232, the latest available via apt-get in the `apachepulsar/pulsar:2.4.1` docker image (the default JDK is 8u212), and so far it looks way better. This would reinforce the hypothesis of an issue with the JDK version. To be confirmed when we reach a traffic peak again.
----
2019-11-19 08:52:02 UTC - Fabien LD: With OpenJDK 8u232
----
2019-11-19 09:01:16 UTC - Fabien LD: I have removed the dashboard deployment since it was consuming quite a lot of memory and was not working well with geo-replication (some links were broken because some entries are not unique, ...). I did so on Pulsar 2.3.0 and noticed it was a bit better on a later version (I do not remember if it was 2.3.x or 2.4.0). I have not even looked at it on 2.4.1.
----
2019-11-19 09:02:01 UTC - Fabien LD: But I remember having your issue and making a workaround.
----
2019-11-19 09:06:01 UTC - Rahul Vashishth: What was your workaround? Are you 
using some other dashboard?
----
2019-11-19 09:08:17 UTC - Fabien LD: Just published it to GitHub to share with the community (few people would have found the answer here in Slack).
----
2019-11-19 09:08:32 UTC - Fabien LD: We are not using any dashboard at the 
moment
----
