2020-05-01 09:31:47 UTC - Raffaele: Thanks.
----
2020-05-01 12:45:36 UTC - Tom Greenwood: I just had a Kubernetes autoscaling + 
Pulsar explosion.

I set my pod up with this code

```import json
import pulsar

# client, message, and send_callback are defined elsewhere in the pod's code

consumer = client.subscribe("persistent://mytenant/mynamespace/mytopic",
                            "mysubscription",
                            initial_position=pulsar.InitialPosition.Earliest,
                            consumer_type=pulsar.ConsumerType.Shared)

producer = client.create_producer(topic="persistent://mytenant/mynamespace/myothertopic",
                                  block_if_queue_full=True,
                                  batching_enabled=True,
                                  batching_max_publish_delay_ms=10)

producer.send_async(json.dumps(message.dict()).encode(), send_callback)```
then set the deployment to autoscale...
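The autoscaler was roughly a standard HPA on the deployment; a minimal sketch, assuming CPU-based scaling (the name, replica bounds, and CPU target here are hypothetical, not the actual values):

```# Hypothetical sketch of the autoscaler in play; all values are illustrative.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-pulsar-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-pulsar-consumer
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80```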

I just got loads of errors like this:

```WARN  ClientConnection:1313 | [192.168.86.114:49528 -> 192.168.57.169:6650] Forcing connection to close after keep-alive timeout

ERROR ClientConnection:981 | [192.168.36.29:44670 -> 192.168.45.107:6650] Got invalid producer Id in closeProducer command

INFO  HandlerBase:130 | [persistent://mytenant/mynamespace/myothertopic-partition-4, pulsar-7-83] Schedule reconnection in 0.1 s

INFO  ConnectionPool:75 | Deleting stale connection from pool for pulsar://pulsar-broker-1.pulsar-broker.test.svc.cluster.local:6650 use_count

INFO  HandlerBase:53 | [persistent://mytenant/mynamespace/mytopic-partition-3, mysubscription, 3] Getting connection from pool

ERROR ClientConnection:667 | [192.168.36.29:44666 -> 192.168.45.107:6650] Connection already disconnected

INFO  ClientConnection:235 | [192.168.36.29:44666 -> 192.168.45.107:6650] Destroyed connection```
Can anyone help me? Not sure what I did wrong... my pods briefly blew up to their 
max autoscaling, then gradually died back down to 1 replica, and now they're just 
sitting there, mostly erroring...

Possibly related to this <https://github.com/apache/pulsar/issues/3630>?

POSSIBLY RESOLVED: My topics were partitioned. I got rid of the partitions and 
it seemed to resolve the problem.
----
2020-05-01 16:15:14 UTC - Guillaume Audic: @Guillaume Audic has joined the 
channel
----
2020-05-01 16:26:17 UTC - Guillaume Audic: Hi, we have some trouble with the 
memory usage of the broker in Kubernetes. We have set a memory limit of 1GB, but 
when we inspect the pod stats, we have:
• Limits: 1024Mi
• Cache: 700Mi
• Usage: 400Mi
• Cache + Usage: 1200 > 1024 -> pod killed by OOM
With kubectl top pod, we see 400Mi (Usage - Cache)
Do you know why the broker containers use so much cache?
----
2020-05-01 16:31:57 UTC - chris: Did you set the PULSAR_MEM env variable?
----
2020-05-01 16:32:12 UTC - chris: by default the broker will use 6GB of memory 
<https://github.com/apache/pulsar/blob/master/conf/pulsar_env.sh#L45>
----
2020-05-01 16:32:22 UTC - Guillaume Audic: ```PULSAR_MEM: &gt;
  "
  -Xms128m -Xmx256m -XX:MaxDirectMemorySize=128m
  -Dio.netty.leakDetectionLevel=disabled
  -Dio.netty.recycler.linkCapacity=1024
  -XX:+ParallelRefProcEnabled
  -XX:+UnlockExperimentalVMOptions
  -XX:+DoEscapeAnalysis
  -XX:ParallelGCThreads=4
  -XX:ConcGCThreads=4
  -XX:G1NewSizePercent=50
  -XX:+DisableExplicitGC
  -XX:-ResizePLAB
  -XX:+ExitOnOutOfMemoryError
  -XX:+PerfDisableSharedMem
  "
PULSAR_GC: >
  "
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=10
  "```

----
2020-05-01 16:34:01 UTC - Guillaume Audic: Compared to BookKeeper, the broker's 
cache memory is really big
----
2020-05-01 16:34:50 UTC - Guillaume Audic: Same memory limits & config for 
BookKeeper, and we have:
Cache size: 716 KB
----
2020-05-01 16:35:39 UTC - chris: are you producing and consuming on the brokers?
----
2020-05-01 16:35:40 UTC - Guillaume Audic: Cache size for broker: 700,000 KB
----
2020-05-01 16:35:51 UTC - Guillaume Audic: No client for the moment
----
2020-05-01 16:36:07 UTC - Guillaume Audic: Just bootstrapped the cluster, 
nothing else
----
2020-05-01 16:36:58 UTC - chris: and the broker is crashing?
----
2020-05-01 16:37:25 UTC - Guillaume Audic: Sometimes, yep: because of the memory 
limit it gets killed by OOM
----
2020-05-01 16:38:06 UTC - Guillaume Audic: But if we set the limit to 2.5G there 
is no problem, just a bigger cache size
----
2020-05-01 16:40:15 UTC - chris: if you log into the broker pod and do a `ps aux`, 
can you get the command?
It seems like it should not OOM
----
2020-05-01 16:40:56 UTC - chris: what happens if you bump the direct memory to 
-XX:MaxDirectMemorySize=256m?
----
2020-05-01 16:41:15 UTC - Guillaume Audic: memory stats of the container:
```cache 0
rss 0
rss_huge 0
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 0
pgpgout 0
pgfault 0
pgmajfault 0
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 2147483648
hierarchical_memsw_limit 9223372036854771712
total_cache 1227272192
total_rss 320323584
total_rss_huge 140509184
total_shmem 0
total_mapped_file 249856
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 412412
total_pgpgout 68818
total_pgfault 123892
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 320323584
total_inactive_file 1227235328
total_active_file 36864
total_unevictable 0```
----
2020-05-01 16:41:21 UTC - chris: or even 512M
----
2020-05-01 16:42:24 UTC - Guillaume Audic: I'll try
----
2020-05-01 16:43:40 UTC - chris: might want to look into this as well. 
<https://medium.com/adorsys/jvm-memory-settings-in-a-container-environment-64b0840e1d9e>
----
2020-05-01 16:47:21 UTC - Guillaume Audic: No better with 
-XX:MaxDirectMemorySize=512m: same cache size, resulting in an OOM
----
2020-05-01 16:47:58 UTC - Guillaume Audic: I know all of this for Java in a 
container environment, but we have some limitations due to the Java version 
used in Pulsar (1.8)
----
2020-05-01 16:49:57 UTC - Guillaume Audic: I already tried adding options such 
as:
```-XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=1```
But it doesn't affect the OOM
----
2020-05-01 16:50:22 UTC - chris: can you use more than 1GB on the container for 
the broker?
----
2020-05-01 16:51:47 UTC - Guillaume Audic: Yep, with 2Gi it's OK, but there's 
still a really big amount of cache usage in the container:
----
2020-05-01 16:52:46 UTC - Guillaume Audic: As you can see, with a 2Gi limit I get 
1.143Gi of cache, and the Java process consumes only the difference between usage 
and cache = 347M
----
2020-05-01 16:53:10 UTC - chris: I think the broker will preallocate memory up 
front
----
2020-05-01 16:53:47 UTC - Guillaume Audic: Yes, but which memory does it use for the cache?
----
2020-05-01 16:56:55 UTC - Guillaume Audic: I tried this option to set a smaller 
cache size:
managedLedgerCacheSizeMB: 256
No result
----
2020-05-01 17:06:25 UTC - chris: are you trying to lower the cache usage to 
have more memory for the java process?
----
2020-05-01 17:06:46 UTC - Guillaume Audic: Yep
----
2020-05-01 17:08:44 UTC - chris: I'm not sure that matters; the Java process 
will make use of the direct memory and cached memory
----
2020-05-01 17:53:51 UTC - Ruian: `-XX:+UseContainerSupport 
-XX:InitialRAMPercentage=40.0 -XX:MinRAMPercentage=20.0 
-XX:MaxRAMPercentage=80.0` 
----
2020-05-01 17:54:40 UTC - Ruian: maybe you can try to replace all `-Xms -Xmx 
-XX:MaxDirectMemorySize` with the above string
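For example, a sketch of what PULSAR_MEM could look like after that swap (which of the remaining flags to keep is up to you; the set below is just an illustration):

```# Sketch: fixed -Xms/-Xmx/-XX:MaxDirectMemorySize replaced with container-aware percentages.
PULSAR_MEM: >
  "
  -XX:+UseContainerSupport
  -XX:InitialRAMPercentage=40.0
  -XX:MinRAMPercentage=20.0
  -XX:MaxRAMPercentage=80.0
  -Dio.netty.leakDetectionLevel=disabled
  -XX:+ExitOnOutOfMemoryError
  "```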
----
2020-05-01 18:13:22 UTC - Jon Cordeiro: @Jon Cordeiro has joined the channel
----
2020-05-01 18:32:42 UTC - seungho: @seungho has joined the channel
----
2020-05-01 18:35:48 UTC - Guillaume Audic: Thanks, I will try this.
----
2020-05-01 18:48:04 UTC - Ruian: and don't forget to set a memory resource 
request/limit in the container spec.
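A minimal sketch of what that looks like on the broker container (the 1Gi value is just an example; size it to your PULSAR_MEM settings):

```# Sketch: the cgroup memory limit that the JVM percentage flags compute against.
resources:
  requests:
    memory: 1Gi
  limits:
    memory: 1Gi```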
----
2020-05-01 19:06:45 UTC - Ruian: Otherwise UseContainerSupport will have no 
effect, because it relies on the cgroup memory limit.
----
2020-05-01 19:54:10 UTC - Guillaume Audic: No better, but do you have an idea 
about this high consumption of cache memory?
----
2020-05-01 20:24:48 UTC - Guillaume Audic: Is there a lot of I/O in the broker 
that writes a lot of data? That could explain the cache memory of the 
containers.
----
2020-05-01 20:25:02 UTC - Guillaume Audic: Sorry for my bad English (I'm French)
----
2020-05-01 21:10:06 UTC - Deepak Sah: @Deepak Sah has joined the channel
----
2020-05-01 21:18:39 UTC - Sam Leung: Thanks!
----
2020-05-01 21:41:19 UTC - Guillaume Audic: After some tests, it appears that 
there is some I/O in the /tmp directory which increases the cache memory of the 
containers. The OOM killer uses a memory figure that is calculated from the 
memory used by the container plus the cache memory. The cache memory is used to 
cache file modifications in directories that are not mapped to a volume.
My containers are now alive, with no OOM kills, after mapping some directories 
to an emptyDir:
```volumeMounts:
  - name: tmp
    mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}```
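Note: if /tmp growth needs bounding, an emptyDir can also take an optional sizeLimit (the value here is illustrative):

```# Optional: cap how much the tmp emptyDir may grow.
volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 2Gi```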

----
2020-05-01 21:51:47 UTC - Guillaume Audic: During container startup, 1.2G of 
data is created under /tmp with the name pulsar-nar.
Does anyone know what this directory is?
----
2020-05-02 02:40:13 UTC - Ruian: <https://pulsar.apache.org/docs/en/io-debug/>
It seems that it is related to the IO connectors.

<https://github.com/apache/pulsar/blob/17e22d1eb19e763b2e89dfd5bdf0b7479653a3d9/pulsar-common/src/main/java/org/apache/pulsar/common/nar/NarClassLoader.java#L136>
----
2020-05-02 04:10:50 UTC - Hiroyuki Yamada: I’m trying to run Pulsar in 
production, but I’m wondering how Pulsar experts take backups of BookKeeper data, 
since there seems to be no built-in backup/restore feature in Pulsar/BookKeeper.
Let’s say 1 bookie disk is corrupted, so it needs to be replaced with a new disk. 
What is the best/fastest way to recover the bookie in such a case?

There is a Recovery feature in BookKeeper, but it is not feasible if the data is 
big (which is the case most of the time) or when a new bookie node joins.
Could anyone help me with this, please?
----
2020-05-02 07:07:11 UTC - Franck Schmidlin: I'll answer my own question with 
Google.

_Creating many topics in Pulsar is a much cheaper operation compared to other 
messaging systems. Topics can be explicitly deleted, or they are automatically 
deleted when all producers and consumers are disconnected and all subscriptions 
on the topic were deleted as well_

The return address pattern simply requires the request message to specify a 
unique, specifically created return address topic to be used for that one 
conversation. 

The requester indicates the return address as part of the request message or 
its metadata, and starts waiting for a response. The responder(s) retrieve the 
return address topic from the message or its metadata and produce their 
response on that topic. 

Pulsar will automatically delete the topic once the requester and responders 
have disconnected.
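A minimal sketch of the pattern with the Python client, assuming a locally reachable broker; all topic, tenant, and subscription names are hypothetical:

```import uuid
import pulsar

client = pulsar.Client("pulsar://localhost:6650")

# Requester: create a unique reply topic for this one conversation and
# advertise it in the request's message properties.
reply_topic = f"persistent://mytenant/mynamespace/reply-{uuid.uuid4()}"
request_producer = client.create_producer("persistent://mytenant/mynamespace/requests")
reply_consumer = client.subscribe(reply_topic, "reply-sub")
request_producer.send(b"do-work", properties={"reply-to": reply_topic})

# Responder (normally a separate process): read the return address from the
# request's properties and produce the response on that topic.
request_consumer = client.subscribe("persistent://mytenant/mynamespace/requests", "worker-sub")
req = request_consumer.receive()
responder = client.create_producer(req.properties()["reply-to"])
responder.send(b"result")
request_consumer.acknowledge(req)

# Requester: wait for the response, then disconnect so the now-unused reply
# topic can be garbage-collected by the broker.
resp = reply_consumer.receive(timeout_millis=10000)
reply_consumer.acknowledge(resp)
client.close()```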
----
2020-05-02 08:58:09 UTC - JG: Hello guys, a quick question regarding Pulsar and 
ZooKeeper: can we use the ZooKeeper instance to store other data (from other 
apps), or should we have another ZooKeeper instance in order to avoid problems 
with Pulsar?
----
2020-05-02 09:03:59 UTC - Franck Schmidlin: I don't know, I'm a newbie, but I 
have seen several articles which treat the ZK cluster as a commodity shared 
between Apache ecosystem components.
----
