2018-12-12 09:26:16 UTC - Maarten Tielemans: Morning all, I managed to get my
2-node Pulsar cluster set up yesterday. However, in an initial latency test,
latency seems to have increased rather than decreased.
----
2018-12-12 09:26:29 UTC - Maarten Tielemans:
----
2018-12-12 09:27:24 UTC - Maarten Tielemans: I have configured my broker as
follows
```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=1
```
----
2018-12-12 09:28:01 UTC - Maarten Tielemans: For reference, these were the
numbers I hit Friday with a standalone run over a longer duration
`09:44:03.405 [Thread-2] INFO org.apache.pulsar.testclient.PerformanceProducer
- Aggregated latency stats --- Latency: mean: 3.335 ms - med: 3.216 -
95pct: 4.328 - 99pct: 7.007 - 99.9pct: 15.858 - 99.99pct: 128.418 -
99.999pct: 261.110 - Max: 278.307`
----
2018-12-12 09:28:36 UTC - Maarten Tielemans: Any obvious things I am doing
wrong?
----
2018-12-12 09:29:39 UTC - Romain Castagnet: @Romain Castagnet has joined the
channel
----
2018-12-12 09:34:03 UTC - Romain Castagnet: Hi, before the next 2.3 release, do
you have a snapshot/beta repository?
----
2018-12-12 10:02:54 UTC - Ivan Kelly: @Romain Castagnet for the client or
server? there's a snapshot maven repo anyhow
----
2018-12-12 10:06:46 UTC - Romain Castagnet: @Ivan Kelly it's for the server, and
we need the latest tar.gz snapshot; where can I find it?
----
2018-12-12 10:42:28 UTC - Ivan Kelly: @Romain Castagnet I'm not sure if we
generate one. why can't you build master directly yourself?
----
2018-12-12 10:42:54 UTC - Ivan Kelly: @Maarten Tielemans what has the latency
increased from?
----
2018-12-12 10:43:09 UTC - Ivan Kelly: ah, sorry, didn't read fully
----
2018-12-12 10:44:43 UTC - Ivan Kelly: how are you running the benchmark?
----
2018-12-12 10:47:14 UTC - Romain Castagnet: thanks @Ivan Kelly, that's already
what we do
----
2018-12-12 10:47:29 UTC - Ivan Kelly: i don't think there's an official one
----
2018-12-12 10:48:32 UTC - Romain Castagnet: maybe
----
2018-12-12 10:56:55 UTC - Maarten Tielemans: hi @Ivan Kelly, in both cases I
ran the client on (one of) the nodes. The nodes are also set up in a placement
group
----
2018-12-12 10:58:47 UTC - Ivan Kelly: does the two-node cluster have the bookie
and the Pulsar broker on separate nodes, or are they on the same nodes?
----
2018-12-12 11:02:01 UTC - Maarten Tielemans: each node runs a zookeeper, bookie
and broker (similar to the standalone)
----
2018-12-12 11:07:23 UTC - Ivan Kelly: but each is running individually? as in,
in its own process?
----
2018-12-12 11:08:01 UTC - Ivan Kelly: standalone has journalSyncData=false for
BookKeeper, while a regular bookie runs with journalSyncData=true (the safer
setting).
----
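(A minimal sketch of the setting being discussed; the value shown matches the
standalone behaviour described above, not a recommendation for production.)
```
# conf/bookkeeper.conf
# true (default for a regular bookie): fsync the journal before acknowledging writes (safer)
# false (Pulsar standalone): skip the fsync, trading durability for lower latency
journalSyncData=false
```
----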
2018-12-12 12:25:07 UTC - Maarten Tielemans: yes, I am running zookeeper,
bookie and broker as separate (daemon) processes
----
2018-12-12 12:27:57 UTC - Maarten Tielemans: I'm aware of the
journalSyncData=true/false difference; but considering that I am using instances
with NVMe disks, I am not sure it would explain the difference. I can try with
the setting set to false
----
2018-12-12 12:40:57 UTC - Ivan Kelly: with false, it skips the fsync. fsync is
a syscall so you have a jump from userspace to kernel space
----
2018-12-12 13:00:14 UTC - Maarten Tielemans: I added the journalSyncData=false
setting to my conf/bookkeeper.conf, restarted both bookies and brokers, and
re-ran the performance test. Results are more or less the same
----
2018-12-12 13:00:30 UTC - Maarten Tielemans:
----
2018-12-12 13:32:29 UTC - Mike Card: Hey @David Kjerrumgaard last night I tried
reducing the number of parallel tasks from 2 to 1 and even added
synchronized(){} blocks around the access to the static variables in the custom
serializer to rule out any kind of concurrency issue, and none of that made a
difference. The logging showed the producer sending 64 bytes and the consumer
getting 64 bytes, so I have no idea what is making this happen. I might try
rewriting the serializer to not use the Composite() type; baffled as to why we
see this behavior under heavy load in Pulsar but not Kafka.
----
2018-12-12 14:02:24 UTC - Ivan Kelly: median and p95 are the same (slightly
better in fact), it's the p99 where things go to hell
----
2018-12-12 14:08:39 UTC - Ivan Kelly: hmm, the messages are quite large. I'm
going to guess the issue is some file rolling over. there are local spikes and a
large spike every minute
----
2018-12-12 14:19:17 UTC - Shalin: Are there plans to support submitting
multi-file Pulsar functions? i.e., maybe accepting a Python module instead of a
.py file.
----
2018-12-12 14:24:59 UTC - Maarten Tielemans: @Ivan Kelly - I ran `iostat -x 1`
on one node and re-ran the pulsar-perf test on the other. At times, I do see
(massive) spikes in wkB/s. Normally I expect 300 kB * 60 (= 18,000 kB/s), but I
see spikes of up to 250,000 kB/s. Any way to pinpoint/resolve this? I have the
journal and ledger on the same disk.
----
2018-12-12 14:25:09 UTC - Maarten Tielemans: ```
Device:   rrqm/s  wrqm/s   r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
xvda        0.00    0.00  0.00     2.00    0.00       5.50      5.50      0.00    0.00     0.00     0.00   0.00   0.00
nvme0n1     0.00    0.00  0.00   302.00    0.00   17978.00    119.06      0.03    0.15     0.00     0.15   0.08   2.40

avg-cpu:  %user   %nice  %system  %iowait  %steal  %idle
           6.06    0.00     8.59    29.80    0.00  55.56

Device:   rrqm/s  wrqm/s   r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
xvda        0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00    0.00     0.00     0.00   0.00   0.00
nvme0n1     0.00    0.00  0.00  2026.00    0.00  250402.50    247.19    163.13   37.98     0.00    37.98   0.30  61.60

avg-cpu:  %user   %nice  %system  %iowait  %steal  %idle
           1.04    0.00     1.55    21.76    0.00  75.65

Device:   rrqm/s  wrqm/s   r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
xvda        0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00    0.00     0.00     0.00   0.00   0.00
nvme0n1     0.00    0.00  0.00   472.00    0.00   47229.00    200.12     76.40  344.76     0.00   344.76   0.96  45.20

avg-cpu:  %user   %nice  %system  %iowait  %steal  %idle
           1.02    0.00     1.02     0.00    0.00  97.97

Device:   rrqm/s  wrqm/s   r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
xvda        0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00    0.00     0.00     0.00   0.00   0.00
nvme0n1     0.00    0.00  0.00   304.00    0.00   18012.00    118.50      0.04    0.18     0.00     0.18   0.08   2.40
```
----
2018-12-12 14:48:58 UTC - Sijie Guo: a wheel file package is supported in the
upcoming 2.3.0 release. it requires some documentation. @Sanjeev Kulkarni or
@Jerry Peng can help with that.
----
2018-12-12 14:49:57 UTC - Ivan Kelly: journal and ledger are on the same disk
for standalone too.
----
2018-12-12 14:53:41 UTC - Maarten Tielemans: I could check the disk activity
with standalone too for comparison, but it would take a bit of time
----
2018-12-12 14:56:18 UTC - Olivier Chicha: yes, it is a possibility that I
envision.
I have never played with Debezium;
I was also envisioning using mysql-binlog-connector-java directly.
My only concern is the added load and latency this would introduce compared to
the mechanism we already have in place.
We will most probably give it a try, as it definitely simplifies the concepts
and the implementation.
----
2018-12-12 15:02:51 UTC - Ivan Kelly: what class of machine are you using for
these tests?
----
2018-12-12 15:03:41 UTC - Maarten Tielemans: 2 i3.large instances with the
local NVMe disks
----
2018-12-12 15:07:45 UTC - Matteo Merli: @Maarten Tielemans can you verify that
you are indeed writing to the NVMe disks and not the VM local disk? The numbers
you’re getting look more like the latter.
The spikes in IOPS happen when the write cache gets flushed. That’s normal and
expected.
----
2018-12-12 15:12:13 UTC - Sijie Guo: @Matteo Merli: I think the 3 ms comes from
1 ms of batching at the client side and 1 ms of batching at the journal group
commit. That’s kind of expected, because he is sending requests at 60
messages/sec. @Maarten Tielemans: you can do the following:
- add `--batch-time-window 0` to your perf command
- set `journalFlushWhenQueueEmpty` to true in your `conf/bookkeeper.conf`
----
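(A minimal sketch of the two tweaks suggested above; the topic name, rate, and
message size are placeholders standing in for the actual test parameters.)
```
# conf/bookkeeper.conf: flush the journal as soon as the write queue drains,
# instead of waiting out the group-commit interval (useful at low request rates)
journalFlushWhenQueueEmpty=true

# perf client: disable client-side batching so each message is sent immediately
bin/pulsar-perf produce my-topic --rate 60 --size 300000 --batch-time-window 0
```
----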
2018-12-12 15:13:30 UTC - Ivan Kelly: none of this explains why standalone
doesn't have this issue though
----
2018-12-12 15:17:00 UTC - Sijie Guo: I think I suggested the same thing for his
standalone setup
----
2018-12-12 15:21:11 UTC - Maarten Tielemans: @Matteo Merli I only see
(relevant) disk activity on the nvme0n1 disk, so that seems correct. iostat
shows write activity of about 18,000 kB/s, which seems correct for the number
of messages (60/s) and their respective size (300 kB). Apart from writing those
messages, where would the disk IO spike (~250,000 kB/s) come from?
@Sijie Guo I am running bin/pulsar-perf with --batch-time-window 0; I have not
yet set journalFlushWhenQueueEmpty=true (will do so now)
----
2018-12-12 15:22:28 UTC - Ivan Kelly: @Maarten Tielemans each message gets
written to the journal, and then the ledger disk. the spike is the ledger disk
write. it flushes every 60s
----
2018-12-12 15:24:21 UTC - Sijie Guo: @Maarten Tielemans I would suggest you set
journalFlushWhenQueueEmpty to true given your request rate is so low.
----
2018-12-12 15:32:47 UTC - Maarten Tielemans: Some numbers for a setup with
journalFlushWhenQueueEmpty=true on both nodes. Hitting lower mean and medians,
but spikes persist
----
2018-12-12 15:32:56 UTC - Maarten Tielemans:
----
2018-12-12 15:42:35 UTC - Christophe Bornet: I have seen that PIP25 was
implemented. Does it allow connecting Pulsar to an OIDC server such as Okta or
Keycloak?
----
2018-12-12 15:58:06 UTC - Matteo Merli: @Christophe Bornet not at this point,
though the intention is to have a pluggable source on the backend
----
2018-12-12 16:16:53 UTC - David Kjerrumgaard: @Mike Card Thanks for the update.
That is strange behavior
----
2018-12-12 16:17:54 UTC - Mike Card: Yes it is. I am taking today off and
tomorrow I’ll resume looking at this
----
2018-12-12 16:44:54 UTC - David Kjerrumgaard: At least we have isolated the
issue to the behavior of the serializer with Pulsar, as you are both reading
and writing 64 bytes on the producer and consumer side, correct?
----
2018-12-12 16:53:47 UTC - Mike Card: Yes, I had originally thought the messages
were being truncated, but my tests say that is not the case
----
2018-12-12 18:02:58 UTC - Matteo Merli: That was too early in the morning.. I
hadn’t seen the iostat output above :slightly_smiling_face:
----
2018-12-12 18:05:55 UTC - Matteo Merli: @Maarten Tielemans These numbers are
not consistent with what should be expected from NVMe disks.
Can you check the bookie stats, which will have the complete latency breakdown?
Eg: please post the output of `<http://BOOKIE:8000/metrics>`
2018-12-12 20:54:46 UTC - Grant Wu: Is it possible to have a Pulsar function
publish to an arbitrary number of output topics?
----
2018-12-12 21:02:23 UTC - Sanjeev Kulkarni: The question is how you decide
which topic the output should go to. Currently this can only be achieved by
explicitly using the context.publish() method inside the function
----
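(A minimal Python sketch of what Sanjeev describes, assuming the Pulsar
Functions Python SDK; the topic names and routing rule are made-up examples,
and the target topics are assumed to already exist.)
```python
from pulsar import Function

class RoutingFunction(Function):
    """Routes each incoming message to one of several pre-existing output topics."""

    def process(self, input, context):
        # Pick the destination topic from the message content
        # (this routing rule is a made-up example).
        if input.startswith("error"):
            topic = "persistent://public/default/errors"
        else:
            topic = "persistent://public/default/events"
        # Publish explicitly instead of returning a value, so the function
        # can fan out to more than one output topic.
        context.publish(topic, input)
        # Returning None means nothing is written to the configured --output topic.
        return None
```
----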
2018-12-12 21:03:06 UTC - Grant Wu: Hrm, assume that we can determine the topic
----
2018-12-12 21:41:30 UTC - Grant Wu: Also we would need to subscribe to a
multi-topic subscription :confused:
----
2018-12-12 21:41:33 UTC - David Kjerrumgaard: We would also have to assume that
all of the "target" topics already exist and that the function has permission
to write to them.....as they won't be created at function creation time.
----
2018-12-12 21:41:49 UTC - Grant Wu: Oh, hrm…
----
2018-12-12 21:52:10 UTC - Grant Wu: Yeah, I had a feeling this was unlikely to
work, and now I have some evidence…
----
2018-12-12 21:56:33 UTC - Matteo Merli: > I don’t think it’s a huge problem,
if everything else works as expected, meaning the client will get a transient
failure and recover.
You can have multiple topics as input for a function
----
2018-12-12 21:57:54 UTC - Grant Wu: er, why did you quote that
----
2018-12-12 21:58:26 UTC - Grant Wu: Additionally, can you listen on a regex of
topics?
----
2018-12-12 21:58:32 UTC - Grant Wu: like you can with multi-topic subscriptions?
----
2018-12-12 21:58:56 UTC - Sanjeev Kulkarni: yes you can
----
2018-12-12 22:00:13 UTC - Grant Wu: So we’d just use the standard multi-topic
subscription format here:
<https://pulsar.apache.org/docs/en/functions-deploying/#local-run-mode> ?
----
2018-12-12 22:00:21 UTC - Matteo Merli: oh, I meant to quote the other message…
:slightly_smiling_face:
> Is it possible to have a Pulsar function publish to an arbitrary number of
output topics?
----
2018-12-12 22:00:21 UTC - Grant Wu: as an argument to `--inputs`
----
2018-12-12 22:01:59 UTC - Grant Wu: so something like
```
$ bin/pulsar-admin functions create \
--py myfunc.py \
--classname myfunc.SomeFunction \
--inputs "<persistent://public/default/.*>" \
--output <persistent://public/default/output-1>
```
----
2018-12-12 22:03:22 UTC - Grant Wu: Hrm. But is it possible to publish to an
arbitrary number of outputs? It seems like I can use `context.publish` with
the following caveats - I need to specify the topic name explicitly, and I
can’t create new topics
----
2018-12-12 22:08:36 UTC - Grant Wu: (Just added quotes to avoid shell globbing)
----
2018-12-12 22:21:29 UTC - Matteo Merli: that is correct. It’s just not
automated in the same way as “returning a value from the function”
----
2018-12-12 22:29:18 UTC - Sanjeev Kulkarni: --inputs is a comma-separated list
of topic names (and cannot contain a regex). For a regex, use --topics-pattern
----
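(Adapting Grant's earlier command to what Sanjeev describes; a sketch only, with
the pattern and output topic carried over from the example above.)
```
$ bin/pulsar-admin functions create \
  --py myfunc.py \
  --classname myfunc.SomeFunction \
  --topics-pattern "persistent://public/default/.*" \
  --output persistent://public/default/output-1
```
----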
2018-12-12 22:30:34 UTC - Grant Wu: ah, okay
----
2018-12-12 22:33:22 UTC - Jonathan Budd: @Jonathan Budd has joined the channel
----
2018-12-12 22:37:51 UTC - Grant Wu: It would be nice to get this better
documented :sweat_smile: it seems like the only way to discover this is by
looking at the command line `--help`
----
2018-12-12 22:58:37 UTC - Grant Wu: Oh, I think I have hit a bug in the Pulsar
Functions Python SDK
----
2018-12-12 22:58:44 UTC - Grant Wu: ```
Traceback (most recent call last):
File
"/Users/grant.wu/fakebin/apache-pulsar-2.2.0/instances/python-instance/python_instance.py",
line 267, in actual_execution
output_object = self.function_class.process(input_object, self.contextimpl)
File "/Users/grant.wu/petuum/pulsar_functions_poc/test.py", line 8, in process
context.publish("<persistent://public/default/muhoutput>", input + '!')
File
"/Users/grant.wu/fakebin/apache-pulsar-2.2.0/instances/python-instance/contextimpl.py",
line 135, in publish
compression_type=compression_type
File
"/Users/grant.wu/fakebin/apache-pulsar-2.2.0/instances/python-instance/pulsar/__init__.py",
line 378, in create_producer
_check_type(CompressionType, compression_type, 'compression_type')
File
"/Users/grant.wu/fakebin/apache-pulsar-2.2.0/instances/python-instance/pulsar/__init__.py",
line 925, in _check_type
raise ValueError("Argument %s is expected to be of type '%s'" % (name,
var_type.__name__))
ValueError: Argument compression_type is expected to be of type
'CompressionType'
```
----
2018-12-12 22:58:56 UTC - Grant Wu: It appears that it does not like the
default `None` argument
----
2018-12-12 23:02:04 UTC - Grant Wu: Oh, this bug is fixed in master.
----
2018-12-12 23:02:18 UTC - Grant Wu: It was
<https://github.com/apache/pulsar/pull/2844>.
----
2018-12-12 23:03:44 UTC - Grant Wu: :disappointed: looks like this bugfix
missed the release window by 10 days
----
2018-12-12 23:04:06 UTC - Grant Wu: @Matteo Merli thoughts?
----
2018-12-12 23:04:12 UTC - Grant Wu: Is there a RC I can download?
----
2018-12-12 23:04:36 UTC - Matteo Merli: we’re preparing a 2.2.1 release
----
2018-12-12 23:05:47 UTC - Grant Wu: So just sit tight, then? :stuck_out_tongue:
----
2018-12-12 23:06:17 UTC - Matteo Merli: that should be soon
:slightly_smiling_face:
----
2018-12-12 23:06:20 UTC - Grant Wu: okay :slightly_smiling_face:
----
2018-12-13 03:38:37 UTC - wangwei: @wangwei has joined the channel
----
2018-12-13 05:58:49 UTC - 廖师虎: @廖师虎 has joined the channel
----