2019-09-25 12:33:10 UTC - Greg Hoover: @Greg Hoover has joined the channel
----
2019-09-25 12:41:27 UTC - Bo Han: @Bo Han has joined the channel
----
2019-09-25 12:50:12 UTC - Bo Han: @Sijie Guo When running pulsar binary in
standalone mode with --bookkeeper-dir changed to another folder, I still see
pulsar creating data folder inside `/data/bookkeeper`. This folder is used by
`data.bookkeeper.ranges`, which can be configured in bookkeeper package
downloaded from apache bookkeeper but not pulsar. Is there a way to fix it?
----
2019-09-25 13:42:13 UTC - Rajiv Abraham: Hi, I’m running debezium on pulsar to
read from a mysql database instance as specified in the excellent tutorial on
debezium. In my Python consumer, on running `msg.value()` or `msg.data()`, I
get a byte string which I can’t decode as UTF-8 or ascii. Can anyone help me
figuring out how to decode it?
```
b' \x00\x00\x00\xa8
{"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"}],"optional":false,"name":"dbserver1.inventory.customers.Key"},"payload":{"id":1001}}
\x00\x00\x08\x00
{"schema":{"type":"struct","fields":[{"type":"struct","fields" >>>
ELIDED FOR CLARITY
>>>>{"version":"0.9.2.Final","connector":"mysql","name":"dbserver1","server_id":0,"ts_sec":0,"gtid":null,"file":"mysql-bin.000003","pos":868,"row":0,"snapshot":true,"thread":null,"db":"inventory","table":"customers","query":null},"op":"c","ts_ms":1569418671780}}'
```
----
2019-09-25 15:15:51 UTC - tuteng: You can try deserializing with the following
code: import pulsar
import json
import struct
client = pulsar.Client('<pulsar://127.0.0.1:6650>')
consumer = client.subscribe('dbserver1.inventory.products',
subscription_name='sub')
while True:
msg = consumer.receive()
print("Received message: '%s'" % msg.data())
data = msg.data()
b = struct.unpack(">l", data[0:4])
print("Key length: " + str(b[0]))
print("Key data: " + data[4: 4+b[0]])
print("Json loads +++++++++++++")
print(json.loads(data[4: 4 + b[0]]))
c = struct.unpack(">l", data[4 + b[0]: 4 + b[0] + 4])
print("Value length: " + str(c[0]))
value = data[4 + b[0] + 4: 4 + b[0] + 4 + c[0]]
print("Value data: " + value)
print("Json loads +++++++++++++")
print(json.loads(value))
client.close()
----
2019-09-25 15:29:46 UTC - Rajiv Abraham: @tuteng Thank you kindly. Is this
something I have to do for every CDC source? I’m wondering if it’ll be 4 bytes
for every debezium data source? Do you know what that initial string is?
----
2019-09-25 15:34:07 UTC - Christophe Bornet: Hi, it seems Kafka will soon ditch
Zookeeper in favor of a self-managed quorum
(<https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum>).
Is there any plan to do the same in Pulsar ? I understand it would probably be
more work since it would need to be done in both BookKeeper and Pulsar but I
think it would be very nice to have one less cluster to install and manage.
----
2019-09-25 15:47:39 UTC - Matteo Merli: Yes, we do have a plan for that, in
multiple phases. Some work on that front was already started and then paused
some time back for lack of time bandwidth, though we hope to get that effort
back moving soon.
> Kafka will soon ditch Zookeeper
My understanding is that it will take a while to get there
+1 : David Kjerrumgaard, Luke Lu, Christophe Bornet
heart_eyes : Poule, Christophe Bornet
----
2019-09-25 16:40:50 UTC - Ben Hood: Hi, Are there any examples of how to
configure log4cxx with the C API? I’d like to configure the Pulsar client not
to log so much to stdout, but it doesn’t appear to pick up a
`log4cxx.properties` or `log4cxx.conf` from the `cwd`
----
2019-09-25 16:43:38 UTC - Matteo Merli: By default, the log4cxx is disabled at
compile time. Did you compile by passing `-DUSE_LOG4CXX=1` to cmake?
----
2019-09-25 16:46:29 UTC - Ben Hood: No, I used the
`apache-pulsar-client-devel.x86_64` package for Centos
----
2019-09-25 16:46:51 UTC - Ben Hood: Should I try to compile the client myself
instead?
----
2019-09-25 16:48:00 UTC - Matteo Merli: Yes, we have disabled Log4Cxx by
default since it brings in a lot of dependencies
----
2019-09-25 16:49:48 UTC - Matteo Merli: You can use the Docker image
`apachepulsar/pulsar-build:centos-7` which comes with all the dependencies
----
2019-09-25 16:50:07 UTC - Matteo Merli: It’s the same image that’s used build
the Rpms
----
2019-09-25 16:57:06 UTC - Ben Hood: OK, I’ll try that
----
2019-09-25 16:57:42 UTC - Addison Higham: @Rajiv Abraham this is how the
debezium source (currently) encodes the key and the value, that code lives
here:
<https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/common/schema/KeyValue.java#L97>
----
2019-09-25 16:57:45 UTC - Addison Higham: if you want the details
----
2019-09-25 16:58:18 UTC - Ben Hood: BTW is there any other way to trim what the
client logs to stdout other than linking against Log4Cxx?
----
2019-09-25 16:58:32 UTC - Addison Higham: alternate encodings are possible with
pulsar and of that KeyValue class, but currently there isn't a way to change
that out of the box
----
2019-09-25 17:02:25 UTC - Matteo Merli: I was checking since I didn’t
remember.. :slightly_smiling_face:
----
2019-09-25 17:02:26 UTC - Rajiv Abraham: @Addison Higham Thank you so much for
digging that source code. I see that they add four bytes, so it’s debezium
specific and I can use @tuteng code above for debezium. I’m just curious why
they add the bytes. I guess I’ll have to ask them. Thanks again
----
2019-09-25 17:02:55 UTC - Matteo Merli: the simple logger is hardcoded to INFO
level
----
2019-09-25 17:03:11 UTC - Matteo Merli: though it’s possible to pass in a
callback and do your own logging
----
2019-09-25 17:03:46 UTC - Matteo Merli: See `ClientConfiguration::setLogger()`
----
2019-09-25 17:05:03 UTC - Matteo Merli: …or
`pulsar_client_configuration_set_logger()` if you use C
----
2019-09-25 18:07:59 UTC - Addison Higham: the bytes are the length of the key
and the value
----
2019-09-25 18:08:49 UTC - Rajiv Abraham: oh ok, thanks! appreciate it.
----
2019-09-25 18:19:41 UTC - Matteo Merli: @Jesse Zhang (Bose) As an update, I was
able to reproduce locally and get (in some way) the topic in same state.
Working on a fix
----
2019-09-25 19:02:37 UTC - Jesse Zhang (Bose): great!
----
2019-09-25 19:27:43 UTC - Jesse Zhang (Bose): I can also reproduce this using
python client
Producer P produces messsages (60-69) slowly every 2s
consumer(I use python this time) C consume them, but not ack, with ack timeout
setting to 10s
consumer code:
```
consumer = client.subscribe(consumer_name= "jesse-consumer", topic=[
'<persistent://xxxx/xxxx-core/notify.account-removed>'
], subscription_name= 'xxx-xxxx-local', consumer_type=
ConsumerType.Shared, unacked_messages_timeout_ms=10000)
while True:
msg = consumer.receive()
print("Received message id='{}'".format(msg.message_id()))
#consumer.acknowledge(msg)
client.close()
```
this is the log in client:
Received message id=‘(10,60,-1,-1)’
Received message id=‘(10,61,-1,-1)’
Received message id=‘(10,62,-1,-1)’
Received message id=‘(10,63,-1,-1)’
Received message id=‘(10,64,-1,-1)’
Received message id=‘(10,65,-1,-1)’
Received message id=‘(10,66,-1,-1)’
Received message id=‘(10,67,-1,-1)’
Received message id=‘(10,68,-1,-1)’
2019-09-25 15:10:47.037 INFO UnAckedMessageTrackerEnabled:48 | [Muti Topics
Consumer: TopicName -
<persistent://xxxx/xxxx-core/notify.account-removed-TopicsConsumerFakeName-c19975>
- Subscription - svc-xxxx-xxx-sub-local]: 4 Messages were not acked within
10000 time
`//Jesse's comment: from /stats, "unackedMessages": 10`
Received message id=‘(10,60,-1,-1)’
Received message id=‘(10,61,-1,-1)’
Received message id=‘(10,62,-1,-1)’
Received message id=‘(10,63,-1,-1)’
Received message id=‘(10,64,-1,-1)’
Received message id=‘(10,65,-1,-1)’
Received message id=‘(10,66,-1,-1)’
Received message id=‘(10,67,-1,-1)’
Received message id=‘(10,68,-1,-1)’
Received message id=‘(10,69,-1,-1)’
2019-09-25 15:11:07.044 INFO UnAckedMessageTrackerEnabled:48 | [Muti Topics
Consumer: TopicName -
<persistent://xxxxx/xxx-core/notify.account-removed-TopicsConsumerFakeName-c19975>
- Subscription - svc-xxx-xxx-sub-local]: 10 Messages were not acked within
10000 time
`//Jesse's comment: from /stats, "unackedMessages": 1`
Received message id=‘(10,69,-1,-1)’
2019-09-25 15:11:27.052 INFO UnAckedMessageTrackerEnabled:48 | [Muti Topics
Consumer: TopicName -
<persistent://xxxx/xxx-core/notify.account-removed-TopicsConsumerFakeName-c19975>
- Subscription - svc-xxx-xxx-sub-local]: 1 Messages were not acked within
10000 time
`//Jesse's comment: from /stats, "unackedMessages": 0`
```
----
2019-09-25 19:29:21 UTC - Matteo Merli: Yes, I’m using Python client as well
----
2019-09-25 19:30:00 UTC - Matteo Merli: though this only happens under certain
conditions, still narrowing down exactly which ones
----
2019-09-25 19:40:28 UTC - Jesse Zhang (Bose): please guide us on any
workaround. I can upgrade to new client ( we use go client) , but I am not able
to patch our server which is currently on 2.3.1.
----
2019-09-25 20:46:09 UTC - Luke Lu: BTW, we’re getting internal 500 errors doing
`pulsar-admin topics unload` on _some_ topics. Haven’t you guys seen this
before?
----
2019-09-25 21:01:26 UTC - Ben Hood: ah, OK, I hadn’t seen that callback API
----
2019-09-25 21:02:31 UTC - Ben Hood: So that would be an alternative to pulling
in Log4Cxx with all of its dependencies
----
2019-09-25 21:25:50 UTC - Matteo Merli: Yes, you can print and filter the
messages as you like from the custom logger
----
2019-09-25 21:27:49 UTC - Matteo Merli: @Jesse Zhang (Bose) Posted a fix:
<https://github.com/apache/pulsar/pull/5276>
This will get released with 2.4.2, which we expect soon.
The issue itself is on server side, so a client upgrade won’t fix that. As a
general suggestion, you could try to use `negativeAcks` when the processing of
a message fails, instead of relying on the ack timeout, that might work around
the issue as well.
----
2019-09-25 21:37:30 UTC - Matteo Merli: @Luke Lu a 500 error should have a
corresponding exception with stack trace in the broker. Can you get that and
open an issue?
----
2019-09-25 22:53:17 UTC - tuteng: KeyValue schema appears largely because it is
compatible with kafka, and debezium is deeply dependent on kafka. Currently,
python clients do not support KeyValue schema. If KeyValue schema is supported
in the future, it will be more convenient to do this.
----
2019-09-25 22:55:58 UTC - Ben Hood: Turns out that it is quite simple to
control the logging with your own function pointer:
----
2019-09-25 22:56:18 UTC - Ben Hood: ```void log_callback(pulsar_logger_level_t
level, const char *file, int line, const char *message, void *ctx)
{
printf("Log callback %d %s:%d %s\n", (int)level, file, line, message);
}```
----
2019-09-25 22:58:55 UTC - Ben Hood: And then register this with the client
object:
----
2019-09-25 22:59:01 UTC - Ben Hood:
```pulsar_client_configuration_set_logger(conf, &log_callback, NULL);```
----
2019-09-25 23:00:39 UTC - Ben Hood: I’m assuming the `void *ctx` is there to
supply some kind of app specific kind context, if required by the callback
----
2019-09-25 23:02:24 UTC - Matteo Merli: correct, it will be passed back each
time when the callback is invoked, it’s completely app specific
----
2019-09-25 23:09:43 UTC - Ben Hood: Makes complete sense - feels like this API
design allows flexibility in either direction - either with Log4cxx or your own
ditty
----
2019-09-25 23:10:03 UTC - Ben Hood: Many thanks for pointing this out to me,
very much appreciated
----
2019-09-25 23:21:38 UTC - Matteo Merli: :+1:
----
2019-09-25 23:24:30 UTC - Jesse Zhang (Bose): @Matteo Merli thanks for the
update. From release notes, `negativeAcks` is introduced in 2.4.x, is it
actually available in 2.3.1 which we are running?
----
2019-09-25 23:45:18 UTC - Matteo Merli: You should be able to use it with a 2.4
client
----
2019-09-26 00:01:44 UTC - Karthik Ramasamy: Slides of serverless tutorial in
the context of Apache Pulsar and Pulsar functions are available at
----
2019-09-26 00:01:45 UTC - Karthik Ramasamy:
<https://www.slideshare.net/arunkejariwal/serverless-streaming-architectures-and-algorithms-for-the-enterprise-175954094>
----
2019-09-26 00:02:08 UTC - Karthik Ramasamy: This was presented at Strata Data
Conference, New York
heart_eyes : Poule, Ali Ahmed
----
2019-09-26 01:13:38 UTC - zhchxiao: @zhchxiao has joined the channel
----
2019-09-26 08:47:48 UTC - jia zhai: FYI. Slices for today’s talk in Strata NY
is here:
<https://www.slideshare.net/streamnative/how-orange-financial-combat-financial-frauds-over-50m-transactions-a-day-using-apache-pulsar-176284080>
----