2020-05-28 12:03:11 UTC - Sébastien de Melo: Thanks, we can because we are not using functions; we'll try that.
----
2020-05-28 12:19:57 UTC - Laurent Chriqui: How about a procedure to recover a failed bookie? In case something else happens...
----
2020-05-28 12:22:20 UTC - Sébastien de Melo: @Sijie Guo
+1 : Laurent Chriqui
----
2020-05-28 17:24:08 UTC - Sijie Guo: @Laurent Chriqui - What does “a failed
bookie” mean here?
If it is a bookie that is gone, bookie autorecovery will automatically
re-replicate data to ensure ledgers meet the persistence requirements. You can
also run `bin/bookkeeper shell recover` or `bin/bookkeeper shell decommission`
to do so.
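For example, something like this (substitute your failed bookie's advertised address, and double-check the sub-command options for your BookKeeper version):
```# re-replicate the ledger fragments that lived on the bookie that is gone
bin/bookkeeper shell recover <failed-bookie-host>:3181```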
----
2020-05-28 18:15:41 UTC - Alexander Ursu: Hi, I'm trying out the websocket interface, and noticed that the example code in the official docs is Python 2 only. Is there a Python 3 example anywhere? I tried to convert it over but am having issues with the encoding of bytes in JSON objects, which I think has changed since Python 2.
----
2020-05-28 18:33:20 UTC - Curtis Cook: I think you just need to convert to a string (if you actually have a dict) and encode it in UTF-8?
----
2020-05-28 18:33:36 UTC - Curtis Cook: the non-websocket code does this behind
the scenes
----
2020-05-28 18:39:55 UTC - Alexander Ursu: I tried encoding the strings to
utf-8, before they go through base64 encoding, but Python complains that a
value in a JSON object cannot be of type bytes
----
2020-05-28 18:57:29 UTC - Sijie Guo: There are a couple of workarounds:
1. In the bookkeeper k8s service definition, you can set `publishNotReadyAddresses: true`. Since bookkeeper is a statefulset service, clients will always try to read the data from the pods that store it, so it is okay to publish not-ready addresses. This should solve the NPE issue you saw (see the sketch after this list).
2. In a related issue, brokers can take ~15 minutes to detect that a TCP connection is broken after bookkeeper pods restart, because a) bookkeeper uses the hostname (the pod name in a k8s context) to maintain its connection pool, and b) the default TCP keepalive settings are high, so it usually takes more than 15 minutes to detect a dead connection. In this case, try tuning the TCP keepalive settings in your broker pods.
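For point 1, a minimal sketch of such a headless service (the name, label, and port here are illustrative and need to match your actual bookkeeper statefulset):
```apiVersion: v1
kind: Service
metadata:
  name: bookkeeper
spec:
  clusterIP: None                 # headless service fronting the bookkeeper statefulset
  publishNotReadyAddresses: true  # publish pod addresses even before they pass readiness
  selector:
    app: bookkeeper
  ports:
    - name: bookie
      port: 3181```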
----
2020-05-28 18:58:42 UTC - Sijie Guo: Do you use sendAsync or send to produce
messages?
----
2020-05-28 19:25:10 UTC - Sébastien de Melo: Hi @Sijie Guo, Laurent meant in case of this kind of problem. We have an autorecovery pod always running and we also tried the recover subcommand, without success: it completed correctly but the problem was still present. We did not try the decommission one, though.
Is it a bug or have we done something wrong?
In addition, is it possible to launch an empty bookie and force a replication?
If so, how? Is storage expansion needed for that to work?
----
2020-05-28 20:19:13 UTC - Adelina Brask: So, I can't seem to move on with pulsar sources and sinks. I have a 3-node cluster with powerful machines, and I am using Logstash to send JSON to a Netty source and an Elastic sink into Elasticsearch. I keep getting `"exceptionString" : "Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [the_real_deal] of type [text] in document with id 'R-rqXHIBp3FZZNCr9L8R'. Preview of field's value: '']",` for all fields in random order, but none of the fields are ever empty. I have checked by sending the same data from Logstash to Kafka to Elasticsearch. I have also checked by packing the whole event into one string field "the_real_deal". I don't understand why Pulsar is not parsing the text. The text as found in the topic is:
`{"the_real_deal":"{\"@version\":\"1\",\"host\":\"10.220.37.11\",\"timestamp\":\"2020-05-28T19:44:33,182Z\",\"node.id\":\"IA4phyHvRTaTh_dNrdX1QA\",\"message\":\"org.elasticsearch.xpack.security.audit
action=\\\"indices:data/write/index:op_type/index\\\"
event.action=\\\"access_granted\\\" event.type=\\\"transport\\\"
indices=\\\"[\\\"default_sheplog_systems_elasticsearch-weekly.2020.w22\\\"]\\\"
node.id=\\\"IA4phyHvRTaTh_dNrdX1QA\\\"
origin.address=\\\"10.220.37.51:52538\\\" origin.type=\\\"rest\\\"
request.id=\\\"ULF15RZhSwOD11Ie_GaIuA\\\" request.name=\\\"BulkItemRequest\\\"
user.name=\\\"logstash_internal\\\" user.realm=\\\"native1\\\"
user.roles=\\\"[\\\"logstash_internal\\\"]\\\"\",\"original_size\":949,\"type\":\"server\",\"level\":\"INFO\",\"cluster.uuid\":\"664ugVCVRzmPzVeJ3kcPLg\",\"cluster.name\":\"Elasticsearch\",\"node.name\":\"clstelastic01.dmz23.local\",\"date\":1.590695073201703E9,\"tags\":[\"default_sheplog_systems_elasticsearch\"],\"component\":\"o.e.x.s.a.l.LoggingAuditTrail\",\"uuid\":\"4b6a6e01-d27f-4572-b30c-c8c59e9813f8\",\"@timestamp\":\"2020-05-28T19:44:35.578Z\"}","tags":["default_sheplog_systems_elasticsearch"]}`
----
2020-05-28 21:01:49 UTC - Alexander Ursu: Hi, I'm using the Pulsar SQL plugin
for Presto, but I'm noticing that sometimes queries don't receive the latest
messages from a topic.
The topic in question has a very slow rate of ingestion, and we intend to keep
infinite retention especially with tiered storage.
I notice that when the topic is queried for all messages, the latest don't appear, but they are actually available for consumption by other means.
Once newer messages come in, the ones from before finally become available. Not sure if this is an issue with the plugin or Pulsar in general. This is on version 2.5.1.
----
2020-05-28 21:06:49 UTC - Alexander Ursu: Could this in any way be related to `pulsar.max-entry-read-batch-size` in the `pulsar.properties` config file for the catalog?
----
2020-05-29 00:15:16 UTC - Curtis Cook: so the payload is a base64 encoded string
----
2020-05-29 00:15:28 UTC - Curtis Cook: assuming you have some dict foo
----
2020-05-29 00:18:36 UTC - Curtis Cook: ```foo = {'key': 'value'}
msg = {
"payload": base64.b64encode( json.dumps(foo).encode('utf-8') )
}```
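If you want a fuller Python 3 sketch of the websocket producer (this assumes the `websocket-client` package and a broker at `localhost:8080`; adjust the topic path and response handling to your setup):
```import base64, json
import websocket  # pip install websocket-client

# v2 producer endpoint for persistent://public/default/my-topic (adjust to your topic)
ws = websocket.create_connection(
    'ws://localhost:8080/ws/v2/producer/persistent/public/default/my-topic')

payload = {'key': 'value'}
ws.send(json.dumps({
    # base64-encode the payload bytes, then decode so the JSON value is a str, not bytes
    'payload': base64.b64encode(json.dumps(payload).encode('utf-8')).decode('utf-8'),
    'properties': {'source': 'example'},
}))

response = json.loads(ws.recv())
if response.get('result') != 'ok':
    print('send failed:', response)
ws.close()```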
----
2020-05-29 06:17:32 UTC - Lawal Azeez: @Lawal Azeez has joined the channel
----
2020-05-29 06:42:12 UTC - Lawal Azeez: Hello house.
I currently use the Node.js client in my project, but to my surprise, any time I have more than 3 subscriptions the Pulsar producer stops working. The code snippet below never executes again for any producer.
```const producer = await this.client.createProducer({
topic: 'my-topic'
});```
It doesn't matter whether I subscribe to the same topic or not. Everything works fine until the subscriptions exceed 3. I tried the same process with the C# client and everything seems fine, so I am confused about what the issue might be.
#please help
----
2020-05-29 07:11:53 UTC - Sijie Guo: I don't think the Pulsar connector will parse the text. I think it just passes the text received by the netty source through to the elastic sink. Have you tried inspecting the data in the output topic of the netty source?
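For example, with the CLI (the topic and subscription names here are placeholders; point it at whatever output topic your netty source is configured to write to):
```bin/pulsar-client consume -s inspect-sub -n 10 persistent://public/default/<netty-source-output-topic>```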
----
2020-05-29 07:13:30 UTC - Sijie Guo: @Alexander Ursu this is related to Pulsar SQL bypassing brokers and reading the data from bookkeeper directly. The LastAddConfirmed for an active ledger is only advanced by the next entry or the next explicit LAC commit. I think there is an open issue for this; we need to enable a flag in the Pulsar broker, plus fixes in the Presto connector.
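If I remember right, the broker-side flag is the explicit LAC interval in `broker.conf` (the exact name may differ across versions, so please verify against your broker config), something like:
```# Write an explicit LAC periodically so readers that bypass the broker
# (e.g. Pulsar SQL) can see the latest entries of a still-open ledger.
# 0 disables it; a positive value is the interval in milliseconds.
bookkeeperExplicitLacIntervalInMills=1000```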
----
2020-05-29 07:13:30 UTC - Adelina Brask: Yes. And the data seems complete. The field is never empty. I can't understand why.
----
2020-05-29 07:14:04 UTC - Sijie Guo: Can you please show your code snippet?
----
2020-05-29 07:14:33 UTC - Sijie Guo: Do you have the original data without packing the event into one string?
----
2020-05-29 07:27:54 UTC - Lawal Azeez: @Sijie Guo thanks for trying to help
----
2020-05-29 08:26:50 UTC - Goose: @Goose has joined the channel
----
2020-05-29 08:28:33 UTC - Adelina Brask:
`{"date":1590737041.195320,"type":"server","timestamp":"2020-05-29T07:24:01,003Z","level":"INFO","component":"o.e.x.s.a.l.LoggingAuditTrail","cluster.name":"Elasticsearch","node.name":"clstelastic01.dmz23.local","message":"org.elasticsearch.xpack.security.audit
action=\"indices:data/write/index:op_type/index\"
event.action=\"access_granted\" event.type=\"transport\"
indices=\"[\"default_sheplog_systems_elasticsearch-weekly.2020.w22\"]\"
node.id=\"IA4phyHvRTaTh_dNrdX1QA\" origin.address=\"10.220.37.51:44588\"
origin.type=\"rest\" request.id=\"pE_t9rboQv2iw_KhnFRcrg\"
request.name=\"BulkItemRequest\" user.name=\"logstash_internal\"
user.realm=\"native1\"
user.roles=\"[\"logstash_internal\"]\"","cluster.uuid":"664ugVCVRzmPzVeJ3kcPLg","node.id":"IA4phyHvRTaTh_dNrdX1QA"}`
----
2020-05-29 08:31:38 UTC - Adelina Brask: the error then points at a random parameter (tag, @timestamp, node.id): `Preview of field's value: ''`. Those parameters are never empty, as they are stamped in our Logstash for each event.
----
2020-05-29 09:08:31 UTC - lujop: @lujop has joined the channel
----