Slack digest for #general - 2020-09-30

Apache Pulsar Slack Wed, 30 Sep 2020 02:11:40 -0700

2020-09-29 09:13:06 UTC - Madhavan Narayanan: Hi,
I have a related question.
Does Key_Shared subscription work with partitioned topics?
----
2020-09-29 14:30:58 UTC - Alexander Brown: Based on the previous release 
schedule, it seems like a new version comes out approximately every two months. 
It looks like around 10/20/2020 as of now.
----
2020-09-29 14:31:27 UTC - Linton: cool, I can wait
----
2020-09-29 14:55:14 UTC - Milos Matijasevic: @Milos Matijasevic has joined the 
channel
----
2020-09-29 15:32:14 UTC - yannick: is it a known bug that the official helm 
chart can't offload to s3? we keep getting errors that it tries to 
use `us-east-1` even though we set `s3ManagedLedgerOffloadRegion` AND even the 
`AWS_DEFAULT_REGION` env variable.
----
2020-09-29 15:32:47 UTC - yannick: `The authorization header is malformed; the 
region 'us-east-1' is wrong; expecting 'eu-central-1'`
----
2020-09-29 15:34:17 UTC - Axel Sirota: then I don't think you can use a schema. 
When you submit a schema to a topic, that topic will get exactly that 
"format" for every message.
----
2020-09-29 15:42:10 UTC - Addison Higham: what region is your bucket created in?
----
2020-09-29 15:52:18 UTC - Addison Higham: yes it does, but partitioning also 
plays a role. For example, if you scale up your number of partitions, 
Key_Shared doesn't guarantee ordering during the transition to the new number 
of partitions.
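To make the partition-scaling caveat concrete, here is a minimal Java sketch. It assumes keyed messages are routed as hash(key) mod numPartitions using `String.hashCode()`, which is roughly how the Java client's default JavaStringHash scheme behaves; other hashing schemes change the exact numbers but not the conclusion. `KeyPartitionSketch` and its methods are hypothetical names for illustration only.

```java
import java.util.List;

// Hypothetical sketch: approximates key-based partition routing as
// hash(key) mod numPartitions, using String.hashCode(). The real client's
// hashing scheme is configurable, so exact partition numbers may differ.
final class KeyPartitionSketch {

    // Maps a message key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts the keys whose partition changes when the partition count changes.
    static long countMovedKeys(List<String> keys, int oldPartitions, int newPartitions) {
        return keys.stream()
                .filter(k -> partitionFor(k, oldPartitions) != partitionFor(k, newPartitions))
                .count();
    }
}
```

With keys `key-0` through `key-9`, going from 4 to 5 partitions moves 8 of the 10 keys to a different partition, so messages for those keys published before and after the resize may be consumed out of order.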


Also, if you enable batching, you need to make sure you use the correct 
`BatcherBuilder`, which you can see an example of in this blog post: 
<https://medium.com/@ankushkhanna1988/apache-pulsar-key-shared-mode-sticky-consistent-hashing-a4ee7133930a>
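A rough Java sketch of the idea behind key-based batching: each batch carries a single key, so a whole batch can be delivered to the consumer that owns that key, whereas a default batch may mix keys. In the real Java client this is switched on with `ProducerBuilder.batcherBuilder(BatcherBuilder.KEY_BASED)`; the class below is a hypothetical illustration of the grouping, not the client's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of key-based batching: group pending messages so that
// every batch holds exactly one key. Not the Pulsar client's implementation.
final class KeyBasedBatchSketch {

    // Input: (key, value) pairs in publish order.
    // Output: one batch per key, values kept in their original publish order.
    static Map<String, List<String>> batchByKey(List<Map.Entry<String, String>> messages) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : messages) {
            // Open a batch the first time a key is seen, then append to it.
            batches.computeIfAbsent(m.getKey(), k -> new ArrayList<>()).add(m.getValue());
        }
        return batches;
    }
}
```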
----
2020-09-29 15:54:30 UTC - Madhavan Narayanan: Thanks for the response @Addison 
Higham
----
2020-09-29 15:55:13 UTC - Shawn: @Sijie Guo An update: this behavior (storage 
not clearing) started after we began using NACKs. We just removed all NACKs and 
the storage GC appears to be working now. It seems that when we have a handful 
of NACKs around a GC window, even after all of them are redelivered and 
eventually acked, the markDeletePosition never moves forward until the backlog 
is cleared (or the topic is unloaded). Are there any known issues with NACKs?
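For context on the symptom above: a subscription's markDeletePosition only advances over a contiguous run of acknowledged messages, and acks above the first unacked message are tracked individually. The minimal Java sketch below models that with plain long entry ids (an assumption; real positions are ledgerId:entryId pairs) to show how a single missing ack holds the position back no matter how many later messages are acked. The class name is hypothetical.

```java
import java.util.TreeSet;

// Hypothetical sketch: the mark-delete position only slides forward over a
// contiguous prefix of acknowledged ids; later acks wait in a side set.
final class MarkDeleteSketch {
    private long markDelete = -1;                                   // all ids <= this are acked
    private final TreeSet<Long> individuallyAcked = new TreeSet<>(); // acks above the first hole

    void ack(long entryId) {
        if (entryId <= markDelete) {
            return; // already covered by the mark-delete position
        }
        individuallyAcked.add(entryId);
        // Advance while the very next id has been acknowledged.
        while (individuallyAcked.contains(markDelete + 1)) {
            individuallyAcked.remove(markDelete + 1);
            markDelete++;
        }
    }

    long markDeletePosition() {
        return markDelete;
    }

    int pendingIndividualAcks() {
        return individuallyAcked.size();
    }
}
```

Acking ids 0, 1, 3 and 4 leaves the position at 1 with two acks pending; only once id 2 is acked does it jump to 4.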
----
2020-09-29 15:57:43 UTC - Milos Matijasevic: eu-central-1
----
2020-09-29 16:08:17 UTC - Addison Higham: oh I see, I misinterpreted before. I 
am not aware of an issue with the helm charts around that... What I would 
suggest is that you exec into the pod, look at the `conf/broker.conf` file 
and ensure that s3ManagedLedgerOffloadRegion is being set as expected. If 
not, we can dig in and see why that wouldn't be getting set. If it is set, it 
may be that a region is getting set somewhere else and that one is being preferred.
----
2020-09-29 16:22:07 UTC - Milos Matijasevic: in broker.conf, 
s3ManagedLedgerOffloadRegion is set to the correct value
----
2020-09-29 16:23:21 UTC - Addison Higham: hrm... if you dump out the env for 
the pod, do any of the variables have the wrong region?
----
2020-09-29 16:23:51 UTC - Milos Matijasevic: no, the env vars are correct too; 
us-east-1 is the default region, so somehow it doesn't pick up the region from 
the variable
----
2020-09-29 16:25:14 UTC - Addison Higham: what version of pulsar are you using?
----
2020-09-29 16:26:19 UTC - Milos Matijasevic: Pulsar from the latest Pulsar Helm 
chart version: 
<https://github.com/apache/pulsar-helm-chart/tree/master/charts/pulsar>
----
2020-09-29 16:28:01 UTC - Addison Higham: how are you injecting credentials?
----
2020-09-29 16:30:34 UTC - Milos Matijasevic: from EC2 metadata; the Kubernetes 
cluster is also running on AWS
----
2020-09-29 16:33:40 UTC - Addison Higham: are you setting 
`s3ManagedLedgerOffloadServiceEndpoint` ?
----
2020-09-29 16:35:19 UTC - Milos Matijasevic: we first tried without it, then we 
tried setting it; in both cases we got the same error
----
2020-09-29 16:38:38 UTC - Addison Higham: okay, just for sanity: on broker 
startup you should see the log line "Constructor offload driver", which contains 
the configured endpoint and region. Can you share that line?
----
2020-09-29 16:39:43 UTC - Milos Matijasevic: sure, I looked at it before, just a sec
----
2020-09-29 16:41:33 UTC - Addison Higham: but just to add some context: nowhere 
in Pulsar do we default to us-east-1 or the S3 endpoint. It is possible the 
underlying library does... and it should just take that value directly and use 
it.

However, there are also per-namespace offload settings now in Pulsar 2.6.x (see 
`pulsar-admin namespaces set-offload-policies`)
----
2020-09-29 16:42:34 UTC - Milos Matijasevic: `Constructor offload driver: 
aws-s3, host: https://s3-accesspoint.eu-central-1.amazonaws.com, container: 
cwire-pulsar-dev, region: eu-central-1` this is the latest setting, with the 
endpoint variable set
----
2020-09-29 16:51:03 UTC - Milos Matijasevic: aha, I see: I read here 
<https://pulsar.apache.org/docs/en/cookbooks-tiered-storage/> that us-east-1 is 
the default

OK, maybe Pulsar 2.6.x with Helm is the problem; I am trying the same 
thing locally in minikube with Pulsar 2.5.0 and it works
----
2020-09-29 16:51:04 UTC - Addison Higham: I think you are setting the wrong 
endpoint. "S3 Access Point" is a distinct S3 service that allows you to create 
your own named S3 endpoints, but it doesn't handle any real S3 commands. 
Instead, you should be setting it to `s3.eu-central-1.amazonaws.com`
----
2020-09-29 16:52:49 UTC - Milos Matijasevic: ok i will try that endpoint now
----
2020-09-29 16:53:48 UTC - Sijie Guo: @Shawn I am not aware of any specific 
issues with NACKs.

When you said “we removed all NACKs”, do you mean that you removed all NACKs 
and relied on using ack timeout?
----
2020-09-29 16:56:35 UTC - Sijie Guo: In the current Pulsar protocol, acks are 
done in a best-effort manner: acknowledgements are sent to brokers in a 
fire-and-forget manner, so an ACK can be lost on its way from the client to the 
broker. That is why we usually recommend that people enable ack timeout: it 
ensures the messages will eventually be redelivered and acked again.

Enabling ack timeout is usually a good way to resolve the backlog issues 
people have seen.
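A minimal Java sketch of the ack-timeout idea described above: each delivered message gets a deadline, and anything still unacked when the deadline passes is handed back for redelivery, so a lost ack eventually gets another chance. Time is passed in explicitly to keep the logic deterministic, and the class is hypothetical. In the Java client the corresponding settings are `ConsumerBuilder.ackTimeout(long, TimeUnit)` and, for NACKs, `ConsumerBuilder.negativeAckRedeliveryDelay(long, TimeUnit)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ack-timeout bookkeeping; the real client uses its own
// timer and tracker. Times are plain millisecond values passed in by the caller.
final class AckTimeoutSketch {
    private final long ackTimeoutMillis;
    private final Map<String, Long> deadlines = new LinkedHashMap<>(); // messageId -> deadline

    AckTimeoutSketch(long ackTimeoutMillis) {
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    // Start (or restart) the clock for a delivered message.
    void delivered(String messageId, long nowMillis) {
        deadlines.put(messageId, nowMillis + ackTimeoutMillis);
    }

    // An ack that reaches the tracker stops the clock.
    void acked(String messageId) {
        deadlines.remove(messageId);
    }

    // Returns messages whose ack never arrived in time; they are treated as
    // redelivered, so their clock restarts from nowMillis.
    List<String> expired(long nowMillis) {
        List<String> redeliver = new ArrayList<>();
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            if (e.getValue() <= nowMillis) {
                redeliver.add(e.getKey());
            }
        }
        for (String id : redeliver) {
            delivered(id, nowMillis); // updated after iterating to avoid concurrent modification
        }
        return redeliver;
    }
}
```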
----
2020-09-29 16:56:54 UTC - Addison Higham: hrm... that doc is a bit misleading, 
but I was wrong too :stuck_out_tongue: we do default to 
`s3.amazonaws.com`, but S3 is a bit weird in that it *should* redirect to the 
correct region if your bucket is in another region. S3 is sort of semi-global 
that way.

However, with the v4 signing algorithm, I think that may be why you aren't 
getting redirected and instead get the auth error (assuming you are hitting 
`s3.amazonaws.com`).

Let's hope changing the endpoint fixes it... if not, I will have to dig in 
a bit deeper and see how you can have the correct settings (as witnessed 
by the log you got) but still get the wrong endpoint.
----
2020-09-29 16:57:21 UTC - yannick: @Addison Higham thanks for your help!
just as a sanity check: we're setting `s3ManagedLedgerOffloadServiceEndpoint`; 
does that go into the `aws-s3` driver?
----
2020-09-29 16:58:18 UTC - yannick: I also assumed this was the problem; however, 
I would have expected the error to change once we started setting 
`s3ManagedLedgerOffloadServiceEndpoint`, which didn't happen
----
2020-09-29 16:59:05 UTC - yannick: additionally, the default AWS driver usually 
works, especially when you set `AWS_DEFAULT_REGION` (which we did), so 
something seems to be borked
----
2020-09-29 17:00:27 UTC - Addison Higham: yes, the main settings you can 
control are `s3ManagedLedgerOffloadRegion`, `s3ManagedLedgerOffloadBucket`, and 
`s3ManagedLedgerOffloadServiceEndpoint`

And just as a bit more context, Pulsar uses `jclouds`, which is a cloud-agnostic 
wrapper over multiple cloud vendors' blob stores, so *some* of the typical AWS 
SDK things (like `AWS_DEFAULT_REGION`) may not work quite the same way
----
2020-09-29 17:01:59 UTC - yannick: :confused:
----
2020-09-29 17:02:35 UTC - Addison Higham: and as for the error not changing, 
I think you may have chosen the wrong endpoint 
(`https://s3-accesspoint.eu-central-1.amazonaws.com` is for the 
S3 Access Points service, which perhaps caused some weirdness)
----
2020-09-29 17:02:37 UTC - yannick: so could it be that jclouds can't work 
with credentials from the AWS metadata service?
----
2020-09-29 17:03:36 UTC - yannick: `Caused by: 
org.jclouds.aws.AWSResponseException: request POST 
https://cwire-pulsar-dev.s3.amazonaws.com/`
----
2020-09-29 17:03:53 UTC - Addison Higham: it does work with the credentials 
service; it actually imports the AWS credential provider chains to generate 
credentials
----
2020-09-29 17:03:58 UTC - yannick: all while: 
`s3ManagedLedgerOffloadServiceEndpoint=https://s3.eu-central-1.amazonaws.com`
----
2020-09-29 17:04:45 UTC - Addison Higham: try getting rid of `https://`; it may 
expect just the hostname
----
2020-09-29 17:05:02 UTC - Addison Higham: (and since it is failing validation, 
it is falling back to default)
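The failure mode discussed here (an endpoint that fails validation silently falling back to the `s3.amazonaws.com` default) can be checked before deploying. Below is a hypothetical Java helper, not part of Pulsar or jclouds: it accepts the endpoint with or without a scheme, derives the regional `s3.<region>.amazonaws.com` endpoint from `s3ManagedLedgerOffloadRegion` when no endpoint is set, and throws instead of falling back. Whether the broker itself wants the scheme included is exactly what this thread is unsure about, so treat the output format as an assumption.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical pre-deploy check (not Pulsar code): resolve the S3 endpoint from
// broker.conf-style offload settings and fail loudly on anything unusable.
final class OffloadEndpointSketch {

    // Accepts "s3.eu-central-1.amazonaws.com" or "https://s3.eu-central-1.amazonaws.com"
    // and always returns a URI with an explicit scheme (https assumed if missing).
    static URI normalize(String endpoint) {
        String trimmed = endpoint.trim();
        String withScheme = trimmed.contains("://") ? trimmed : "https://" + trimmed;
        URI uri = URI.create(withScheme); // throws IllegalArgumentException on bad syntax
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("Not a usable S3 endpoint: " + endpoint);
        }
        return uri;
    }

    // Prefer an explicit endpoint; otherwise build the regional one from the region.
    static URI resolve(Properties conf) {
        String endpoint = conf.getProperty("s3ManagedLedgerOffloadServiceEndpoint", "").trim();
        if (!endpoint.isEmpty()) {
            return normalize(endpoint);
        }
        String region = conf.getProperty("s3ManagedLedgerOffloadRegion", "").trim();
        if (region.isEmpty()) {
            throw new IllegalArgumentException("Neither endpoint nor region is configured");
        }
        return normalize("s3." + region + ".amazonaws.com");
    }
}
```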
----
2020-09-29 17:05:10 UTC - yannick: aha
----
2020-09-29 17:05:24 UTC - Milos Matijasevic: aha, it failed with https again; now 
I will try without https
----
2020-09-29 17:05:31 UTC - yannick: @Milos Matijasevic can you redeploy?
----
2020-09-29 17:05:51 UTC - yannick: only with 
`s3ManagedLedgerOffloadServiceEndpoint=s3.eu-central-1.amazonaws.com`
----
2020-09-29 17:06:54 UTC - yannick: @Addison Higham are the docs on GitHub? Then I 
can do a PR once we fix this, so that others get there a bit faster.
----
2020-09-29 17:08:17 UTC - Addison Higham: that would be wonderful, docs are 
here: <https://github.com/apache/pulsar/tree/master/site2/docs> and specific 
doc to change is here: 
<https://github.com/apache/pulsar/blob/master/site2/docs/cookbooks-tiered-storage.md>
----
2020-09-29 17:08:29 UTC - Addison Higham: also, I am checking the jclouds docs 
to make sure I am not wrong about that endpoint setting
----
2020-09-29 17:08:38 UTC - Shawn: Yes, we have always had ack timeouts; however, 
we recently added NACKs for processing errors so we can mark messages for 
redelivery faster, instead of waiting for them to time out. With the NACKs in 
place, disk storage was never cleaned up, grew to 20+ GB and eventually caused 
publishes to fail; after removing the NACKs, that no longer happens.
----
2020-09-29 17:09:19 UTC - yannick: however, i see no reason why we would need 
to set an endpoint
----
2020-09-29 17:10:02 UTC - yannick: so it's either a jclouds bug or Pulsar using 
it in a weird way. I can try to drill down once we find a working version.
----
2020-09-29 17:10:12 UTC - yannick: if it's just validation, it should IMO fail 
and not silently fall back
----
2020-09-29 17:10:52 UTC - yannick: from the jclouds docs:
```public static final String GOV_CLOUD_ENDPOINT = 
"https://ec2.us-gov-west-1.amazonaws.com";
overrides.setProperty("aws-ec2.endpoint", GOV_CLOUD_ENDPOINT);```
----
2020-09-29 17:11:16 UTC - yannick: this is for EC2, however, but I'd expect 
https:// to be alright
----
2020-09-29 17:13:23 UTC - Addison Higham: I believe you are correct about 
`https`, and you are correct that you shouldn't need to set an endpoint. What 
confuses me is that we have this working for other customers who also run in 
different regions
----
2020-09-29 17:14:14 UTC - Shawn: Also, we never removed the ack timeouts. What 
confuses me is that messages never seem to be redelivered/acked if the 
original NACK/ACK cycle was lost, so the storage never gets cleared.
----
2020-09-29 17:14:26 UTC - yannick: another thing that @Milos Matijasevic 
mentioned is that pods or processes are not restarted when config changes 
are deployed via helm
----
2020-09-29 17:14:31 UTC - Addison Higham: hrm, perhaps this is it: does the 
IAM policy you are running as allow `GetBucketLocation`?
----
2020-09-29 17:14:51 UTC - yannick: good call, let me check
----
2020-09-29 17:14:56 UTC - yannick: but iirc yes
----
2020-09-29 17:15:16 UTC - Addison Higham: :thinking_face: they should restart, 
but it is a stateful set instead of a deployment, so the restarts take longer
----
2020-09-29 17:15:31 UTC - yannick: are any of those customers in Frankfurt or 
any of the other regions that FORCE SigV4?
----
2020-09-29 17:16:33 UTC - Julio Monroy: @Julio Monroy has joined the channel
----
2020-09-29 17:16:53 UTC - yannick: ```dev-pulsar   dev-pulsar   11   2020-09-29 17:08:20.567845623 +0000 UTC   deployed   pulsar-2.6.1-2   2.6.1```
----
2020-09-29 17:17:26 UTC - yannick: `GetBucketLocation` missing :open_mouth:
----
2020-09-29 17:18:02 UTC - yannick: added. let's see
----
2020-09-29 17:18:20 UTC - Addison Higham: I am actually not sure about sigv4, 
if adding `GetBucketLocation` and removing the endpoint doesn't fix it, then it 
is likely a sigv4 issue
----
2020-09-29 17:18:55 UTC - yannick: `GetBucketLocation` would be tough to 
swallow, and strange, since our other apps work fine (all Go, though)
----
2020-09-29 17:21:44 UTC - yannick: ```root@dev-pulsar-broker-1:/pulsar/bin# 
./pulsar-admin topics offload-status -w public/default/test-topic-1
Reason: java.util.concurrent.CompletionException: 
org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: 
Could not complete the operation. Number of retries has been exhausted. Failed 
reason: Connection refused: localhost/127.0.0.1:8080```
----
2020-09-29 17:21:48 UTC - yannick: it gets weirder and weirder
----
2020-09-29 17:22:49 UTC - Addison Higham: hrm :confused: do you see anything in 
broker logs?
----
2020-09-29 17:23:39 UTC - Addison Higham: I need to jump on a meeting, 
but if you want to file this as a GitHub issue, I can also get someone 
investigating whether this is an issue with v4. I see that jclouds does support 
v4-only regions, but it is possible we need to change something
----
2020-09-29 17:25:47 UTC - Milos Matijasevic: thank you for your help!
actually, there was a memory issue when the endpoint didn't contain https://
we will test the latest settings now
----
2020-09-29 17:25:57 UTC - yannick: I'll dig into it. when Milos removed the `https` 
prefix, the pods got OOM-killed due to heap issues. I love Java
----
2020-09-29 17:26:08 UTC - yannick: thanks a lot for your support addison
----
2020-09-29 17:26:33 UTC - Addison Higham: apologies for the issues! and the 
https thing is very strange...
----
2020-09-29 17:28:09 UTC - yannick: ohai, it picked up the endpoint finally !
----
2020-09-29 17:28:11 UTC - yannick: ```dev-pulsar-broker-1 dev-pulsar-broker 
Caused by: org.jclouds.aws.AWSResponseException: request POST 
https://pulsar-dev.s3-eu-central-1.amazonaws.com/4ea1d3ce-aa87-423e-b8a9-82ebf29cab07-ledger-212?uploads
 HTTP/1.1 failed with code 403, error: AWSError{requestId='61A203E2E736DB14', 
requestToken='+=', code='', message='Access Denied', 
context='{HostId=+c6ncUY72V1aa2RUzkS51U8HonKEqBBI=}'}```
----
2020-09-29 17:31:19 UTC - Addison Higham: ah, I wonder if some of this might be 
the way StatefulSets roll out, with you picking up old ConfigMaps
----
2020-09-29 17:31:45 UTC - yannick: ok we made it work
----
2020-09-29 17:32:54 UTC - yannick: partially my bad, as I operated under the 
assumption that the pod had access to the bucket (which it didn't), mixed with 
misleading error logs
----
2020-09-29 17:41:17 UTC - Evan Furman: Wanted to follow up here. To 
simplify things, I was planning to disable user management; however, it looks 
like the default credentials don’t have superuser permissions that allow the 
user to add a cluster. Is this intended?
----
2020-09-29 17:41:32 UTC - Evan Furman: @Addison Higham
----
2020-09-29 17:51:12 UTC - Addison Higham: I am not sure on this one, @Sijie Guo 
may have a better idea...
+1 : Evan Furman
----
2020-09-29 18:09:15 UTC - Evan Furman: 
----
2020-09-29 19:06:24 UTC - Lekan Adigun: @Lekan Adigun has joined the channel
----
2020-09-29 21:35:13 UTC - vikash: Hi @Sijie Guo, I am facing issues 
creating an Apache Pulsar function using a REST API POST
----
2020-09-29 21:35:28 UTC - vikash: ```HTTP ERROR 415 Unsupported Media Type
URI:     /admin/v2/functions/9e7b08b5-3bfd-4026-b322-c4a34fffafd7/PulsarFunction/ProrationFactorCalculation
STATUS:  415
MESSAGE: Unsupported Media Type
SERVLET: org.glassfish.jersey.servlet.ServletContainer-53202b06
Powered by Jetty:// 9.4.29.v20200521```
----
2020-09-29 21:36:34 UTC - vikash: I am passing Content-Type as 
application/json
----
2020-09-29 21:39:27 UTC - Sijie Guo: > however, it looks like the default 
credentials don’t have superuser permissions that allow the user to add a 
cluster. Is this intended?
Are you using token authentication in your Pulsar cluster?

Did you configure the super-user token on the pulsar-manager side?
----
2020-09-29 21:41:58 UTC - Evan Furman: the only change I made from the default 
`application.properties` was to set `user.management.enable` to `false`
----
2020-09-29 21:42:21 UTC - Sijie Guo: Interesting. What is the ack timeout? Which 
client language are you using, and which version of that client?
----
2020-09-29 21:43:18 UTC - Sijie Guo: Is your Pulsar cluster protected with 
authentication/authorization? If so, what authentication mechanism do you use?
----
2020-09-29 21:43:37 UTC - Evan Furman: Nope, we have not enabled it
----
2020-09-29 21:44:40 UTC - Evan Furman: We may decide to do so down the line, but 
right now we are trying to keep things simple as we learn the system.
----
2020-09-29 21:45:31 UTC - Evan Furman: In pulsar manager v1 the default 
credentials (pulsar/pulsar) were sufficient to perform all operations
----
2020-09-29 21:51:21 UTC - Sijie Guo: okay, makes sense.

So you are able to log in using pulsar/pulsar, but you are not able to create an 
environment. Is that correct?
----
2020-09-29 21:52:14 UTC - Evan Furman: Yep, exactly. I tried setting the 
default environment too, but pulsar/pulsar doesn’t have access to see it
----
2020-09-29 21:53:03 UTC - Evan Furman: I was just surprised to see that the 
defaults in v2 behave differently than in v1
----
2020-09-29 21:56:24 UTC - Sijie Guo: It shouldn’t. v2 only introduced user 
authentication. Let me loop @tuteng in to help you on this.
----
2020-09-29 21:56:59 UTC - Evan Furman: ok perfect, appreciate the help.
----
2020-09-30 00:52:18 UTC - Yunze Xu: @vikash There's something wrong with the REST 
API docs: creating functions requires multipart/form-data, and maybe you used 
application/json? You can also use pulsar-admin to create/update a function
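Since creating a function over REST needs multipart/form-data rather than application/json, here is a minimal Java sketch that builds such a body. Assumption: the form fields are `functionConfig` (JSON) and `url` (package location), which are believed to match what the v3 admin endpoint and `pulsar-admin` use; verify the field names and the endpoint version (the error above hits `/admin/v2/...`) against your broker before relying on this. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: multipart/form-data body for a "create function" call.
// Field names "functionConfig" and "url" are assumptions to verify per version.
final class FunctionMultipartSketch {

    // Value for the Content-Type request header; the boundary must match the body.
    static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    static byte[] body(String boundary, String functionConfigJson, String packageUrl) {
        String crlf = "\r\n";
        StringBuilder sb = new StringBuilder();
        // Part 1: the function config, sent as JSON inside the multipart envelope.
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"functionConfig\"").append(crlf)
          .append("Content-Type: application/json").append(crlf).append(crlf)
          .append(functionConfigJson).append(crlf);
        // Part 2: where the broker should fetch the function package from.
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"url\"").append(crlf)
          .append("Content-Type: text/plain").append(crlf).append(crlf)
          .append(packageUrl).append(crlf);
        // Closing delimiter ends the multipart body.
        sb.append("--").append(boundary).append("--").append(crlf);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The body could then be sent with `java.net.http.HttpClient`, e.g. `HttpRequest.newBuilder(uri).header("Content-Type", FunctionMultipartSketch.contentType(boundary)).POST(HttpRequest.BodyPublishers.ofByteArray(body)).build()`.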
----
2020-09-30 02:13:45 UTC - dhineshkumar murugesan: @dhineshkumar murugesan has 
joined the channel
----
2020-09-30 03:00:36 UTC - tuteng: we have integrated BookKeeper visual manager 
in the new version 0.2.0 
<https://github.com/apache/pulsar-manager#enable-bookkeeper-visual-manageroptional>.
 You can try to use it.
----
2020-09-30 03:04:27 UTC - tuteng: Regarding the user login issue, which image 
version do you use? I recently released apachepulsar/pulsar-manager:0.2.0, and 
apachepulsar/pulsar-manager will use version 0.2.0.
----
2020-09-30 04:09:33 UTC - Luke Stephenson: I had exactly the same experience: 
if you use any region other than the default, any misconfiguration of 
credentials is incorrectly reported as a region issue 
<https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1591968316415200?thread_ts=1591924819.402800&cid=C5Z4T36F7>
----
2020-09-30 06:06:57 UTC - charles: Hi all,
On the Pulsar client feature matrix 
(<https://github.com/apache/pulsar/wiki/Client-Features-Matrix>), I noticed that 
the "*Effectively-Once*" feature is not available for the *WebSocket* client. 
The matrix seems to be updated only up to Pulsar *2.5.0*.
Is there any chance this will be supported in the future? Or does anyone have 
experience making this work in a different way?
----
2020-09-30 06:33:10 UTC - Walter: @Sijie Guo But when I check the ZooKeeper 
health by executing "echo "ruok" | nc localhost 2181 ; echo", it returns 
"imok"
----
2020-09-30 06:34:10 UTC - Walter: Due to this error, the brokers are failing the 
health check
----
