2020-09-29 09:13:06 UTC - Madhavan Narayanan: Hi, I have a related question. Does Key_Shared subscription work with partitioned topics? ---- 2020-09-29 14:30:58 UTC - Alexander Brown: Based on the previous release schedule it seems like a new version comes out approximately every two months. Looking like around 10/20/2020 as of now. ---- 2020-09-29 14:31:27 UTC - Linton: cool, I can wait ---- 2020-09-29 14:55:14 UTC - Milos Matijasevic: @Milos Matijasevic has joined the channel ---- 2020-09-29 15:32:14 UTC - yannick: is it a known bug that the official helm chart can't offload to s3? we keep on getting errors that it wants to try to use `us-east-1` while we set `s3ManagedLedgerOffloadRegion` AND even the `AWS_DEFAULT_REGION` env? ---- 2020-09-29 15:32:47 UTC - yannick: `The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-central-1'` ---- 2020-09-29 15:34:17 UTC - Axel Sirota: then i don't think you can use schema, when you submit a schema to a topic it means that topic will get exactly that "format" for every message. ---- 2020-09-29 15:42:10 UTC - Addison Higham: what region is your bucket created in? ---- 2020-09-29 15:52:18 UTC - Addison Higham: yes it does, but your partition count also plays a role. For example, if you scale up your number of partitions, key_shared does not guarantee ordering during the transition to the new number of partitions.
Also, if you enable batching, you need to make sure you use the correct `BatcherBuilder`, which you can see an example of in this blog post: <https://medium.com/@ankushkhanna1988/apache-pulsar-key-shared-mode-sticky-consistent-hashing-a4ee7133930a> ---- 2020-09-29 15:54:30 UTC - Madhavan Narayanan: Thanks for the response @Addison Higham ---- 2020-09-29 15:55:13 UTC - Shawn: @Sijie Guo An update: This behavior (storage not clearing) started after the use of NACKs. We just removed all NACKs and the storage GC appears to be working now. It seems that when we had a handful of nacks around a GC window, even after all are redelivered and eventually acked, the markDeletePosition will never move forward until the backlog is cleared (or the topic is unloaded). Are there any known issues with NACKs? ---- 2020-09-29 15:57:43 UTC - Milos Matijasevic: eu-central-1 ---- 2020-09-29 16:08:17 UTC - Addison Higham: oh I see, I misinterpreted before. I am not aware of an issue with the helm charts around that... What I would suggest you do is exec into the pod and look at the `conf/broker.conf` file and ensure that the `s3ManagedLedgerOffloadRegion` is being set as expected. If not, we can dig in and see why that wouldn't be getting set. If it is set, it may be that somewhere else a region is getting set and that is getting preferred ---- 2020-09-29 16:22:07 UTC - Milos Matijasevic: in broker.conf `s3ManagedLedgerOffloadRegion` is set to the correct value ---- 2020-09-29 16:23:21 UTC - Addison Higham: hrm... if you dump out your env for the pod, do any have the wrong region? ---- 2020-09-29 16:23:51 UTC - Milos Matijasevic: no, envs are correct too, us-east-1 is the default region, so somehow it doesn't pick up the region from the variable ---- 2020-09-29 16:25:14 UTC - Addison Higham: what version of pulsar are you using? 
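Editor's sketch of the `conf/broker.conf` check suggested above: a minimal Python helper, assuming the file uses plain `key=value` lines with `#` comments. The function names `parse_conf` and `check_offload_conf` are made up for illustration; only the setting names come from this thread.

```python
# Hypothetical helper names; only the s3ManagedLedgerOffload* key comes from the thread.

def parse_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_offload_conf(text, expected_region):
    """Return a list of problems; an empty list means the region matches."""
    settings = parse_conf(text)
    region = settings.get("s3ManagedLedgerOffloadRegion", "")
    problems = []
    if region != expected_region:
        problems.append(
            f"s3ManagedLedgerOffloadRegion is {region!r}, expected {expected_region!r}"
        )
    return problems
```

Inside the pod this could be run against the real file, e.g. `check_offload_conf(open("conf/broker.conf").read(), "eu-central-1")`. It only confirms what the file says, not what the offloader ends up using at runtime.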
---- 2020-09-29 16:26:19 UTC - Milos Matijasevic: pulsar from latest pulsar helm chart version, <https://github.com/apache/pulsar-helm-chart/tree/master/charts/pulsar> ---- 2020-09-29 16:28:01 UTC - Addison Higham: how are you injecting credentials? ---- 2020-09-29 16:30:34 UTC - Milos Matijasevic: from ec2 metadata, kubernetes cluster is also running on aws ---- 2020-09-29 16:33:40 UTC - Addison Higham: are you setting `s3ManagedLedgerOffloadServiceEndpoint` ? ---- 2020-09-29 16:35:19 UTC - Milos Matijasevic: we tried first without it, then we tried to set it, in both cases same error ---- 2020-09-29 16:38:38 UTC - Addison Higham: okay, just for sanity, you should see this log line: "Constructor offload driver" on broker startup that will contain the configured endpoint and region, can you share that line? ---- 2020-09-29 16:39:43 UTC - Milos Matijasevic: sure, i looked at it before, just a sec ---- 2020-09-29 16:41:33 UTC - Addison Higham: but just to add some context, nowhere in pulsar do we default to us-east-1 or the s3 endpoint, it is possible the underlying library does... and it should just take that value directly and use it. 
However, there are also per-namespace offload settings now in Pulsar 2.6.x (see `pulsar-admin namespaces set-offload-policies`) ---- 2020-09-29 16:42:34 UTC - Milos Matijasevic: `Constructor offload driver: aws-s3, host: https://s3-accesspoint.eu-central-1.amazonaws.com, container: cwire-pulsar-dev, region: eu-central-1` this is the latest setting with the endpoint variable set ---- 2020-09-29 16:51:03 UTC - Milos Matijasevic: aha i see, <https://pulsar.apache.org/docs/en/cookbooks-tiered-storage/> i read it here that us-east-1 is the default. Ok maybe the 2.6.x version of pulsar with helm is the problem, i am trying to do the same thing locally in minikube with pulsar 2.5.0 and it works ---- 2020-09-29 16:51:04 UTC - Addison Higham: I think you are setting the wrong endpoint, "S3 Access Point" is a distinct s3 service that allows you to create your own named s3 endpoints, but it doesn't handle any real s3 commands, instead, you should be setting it to `s3.eu-central-1.amazonaws.com` ---- 2020-09-29 16:52:49 UTC - Milos Matijasevic: ok i will try that endpoint now ---- 2020-09-29 16:53:48 UTC - Sijie Guo: @Shawn I am not aware of any specific issues with NACKs. When you said “we removed all NACKs”, do you mean that you removed all NACKs and relied on using ack timeout? ---- 2020-09-29 16:56:35 UTC - Sijie Guo: In the current Pulsar protocol, acks are done in a best-effort manner. Acknowledgements are sent to brokers in a fire-and-forget manner. The ACKs can be lost when the clients send acknowledgements back. So we usually recommend that people enable Ack timeout. This would ensure the messages will eventually be delivered and acked again. Enabling ACK timeout is usually a good approach to resolve the backlog issues people saw. ---- 2020-09-29 16:56:54 UTC - Addison Higham: hrm... 
that doc is a bit misleading, but I am also wrong too :stuck_out_tongue: we do default to `s3.amazonaws.com`, but s3 is a bit weird in that it *should* redirect to the correct region if your bucket is in another region. S3 is sort of semi-global that way. However, now with v4 signing algorithm I think that may be why you aren't getting redirected and instead get the auth error (assuming you are hitting `s3.amazonaws.com`). Let's hope changing the endpoint fixes it... if not, then I will have to dig in a bit deeper and see how you can be setting the correct settings (as witnessed by the log you got) but still be getting the wrong endpoint ---- 2020-09-29 16:57:21 UTC - yannick: @Addison Higham thanks for your help! just for sanity check, we're setting `s3ManagedLedgerOffloadServiceEndpoint`, does that go into the `aws-s3` driver? ---- 2020-09-29 16:58:18 UTC - yannick: i was also under the assumption that this is the problem, however i would have expected the error to change once we started to set `s3ManagedLedgerOffloadServiceEndpoint`, which didn't happen ---- 2020-09-29 16:59:05 UTC - yannick: additionally usually the default aws driver works, especially when you set `AWS_DEFAULT_REGION` (which we did). 
so something seems to be borked ---- 2020-09-29 17:00:27 UTC - Addison Higham: yes, the main settings you can control are `s3ManagedLedgerOffloadRegion`, `s3ManagedLedgerOffloadBucket`, and `s3ManagedLedgerOffloadServiceEndpoint`. And just as a bit more context, pulsar uses `jclouds`, which is a cloud-agnostic wrapper over multiple cloud vendors' blob stores, so *some* of the typical AWS SDK things (like `AWS_DEFAULT_REGION`) may not quite work the same way ---- 2020-09-29 17:01:59 UTC - yannick: :confused: ---- 2020-09-29 17:02:35 UTC - Addison Higham: and as far as the error not changing, I think you may have chosen the wrong endpoint (with `https://s3-accesspoint.eu-central-1.amazonaws.com` being for the s3-accesspoint service, which perhaps causes some weirdness) ---- 2020-09-29 17:02:37 UTC - yannick: so could it be that jclouds can't work without credentials from aws metadata? ---- 2020-09-29 17:03:36 UTC - yannick: `Caused by: org.jclouds.aws.AWSResponseException: request POST https://cwire-pulsar-dev.s3.amazonaws.com/` ---- 2020-09-29 17:03:53 UTC - Addison Higham: it does work with credentials service, it actually does import the aws credential provider chains to generate credentials ---- 2020-09-29 17:03:58 UTC - yannick: all while: `s3ManagedLedgerOffloadServiceEndpoint=https://s3.eu-central-1.amazonaws.com` ---- 2020-09-29 17:04:45 UTC - Addison Higham: try getting rid of `https://`, it may be just the hostname ---- 2020-09-29 17:05:02 UTC - Addison Higham: (and since it is failing validation, it is falling back to default) ---- 2020-09-29 17:05:10 UTC - yannick: aha ---- 2020-09-29 17:05:24 UTC - Milos Matijasevic: aha it failed with https again now will try without https ---- 2020-09-29 17:05:31 UTC - yannick: @Milos Matijasevic can you redeploy ? 
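Editor's sketch for the endpoint confusion above: a small Python check, assuming the regional S3 endpoint has the form `s3.<region>.amazonaws.com` as used in this thread. The helper names are hypothetical, and the check is purely a string-level sanity test; it says nothing about how jclouds or Pulsar validate the value.

```python
from urllib.parse import urlsplit


def regional_s3_endpoint(region):
    """Build the regional S3 endpoint URL for a region such as eu-central-1."""
    return f"https://s3.{region}.amazonaws.com"


def endpoint_warnings(endpoint, region):
    """Return warnings for an endpoint that looks inconsistent with the region."""
    # urlsplit only fills in hostname when a scheme or leading '//' is present,
    # so bare hostnames get a '//' prefix first.
    parts = urlsplit(endpoint if "//" in endpoint else "//" + endpoint)
    host = parts.hostname or ""
    warnings = []
    if "s3-accesspoint" in host:
        warnings.append("host belongs to the S3 Access Point service, not plain S3")
    if region not in host:
        warnings.append(f"host {host!r} does not mention region {region!r}")
    return warnings
```

For the values seen in this thread, `endpoint_warnings("https://s3-accesspoint.eu-central-1.amazonaws.com", "eu-central-1")` flags the access-point host, while `s3.eu-central-1.amazonaws.com` passes with or without the `https://` prefix.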
---- 2020-09-29 17:05:51 UTC - yannick: only with `s3ManagedLedgerOffloadServiceEndpoint=s3.eu-central-1.amazonaws.com` ---- 2020-09-29 17:06:54 UTC - yannick: @Addison Higham are the docs on github so i can do a PR once we fix this, so that others get lucky a bit faster ? ---- 2020-09-29 17:08:17 UTC - Addison Higham: that would be wonderful, docs are here: <https://github.com/apache/pulsar/tree/master/site2/docs> and specific doc to change is here: <https://github.com/apache/pulsar/blob/master/site2/docs/cookbooks-tiered-storage.md> ---- 2020-09-29 17:08:29 UTC - Addison Higham: also, I am checking the jclouds docs to make sure I am not incorrect on that endpoint setting ---- 2020-09-29 17:08:38 UTC - Shawn: Yes, we have always had ackTimeouts, however we recently added NACKs for processing errors so we can mark for redelivery faster, before they would eventually timeout. After removing the NACKs, the disk storage no longer fails to clean up; before, it kept growing to 20+GB and eventually caused publishes to fail. ---- 2020-09-29 17:09:19 UTC - yannick: however, i see no reason why we would need to set an endpoint ---- 2020-09-29 17:10:02 UTC - yannick: so it's either a jclouds bug or pulsar using it in a weird way. i can try to drill it down once we find a working version. ---- 2020-09-29 17:10:12 UTC - yannick: if it's just validation it should imo fail and not silently fall back ---- 2020-09-29 17:10:52 UTC - yannick: from the jclouds docs: ```public static final String GOV_CLOUD_ENDPOINT = "https://ec2.us-gov-west-1.amazonaws.com"; overrides.setProperty("aws-ec2.endpoint", GOV_CLOUD_ENDPOINT);``` ---- 2020-09-29 17:11:16 UTC - yannick: this is ec2 however, but i'd expect https:// being alright ---- 2020-09-29 17:13:23 UTC - Addison Higham: I believe you are correct about `https` and you are correct that you shouldn't need to set an endpoint. 
Why I am confused is that we have this working for other customers and what not that also run in different regions ---- 2020-09-29 17:14:14 UTC - Shawn: Also, we never removed the ackTimeouts. What is confusing to me is messages seem to never be redelivered/acked if the original NACK/ACK cycle was lost, and letting the storage be cleared. ---- 2020-09-29 17:14:26 UTC - yannick: another thing that @Milos Matijasevic mentioned is that pods or processes are not being restarted if config changes are deployed via helm ---- 2020-09-29 17:14:31 UTC - Addison Higham: hrm, perhaps this is it: are you allowing the IAM policy you are running as to do `GetBucketLocation`? ---- 2020-09-29 17:14:51 UTC - yannick: good call, let me check ---- 2020-09-29 17:14:56 UTC - yannick: but iirc yes ---- 2020-09-29 17:15:16 UTC - Addison Higham: :thinking_face: they should restart, but it is a stateful set instead of a deployment, so the restarts take longer ---- 2020-09-29 17:15:31 UTC - yannick: any of those customers in frankfurt or any of those other regions that FORCE sigv4? ---- 2020-09-29 17:16:33 UTC - Julio Monroy: @Julio Monroy has joined the channel ---- 2020-09-29 17:16:53 UTC - yannick: ```dev-pulsar dev-pulsar 11 2020-09-29 17:08:20.567845623 +0000 UTC deployed pulsar-2.6.1-2 2.6.1``` ---- 2020-09-29 17:17:26 UTC - yannick: `GetBucketLocation` missing :open_mouth: ---- 2020-09-29 17:18:02 UTC - yannick: added. 
let's see ---- 2020-09-29 17:18:20 UTC - Addison Higham: I am actually not sure about sigv4, if adding `GetBucketLocation` and removing the endpoint doesn't fix it, then it is likely a sigv4 issue ---- 2020-09-29 17:18:55 UTC - yannick: `GetBucketLocation` would be tough to swallow, and strange since our other apps work fine (all golang though) ---- 2020-09-29 17:21:44 UTC - yannick: ```root@dev-pulsar-broker-1:/pulsar/bin# ./pulsar-admin topics offload-status -w public/default/test-topic-1 Reason: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: Connection refused: localhost/127.0.0.1:8080``` ---- 2020-09-29 17:21:48 UTC - yannick: it gets weirder and weirder ---- 2020-09-29 17:22:49 UTC - Addison Higham: hrm :confused: do you see anything in broker logs? ---- 2020-09-29 17:23:39 UTC - Addison Higham: I need to actually jump on a meeting, but if you want to file this as a github issue, I can also get someone investigating if this is an issue with v4. I see that jclouds does support v4 only regions, but it is possible we might need to change something ---- 2020-09-29 17:25:47 UTC - Milos Matijasevic: thank you for your help! actually there was a memory issue when the endpoint doesn't contain https:// we will test the latest settings now ---- 2020-09-29 17:25:57 UTC - yannick: i'll dig it down. when milos removed the `https` prefix the pods got OOM killed due to heap issues. i love java ---- 2020-09-29 17:26:08 UTC - yannick: thanks a lot for your support addison ---- 2020-09-29 17:26:33 UTC - Addison Higham: apologies for the issues! and the https thing is very strange... ---- 2020-09-29 17:28:09 UTC - yannick: ohai, it picked up the endpoint finally ! 
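Editor's sketch for the `GetBucketLocation` point above: a Python snippet that builds an IAM policy document and reports which required actions it does not grant. The action names are real S3 IAM actions, but the exact set the Pulsar offloader needs is an assumption here; only `GetBucketLocation` is named in this thread. `offload_policy` and `missing_actions` are hypothetical helper names.

```python
import json


def offload_policy(bucket):
    """Build an illustrative IAM policy for an offload bucket (assumed action set)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions, including the one missing in this thread.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions; multipart uploads show up in the broker log.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


def missing_actions(policy, required):
    """Return the required actions that no Allow statement in the policy grants."""
    granted = {
        action
        for statement in policy["Statement"]
        if statement["Effect"] == "Allow"
        for action in statement["Action"]
    }
    return sorted(set(required) - granted)
```

`json.dumps(offload_policy("cwire-pulsar-dev"), indent=2)` prints the document for pasting into an IAM role; `missing_actions` can be pointed at an existing policy dict to spot gaps like the one found here.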
---- 2020-09-29 17:28:11 UTC - yannick: ```dev-pulsar-broker-1 dev-pulsar-broker Caused by: org.jclouds.aws.AWSResponseException: request POST https://pulsar-dev.s3-eu-central-1.amazonaws.com/4ea1d3ce-aa87-423e-b8a9-82ebf29cab07-ledger-212?uploads HTTP/1.1 failed with code 403, error: AWSError{requestId='61A203E2E736DB14', requestToken='+=', code='', message='Access Denied', context='{HostId=+c6ncUY72V1aa2RUzkS51U8HonKEqBBI=}'}``` ---- 2020-09-29 17:31:19 UTC - Addison Higham: ah I wonder if some of this might be the way statefulsets roll and you picking up old configmaps ---- 2020-09-29 17:31:45 UTC - yannick: ok we made it work ---- 2020-09-29 17:32:54 UTC - yannick: partially my bad as i operated under the assumption the pod has access to the bucket (which it hadn't) mixed with misleading error logs ---- 2020-09-29 17:41:17 UTC - Evan Furman: Wanted to follow back up here. To simplify things, I was planning to disable user.management; however, it looks like the default credentials don’t have superuser permissions that allow the user to add a cluster. Is this intended? ---- 2020-09-29 17:41:32 UTC - Evan Furman: @Addison Higham ---- 2020-09-29 17:51:12 UTC - Addison Higham: I am not sure on this one, @Sijie Guo may have a better idea... 
---- 2020-09-29 19:06:24 UTC - Lekan Adigun: @Lekan Adigun has joined the channel ---- 2020-09-29 21:35:13 UTC - vikash: Hi @Sijie Guo, i am facing issues in creating an apache pulsar function using rest api post ---- 2020-09-29 21:35:28 UTC - vikash: <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> <title>Error 415 Unsupported Media Type</title> </head> <body> <h2>HTTP ERROR 415 Unsupported Media Type</h2> <table> <tr> <th>URI:</th> <td>/admin/v2/functions/9e7b08b5-3bfd-4026-b322-c4a34fffafd7/PulsarFunction/ProrationFactorCalculation</td> </tr> <tr> <th>STATUS:</th> <td>415</td> </tr> <tr> <th>MESSAGE:</th> <td>Unsupported Media Type</td> </tr> <tr> <th>SERVLET:</th> <td>org.glassfish.jersey.servlet.ServletContainer-53202b06</td> </tr> </table> <hr> <a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.29.v20200521</a> <hr/> </body> </html> ---- 2020-09-29 21:36:34 UTC - vikash: i am passing Content-Type as application/json ---- 2020-09-29 21:39:27 UTC - Sijie Guo: > however, it looks like the default credentials don’t have superuser permissions that allow the user to add a cluster. Is this intended? Are you using token authentication in your Pulsar cluster? Did you configure the super-user token on the pulsar-manager side? ---- 2020-09-29 21:41:58 UTC - Evan Furman: the only change I made from the default `application.properties` was to set `user.management.enable` to `false` ---- 2020-09-29 21:42:21 UTC - Sijie Guo: Interesting. What is the ack timeout? What language of the client are you using? And what version is that client? ---- 2020-09-29 21:43:18 UTC - Sijie Guo: Is your Pulsar cluster protected with authentication/authorization? If so, what authentication mechanism do you use? 
---- 2020-09-29 21:43:37 UTC - Evan Furman: Nope, we have not enabled it ---- 2020-09-29 21:44:40 UTC - Evan Furman: We may decide to do so down the line but right now we are trying to keep things simple as we learn the system. ---- 2020-09-29 21:45:31 UTC - Evan Furman: In pulsar manager v1 the default credentials (pulsar/pulsar) were sufficient to perform all operations ---- 2020-09-29 21:51:21 UTC - Sijie Guo: okay. makes sense. So you are able to login using pulsar/pulsar. But you are not able to create an environment. is that correct? ---- 2020-09-29 21:52:14 UTC - Evan Furman: Yep, exactly. I tried setting the default environment too but pulsar/pulsar doesn’t have access to see it ---- 2020-09-29 21:53:03 UTC - Evan Furman: I was just surprised to see that the defaults in v2 behave differently than in v1 ---- 2020-09-29 21:56:24 UTC - Sijie Guo: It shouldn’t. v2 only introduced user authentication. Let me loop @tuteng in to help you on this. ---- 2020-09-29 21:56:59 UTC - Evan Furman: ok perfect, appreciate the help. ---- 2020-09-30 00:52:18 UTC - Yunze Xu: @vikash There's something wrong with the REST API doc: creating functions requires multipart/form-data, maybe you've used application/json? You can use pulsar-admin to create/update a function ---- 2020-09-30 02:13:45 UTC - dhineshkumar murugesan: @dhineshkumar murugesan has joined the channel ---- 2020-09-30 03:00:36 UTC - tuteng: we have integrated BookKeeper visual manager in the new version 0.2.0 <https://github.com/apache/pulsar-manager#enable-bookkeeper-visual-manageroptional>. You can try to use it. ---- 2020-09-30 03:04:27 UTC - tuteng: In the user login section, which image version do you use? I recently released apachepulsar/pulsar-manager:0.2.0, and apache pulsar/pulsar-manager will use version 0.2.0. 
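Editor's sketch for the 415 on function creation discussed above: a stdlib-only Python helper that builds a `multipart/form-data` body instead of `application/json`. The part names `functionConfig` and `url` are an assumption about what the functions endpoint expects and should be checked against the Pulsar version in use; `build_function_form` is a hypothetical helper and no request is sent.

```python
import json
import uuid


def build_function_form(function_config, package_url):
    """Return (headers, body) for a multipart/form-data function-create request.

    Part names 'functionConfig' and 'url' are assumptions, not confirmed by this thread.
    """
    boundary = uuid.uuid4().hex
    parts = [
        ("functionConfig", "application/json", json.dumps(function_config)),
        ("url", "text/plain", package_url),
    ]
    lines = []
    for name, content_type, value in parts:
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            f"Content-Type: {content_type}",
            "",
            value,
        ]
    # Closing boundary, followed by a final CRLF.
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body
```

The returned headers and body can be handed to any HTTP client for the POST. As Yunze Xu notes, `pulsar-admin` handles this encoding itself, so the sketch only matters when calling the REST API directly.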
---- 2020-09-30 04:09:33 UTC - Luke Stephenson: I had exactly the same experience, if you use any region other than the default, any misconfiguration with credentials is incorrectly reported as being a region issue <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1591968316415200?thread_ts=1591924819.402800&cid=C5Z4T36F7> ---- 2020-09-30 06:06:57 UTC - charles: Hi all, On the <https://github.com/apache/pulsar/wiki/Client-Features-Matrix|Pulsar client feature matrix>, I noticed that the "*Effectively-Once*" feature is not available for the *WebSocket* client. The matrix seems updated till Pulsar *2.5.0*. Is there any chance this will be supported in the future? Or does anyone have experience how to make this happen in a different way? ---- 2020-09-30 06:33:10 UTC - Walter: @Sijie Guo But when i check the zookeeper health by executing "echo "ruok" | nc localhost 2181 ; echo". It's given result as "imok" ---- 2020-09-30 06:34:10 UTC - Walter: Due to this error broker are failing in health check ----
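Editor's sketch of the `echo "ruok" | nc localhost 2181` check mentioned above, in Python with only the standard library. The function name `zk_four_letter` is made up for illustration. It assumes the server answers the four-letter command and then closes the connection, as the `nc` one-liner does; an `imok` reply only shows that the ZooKeeper process is responding, so the broker health check can still fail for other reasons.

```python
import socket


def zk_four_letter(host, port, command="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server is assumed to close the connection after replying.
            data = sock.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")
```

A call such as `zk_four_letter("localhost", 2181)` mirrors the shell command from the thread.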
