2019-11-20 09:31:17 UTC - Fernando: Where is the state of Pulsar Functions stored? I need the functionality of Pulsar Functions, but the deployment method is not suitable for us (we’d prefer to point Pulsar to a container image and have it deploy that image as a StatefulSet rather than uploading the source code). As a workaround I’m thinking of implementing containerized Pulsar readers. However, the difficulty is in keeping track of the state (latest read message). One approach is storing it on disk, making the deployment a StatefulSet; another approach is to store the state in a particular topic, thus avoiding any possible issues with deployment affinity. Any thoughts?
----
2019-11-20 09:34:47 UTC - Ali Ahmed: @Fernando the function state, if using the K/V API provided by the context, is stored in BookKeeper via the table service API.
----
2019-11-20 10:08:00 UTC - Fernando: @Ali Ahmed so if I wanted to replicate the function state in a reader I would have to tap into the BookKeeper API directly?
----
2019-11-20 10:27:00 UTC - Pedro Cardoso: @Fernando beware that the K/V store that Ahmed is talking about is specific to each function, see <https://pulsar.apache.org/docs/en/functions-develop/#state-storage>
```States are key-value pairs, where the key is a string and the value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual Pulsar Function, and shared between instances of that function.```
----
2019-11-20 10:28:54 UTC - Fernando: @Pedro Cardoso Thanks for pointing it out. I guess I’d design a solution using another topic for storing the state of my reader.
----
2019-11-20 10:29:56 UTC - Pedro Cardoso: Is it a K/V state or something else? I'm facing a similar issue and thinking of connecting Pulsar Functions directly to an application-controlled BookKeeper ledger.
----
2019-11-20 10:33:56 UTC - Fernando: yes, K/V state
----
2019-11-20 10:34:58 UTC - Fernando: how do you plan to control the BookKeeper ledger?
----
2019-11-20 10:35:20 UTC - Pedro Cardoso: Control in what way?
----
2019-11-20 10:50:42 UTC - Pedro Cardoso: @here Has anyone come across the following?
----
2019-11-20 10:51:21 UTC - Pedro Cardoso: `Size of data received by DoubleSchema is not 8` when consuming a message from a Pulsar Function execution with the following signature: `public class RollingSum implements Function<String, Double>`
----
2019-11-20 10:52:40 UTC - Pedro Cardoso: the payload in the message is only 3 bytes long: `[49,46,48]`
----
2019-11-20 10:53:00 UTC - Pedro Cardoso: but the Pulsar Function computes the expected result
----
2019-11-20 16:02:23 UTC - geal: I’m trying to write an authentication provider and an authorization provider, and it’s been mostly straightforward to do, but I’m a bit puzzled by the super user mechanism. In which cases is the `AuthorizationProvider.isSuperUser` method called?
It seems that in some cases another part of the code checks whether the result of `AuthenticationProvider.authenticate` is in the list of values provided by the `superUserRoles` key in the configuration. Is there a way to ask the authz provider instead?
----
2019-11-20 17:12:12 UTC - Matteo Merli: That's mostly for historical reasons, where the list of "super-users" (e.g. admin, broker-to-broker, etc.) is kept in the config file instead of being handled by the authz provider. I think it should be possible to bring that into the authz provider, while still maintaining the same source of the config in the default provider.
----
2019-11-20 17:28:05 UTC - geal: yes, the default function gets the roles from the configuration as well: <https://github.com/apache/pulsar/blob/14d1eaa73e1479e403042da87ad34c7a35a304e2/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authorization/AuthorizationProvider.java#L44-L47> but the code that is currently checking the superuser role here <https://github.com/apache/pulsar/blob/a057a1430a186b6c874b9605a0c525daf3846900/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/BrokerDiscoveryProvider.java#L156-L158> apparently gets it from ZooKeeper? Which one should be the main way to do it?
----
2019-11-20 17:28:19 UTC - geal: I’ll look into writing a patch for that
----
2019-11-20 17:33:43 UTC - Matteo Merli: Ok, so the thing is that there are actions that require "root" access and are not similar to "can publish on this topic". E.g. creating a new tenant, or changing a system-related setting for a particular namespace.
----
2019-11-20 18:02:43 UTC - Pedro Cardoso: If anyone has any experience with schemas in topics for Pulsar Functions and is available to help, please let me know. Thank you
----
2019-11-20 18:11:12 UTC - geal: yup, this makes sense (I think I ended up in that part of the code because the `canLookup` method of the authz provider incorrectly returned false).
Would it be useful to have access levels limited to a tenant or a namespace though?
----
2019-11-20 18:12:39 UTC - geal: for some context, I’m integrating biscuit tokens: <https://github.com/clevercloud/biscuit> a token with decentralized verification like JWT, and offline attenuation like macaroons. With it I can model very granular access levels without affecting the rest of Pulsar.
----
2019-11-20 18:16:09 UTC - Igor Zubchenok:
> New ensemble: [X.X.X.X:3181, X.X.X.X:3181] is not adhering to Placement Policy.
How to fix it?
----
2019-11-20 18:40:58 UTC - Sijie Guo: how did you submit the function? DoubleSchema should only be used for the serde of the output results.
----
2019-11-20 18:42:24 UTC - Pedro Cardoso:
```
// Create function
final FunctionConfig functionConfig = FunctionConfig.builder()
        .jar(path_to_jar)
        .className("RollingSum")
        .name("rollingsum")
        .inputs(ImmutableList.of("non-persistent://public/default/transaction-input"))
        .output("non-persistent://public/default/transaction-output")
        .retainOrdering(true)
        .tenant("public")
        .namespace("default")
        .build();
// Deploy it
pulsar_admin.functions().createFunctionWithUrl(functionConfig, functionConfig.getJar());
```
----
2019-11-20 18:43:48 UTC - Pedro Cardoso: When defining the consumer, if I define the schema as `Schema.JSON(Double.class)` it works. `Schema.AVRO(Double.class)` or `Schema.DOUBLE` do not. Is there any documentation stating the differences?
----
2019-11-20 18:44:26 UTC - Sijie Guo: the message is printed when you have fewer than 2 racks in the rack-aware placement policy.
----
2019-11-20 18:44:49 UTC - Sijie Guo: this is a warning message. you can ignore the message if you don’t have any rack information.
----
2019-11-20 18:45:25 UTC - Sijie Guo: If you want to get rid of that message, you can try to configure racks by using `bin/pulsar-admin bookies`
----
2019-11-20 18:46:59 UTC - Sijie Guo: when you say “defining the consumer”, are you referring to the consumer for the output topic?
----
2019-11-20 18:57:59 UTC - Pedro Cardoso: yes
----
2019-11-20 18:58:12 UTC - Pedro Cardoso:
```
final Consumer<Double> consumer = client.newConsumer(Schema.JSON(Double.class))
        .topic("non-persistent://public/default/transaction-output")
        .subscriptionName("consumer-subscription")
        .subscribe();
```
----
2019-11-20 19:02:27 UTC - Sijie Guo: okay. so by default, functions use JSON for SerDe, unless you specify SerDe or SchemaType when you submit a function. so if you already submitted a function without SerDe or SchemaType, please use the JSON schema to consume the output topic.
----
2019-11-20 19:05:24 UTC - Pedro Cardoso: so that means I must call either `.outputSerdeClassName()` or `.outputSchemaType()` when defining my Pulsar Function? What is the difference between them?
----
2019-11-20 19:17:22 UTC - Sijie Guo: yes. correct. schemaType is for the schema types supported by Pulsar; serdeClassName is used if you want to customize serialization for your data.
----
2019-11-20 19:19:40 UTC - Pedro Cardoso: If I define an Avro SchemaType will I have to make a Serde implementation of my data that matches Avro's serialization format?
----
2019-11-20 19:28:27 UTC - Sijie Guo: if you are using SchemaType.AVRO, it is using Pulsar’s AVRO serde. you don’t need to provide your own serde implementation.
----
2019-11-20 19:30:12 UTC - Pedro Cardoso: Thank you very much Sijie, your help has been phenomenal!
----
2019-11-20 21:29:23 UTC - Derek Rhodehamel: Does anyone have examples of using the `simulation-controller` in `pulsar-perf`? I can connect to a cluster of `simulation-clients`, but when I try to `trade_group`, the clients create a bunch of consumers but no producers are created (at least none that I can see in the dashboard) and no traffic goes through. Is there a separate command for simulation producers?
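----
Note on the `DoubleSchema` error discussed above: the 3-byte payload `[49,46,48]` is the ASCII text `1.0`, which is consistent with the output having been written as text/JSON rather than as a fixed 8-byte binary double. The sketch below uses only the Java standard library to illustrate the size mismatch. The class name `PayloadCheck` and its helper methods are hypothetical names made up for illustration; they are not part of Pulsar, and the binary encoding shown is simply `ByteBuffer`'s default, assumed here to stand in for "an 8-byte double".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Pulsar API.
class PayloadCheck {
    // Read the raw payload as UTF-8 text, which is how a JSON number appears on the wire.
    static String decodeAsText(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Encode a double as a fixed-width 8-byte value (ByteBuffer defaults to big-endian).
    static byte[] encodeAsBinary(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    public static void main(String[] args) {
        byte[] payload = {49, 46, 48};           // bytes reported in the chat
        String text = decodeAsText(payload);     // "1.0"
        System.out.println(text + " parses to " + Double.parseDouble(text));
        System.out.println("binary double length: " + encodeAsBinary(1.0).length);
    }
}
```

A consumer that expects exactly 8 bytes per message cannot decode a 3-byte text payload, which matches the observation that `Schema.JSON(Double.class)` worked while `Schema.DOUBLE` did not.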
----
2019-11-21 01:09:43 UTC - Luke Lu: Interesting. `pulsar-admin bookies` is not officially documented (<https://pulsar.apache.org/docs/en/pulsar-admin/>), but it has indeed been available since 2.1.0: <https://github.com/apache/pulsar/blob/master/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdBookies.java>
----
2019-11-21 04:50:22 UTC - Igor Zubchenok: @Sijie Guo is there an option to just disable the rack-aware placement policy?
----
2019-11-21 08:15:49 UTC - leonidv: Hi all! When the documentation mentions "only for shared subscription mode", can I read this as "only for shared and shared by key subscription modes"?
----
2019-11-21 08:19:26 UTC - Sijie Guo: mostly it will be the case.
----
2019-11-21 08:26:10 UTC - leonidv: ok, thanks
----
