2019-11-20 09:31:17 UTC - Fernando: where is the state of pulsar functions 
stored? I need to have the functionality of pulsar functions but the deployment 
method is not suitable for us (we’d prefer to point pulsar to a container 
image and then it deploys such image as a statefulset rather than uploading the 
source code). As a workaround I’m thinking of implementing containerized Pulsar 
readers. However the difficulty is in keeping track of the state (latest read 
message). One approach is storing it on disk, making the deployment a 
statefulset, another approach is to store the state in a particular topic thus 
avoiding any possible issues with deployment affinity. Any thoughts?
----
2019-11-20 09:34:47 UTC - Ali Ahmed: @Fernando the function state, if using the 
kv api provided by the context, is stored in bookkeeper via the table service 
api.
----
2019-11-20 10:08:00 UTC - Fernando: @Ali Ahmed so if I wanted to replicate the 
function state in a reader I would have to tap into the bookkeeper api directly
----
2019-11-20 10:15:58 UTC - Silence: @Silence has joined the channel
----
2019-11-20 10:27:00 UTC - Pedro Cardoso: @Fernando beware that the K/V store 
that ahmed is talking about is specific to each function, see 
<https://pulsar.apache.org/docs/en/functions-develop/#state-storage>

```States are key-value pairs, where the key is a string and the value is 
arbitrary binary data - counters are stored as 64-bit big-endian binary values. 
Keys are scoped to an individual Pulsar Function, and shared between instances 
of that function.```
----
2019-11-20 10:28:54 UTC - Fernando: @Pedro Cardoso Thanks for pointing it out. 
I guess I’d design a solution using another topic for storing the state of my 
reader.
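
A minimal sketch of the "state in a topic" idea, assuming the state topic behaves like an append-only changelog where the last record per reader wins. It does not use the Pulsar client: the list stands in for the topic, `CheckpointLogDemo` and its methods are hypothetical names, and the `long` position is a simplification (with the real client the position would be a `MessageId`, which can be serialized with `toByteArray()`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckpointLogDemo {

    // One checkpoint record: which reader wrote it and the last position it processed.
    static final class Checkpoint {
        final String readerName;
        final long position;

        Checkpoint(String readerName, long position) {
            this.readerName = readerName;
            this.position = position;
        }
    }

    // Append-only list standing in for the state topic (assumption for illustration).
    final List<Checkpoint> log = new ArrayList<>();

    // Called by a reader after it has processed a message.
    void publish(String readerName, long position) {
        log.add(new Checkpoint(readerName, position));
    }

    // On restart, replay the whole log; later records overwrite earlier ones per reader.
    Map<String, Long> recover() {
        Map<String, Long> latest = new HashMap<>();
        for (Checkpoint c : log) {
            latest.put(c.readerName, c.position);
        }
        return latest;
    }

    public static void main(String[] args) {
        CheckpointLogDemo demo = new CheckpointLogDemo();
        demo.publish("reader-a", 5L);
        demo.publish("reader-b", 2L);
        demo.publish("reader-a", 9L);
        System.out.println(demo.recover());
    }
}
```

Because recovery only needs the newest record per reader, any pod can pick up where another left off, which is what removes the dependency on local disk.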
----
2019-11-20 10:29:56 UTC - Pedro Cardoso: Is it a K/V state or something else? 
I'm facing a similar issue and thinking of connecting pulsar functions directly 
to an application-controlled bookkeeper ledger
----
2019-11-20 10:33:56 UTC - Fernando: yes K/V state
----
2019-11-20 10:34:58 UTC - Fernando: how do you plan to control the bookkeeper 
ledger?
----
2019-11-20 10:35:20 UTC - Pedro Cardoso: Control in what way?
----
2019-11-20 10:50:42 UTC - Pedro Cardoso: @here Has anyone come across the 
following?
----
2019-11-20 10:51:21 UTC - Pedro Cardoso: `Size of data received by DoubleSchema 
is not 8` when consuming a message from a pulsar function execution with the 
following signature
`public class RollingSum implements Function<String, Double>`
----
2019-11-20 10:52:40 UTC - Pedro Cardoso: the payload in the message is only 3 
bytes long: `[49,46,48]`
----
2019-11-20 10:53:00 UTC - Pedro Cardoso: but the pulsar function computes the 
expected result
----
2019-11-20 13:13:08 UTC - jun: @jun has joined the channel
----
2019-11-20 14:03:15 UTC - Antonios Pagidas: @Antonios Pagidas has joined the 
channel
----
2019-11-20 16:02:23 UTC - geal: I’m trying to write an authentication provider 
and an authorization provider, and it’s been mostly straightforward to do, but 
I’m a bit puzzled by the super user mechanism. In which cases is the 
`AuthorizationProvider.isSuperUser` method called? It seems that for some cases 
another part of the code will check if the result of 
`AuthenticationProvider.authenticate` is in the list of values provided by the 
`superUserRoles` key in the configuration. Is there a way to ask the authz 
provider instead?
----
2019-11-20 17:12:12 UTC - Matteo Merli: That's mostly for historical reasons, 
where the list of "super-users" (eg: admin, broker-to-broker, etc) is kept in 
the config file instead of being handled by the authz provider.

I think it should be possible to bring that into the authz provider, while 
still maintaining the same source of the config in the default provider.
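
A rough sketch of what that could look like, assuming the super-user decision is a method the provider can override while the default keeps reading the configured list. `RoleAuthorizer`, `DefaultAuthorizer` and `PrefixAuthorizer` are hypothetical, simplified stand-ins, not Pulsar's actual `AuthorizationProvider` interface, and the `root:` prefix rule is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SuperUserCheckDemo {

    // Hypothetical, simplified stand-in for an authorization provider.
    interface RoleAuthorizer {
        // Default behaviour: a role is a super-user if it is in the configured list.
        default boolean isSuperUser(String role, Set<String> configuredSuperUserRoles) {
            return role != null && configuredSuperUserRoles.contains(role);
        }
    }

    // Default provider: inherits the config-file behaviour unchanged.
    static final class DefaultAuthorizer implements RoleAuthorizer {}

    // Custom provider: ignores the configured list and applies its own (made-up) rule.
    static final class PrefixAuthorizer implements RoleAuthorizer {
        @Override
        public boolean isSuperUser(String role, Set<String> configuredSuperUserRoles) {
            return role != null && role.startsWith("root:");
        }
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>(Arrays.asList("admin", "broker"));
        System.out.println(new DefaultAuthorizer().isSuperUser("admin", configured));
        System.out.println(new PrefixAuthorizer().isSuperUser("admin", configured));
    }
}
```

The point of the design is that callers always ask the provider, and only the default implementation knows that the answer comes from the config file.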
----
2019-11-20 17:28:05 UTC - geal: yes, the default function gets the roles from 
the configuration as well: 
<https://github.com/apache/pulsar/blob/14d1eaa73e1479e403042da87ad34c7a35a304e2/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authorization/AuthorizationProvider.java#L44-L47>
but the code that is currently checking the superuser role here 
<https://github.com/apache/pulsar/blob/a057a1430a186b6c874b9605a0c525daf3846900/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/BrokerDiscoveryProvider.java#L156-L158>
 apparently gets it from zookeeper?
Which one should be the main way to do it?
----
2019-11-20 17:28:19 UTC - geal: I’ll look into writing a patch for that
----
2019-11-20 17:33:43 UTC - Matteo Merli: Ok, so the thing is that there are 
actions that require "root" access and are not similar to "can publish on this 
topic".

Eg: creating a new tenant, or changing a system-related setting for a 
particular namespace
----
2019-11-20 18:02:43 UTC - Pedro Cardoso: If anyone has any experience with 
schemas in topics for pulsar functions and is available to help, please let me 
know. Thank you
----
2019-11-20 18:11:12 UTC - geal: yup, this makes sense (I think I got in that 
part of the code because the `canLookup` method of the authz provider 
incorrectly returned false).
Would it be useful to have access level limited to a tenant or a namespace 
though?
----
2019-11-20 18:12:39 UTC - geal: for some context, I’m integrating biscuit 
tokens: <https://github.com/clevercloud/biscuit>
a token with decentralized verification like JWT, and offline attenuation like 
macaroons. With it I can model very granular access levels without affecting 
the rest of pulsar
----
2019-11-20 18:16:09 UTC - Igor Zubchenok: > New ensemble: [X.X.X.X:3181, 
X.X.X.X:3181] is not adhering to Placement Policy.
How to fix it?
----
2019-11-20 18:40:58 UTC - Sijie Guo: how did you submit the function? 
DoubleSchema should be only used for the serde for the output results.
----
2019-11-20 18:42:24 UTC - Pedro Cardoso: ```// Create function
final FunctionConfig functionConfig = FunctionConfig.builder()
        .jar(path_to_jar)
        .className("RollingSum")
        .name("rollingsum")
        .inputs(ImmutableList.of("non-persistent://public/default/transaction-input"))
        .output("non-persistent://public/default/transaction-output")
        .retainOrdering(true)
        .tenant("public")
        .namespace("default")
        .build();

// Deploy it
pulsar_admin.functions().createFunctionWithUrl(functionConfig, functionConfig.getJar());```
----
2019-11-20 18:43:48 UTC - Pedro Cardoso: When defining the consumer, if I 
define the schema as `Schema.JSON(Double.class)` it works. 
`Schema.AVRO(Double.class)` or `Schema.DOUBLE` do not. Is there any 
documentation stating the differences?
----
2019-11-20 18:44:26 UTC - Sijie Guo: the message is printed when you have fewer 
than 2 racks in the rack-aware placement policy.
----
2019-11-20 18:44:49 UTC - Sijie Guo: this is a warning message. you can ignore 
the message if you don’t have any rack information.
----
2019-11-20 18:45:25 UTC - Sijie Guo: If you want to get rid of that message, 
you can try to configure racks by using `bin/pulsar-admin bookies`
----
2019-11-20 18:46:59 UTC - Sijie Guo: when you say “defining the consumer”, are 
you referring to the consumer for the output topic?
----
2019-11-20 18:57:59 UTC - Pedro Cardoso: yes
----
2019-11-20 18:58:12 UTC - Pedro Cardoso: ```final Consumer<Double> consumer = client.newConsumer(Schema.JSON(Double.class))
        .topic("non-persistent://public/default/transaction-output")
        .subscriptionName("consumer-subscription")
        .subscribe();```
----
2019-11-20 19:02:27 UTC - Sijie Guo: okay.

so by default, functions use JSON for SerDe, unless you specify SerDe or 
SchemaType when you submit a function.

so if you already submitted a function without SerDe or SchemaType, please use 
JSON schema to consume the output topic.
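
This matches the 3-byte payload `[49,46,48]` reported earlier in the thread: those bytes are the ASCII text `1.0`, not an 8-byte binary double. The sketch below shows only the byte-level difference between the two encodings. It does not use the Pulsar client, and `DoublePayloadDemo` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DoublePayloadDemo {

    // Fixed-width encoding: always 8 bytes (ByteBuffer is big-endian by default).
    static byte[] encodeBinary(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Text encoding: the decimal representation as UTF-8, as a bare JSON number would appear.
    static byte[] encodeText(double value) {
        return Double.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Rejects anything that is not exactly 8 bytes, like the error quoted in the thread.
    static double decodeBinary(byte[] payload) {
        if (payload.length != Double.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + payload.length);
        }
        return ByteBuffer.wrap(payload).getDouble();
    }

    // Parses the payload as text.
    static double decodeText(byte[] payload) {
        return Double.parseDouble(new String(payload, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] reported = {49, 46, 48};                 // payload from the thread
        System.out.println(decodeText(reported));       // prints 1.0
        System.out.println(encodeBinary(1.0).length);   // prints 8
    }
}
```

Calling `decodeBinary` on the reported payload throws, which is the same mismatch a consumer using a fixed-width double schema runs into when the producer side wrote text.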
----
2019-11-20 19:05:24 UTC - Pedro Cardoso: so that means I must either call 
`.outputSerdeClassName()` or `.outputSchemaType()` when defining my pulsar 
function? What is the difference between them?
----
2019-11-20 19:17:22 UTC - Sijie Guo: yes. correct.

schemaType is for the schema types supported by Pulsar; serdeClassName is used 
if you want to customize serialization for your data.
----
2019-11-20 19:19:40 UTC - Pedro Cardoso: If I define an Avro SchemaType will I 
have to make a Serde implementation of my data that matches Avro's 
serialization format?
----
2019-11-20 19:28:27 UTC - Sijie Guo: if you are using SchemaType.AVRO, it is 
using Pulsar’s AVRO serde. you don’t need to provide your own serde 
implementation.
----
2019-11-20 19:30:12 UTC - Pedro Cardoso: Thank you very much Sijie, your help 
has been phenomenal!
----
2019-11-20 21:29:23 UTC - Derek Rhodehamel: Does anyone have examples of using 
the `simulation-controller` in `pulsar-perf`? I can connect to a cluster of 
`simulation-clients` but when I try to `trade_group` the clients create a bunch 
of consumers but no producers are created (at least none that I can see in the 
dashboard) and no traffic goes through. Is there a separate command for 
simulation producers?
----
2019-11-20 21:46:27 UTC - Nuno Ferreira: @Nuno Ferreira has joined the channel
----
2019-11-20 23:17:03 UTC - Jeff: @Jeff has joined the channel
----
2019-11-21 01:09:43 UTC - Luke Lu: Interesting. `pulsar-admin bookies` is not 
officially documented (<https://pulsar.apache.org/docs/en/pulsar-admin/>), but 
indeed available since 2.1.0: 
<https://github.com/apache/pulsar/blob/master/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdBookies.java>
----
2019-11-21 04:50:22 UTC - Igor Zubchenok: @Sijie Guo is there an option to just 
disable rack-aware placement policy?
----
2019-11-21 08:15:49 UTC - leonidv: Hi all! When the documentation mentions "only 
for shared subscription mode", can I read this as "only for shared and shared by 
key subscription modes"?
----
2019-11-21 08:19:26 UTC - Sijie Guo: mostly it will be the case.
----
2019-11-21 08:26:10 UTC - leonidv: ok, thanks
----
