2020-04-09 14:09:29 UTC - Matteo Merli: In the AuthZ provider, you're also being passed the authenticationData, which lets you access the original credentials ----
2020-04-09 14:20:15 UTC - slouie: @slouie has joined the channel ----
2020-04-09 14:29:07 UTC - Subash Kunjupillai: @Subash Kunjupillai has joined the channel ----
2020-04-09 14:38:37 UTC - Ryan Slominski: Hi, I'm comparing the Kafka Source Connector API vs the Pulsar IO Source Connector. In Kafka there is a method (in class org.apache.kafka.connect.source.SourceConnector) that is used to divide up work for parallelism: ```public List<Map<String, String>> taskConfigs(int maxTasks)``` Is there something equivalent for Pulsar? I see the flag "--parallelism" for the "pulsar-admin sources create" command, but how do I specify how the work should be divided up? ----
2020-04-09 15:07:54 UTC - simon gabet: @simon gabet has joined the channel :wave: Damien ----
2020-04-09 15:09:28 UTC - michael.colson: @michael.colson has joined the channel ----
2020-04-09 15:50:06 UTC - Subash Kunjupillai: Hi, I'm new to pub-sub systems and am currently comparing the features of Kafka and Pulsar. I've started with a simple producer and consumer, and I notice a key difference in the consumer API. • Kafka's consumer poll, `ConsumerRecords<Integer, byte[]> records = consumer.poll(0);`, gets all the unacknowledged messages available in the topic at that point in time. • Whereas Pulsar's consumer receive, `msg = consumer.receive(0, TimeUnit.SECONDS);`, gets only one unacknowledged message from the topic. I'm wondering whether there is any advantage for Pulsar in returning only one message at a time. Can someone please help clarify?
_P.S.: I'm using the default subscription mode in the consumer (Exclusive)_ ----
2020-04-09 15:55:03 UTC - Sijie Guo: @Matteo Merli I think `isSuperUser` is an exception ----
2020-04-09 15:55:24 UTC - Sijie Guo: we should make that consistent with the other methods ----
2020-04-09 15:58:29 UTC - Matteo Merli: Correct. We recently fixed `isTenantAdmin()`, but `isSuperUser()` still isn't taking the authentication data ----
2020-04-09 15:58:46 UTC - Sijie Guo: SourceContext and SinkContext provide `getNumInstances`. It tells you how many instances will be used for running the connector. ----
2020-04-09 16:16:44 UTC - Raman Gupta: Firstly, that Kafka call doesn't get *all* unread messages; it gets an undefined number of messages, which might be all of them or might be fewer (there are a bunch of consumer properties that control this, like `fetch.max.bytes` and `max.poll.records`). Secondly, AIUI the Pulsar consumer does the same thing in the background but presents a simpler one-receive-per-message API to you, the user. See `receiverQueueSize` for <https://pulsar.apache.org/docs/en/client-libraries-java/#configure-consumer|configuring> this behavior. If you really want to receive a batch of messages all at once, you can use a <https://pulsar.apache.org/docs/en/client-libraries-java/#batch-receive|batching receive>. ----
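For reference, a batching receive looks roughly like the sketch below; the service URL, topic, and subscription names are placeholders, and the policy values are illustrative:
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.BatchReceivePolicy;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Messages;
import org.apache.pulsar.client.api.PulsarClient;

public class BatchReceiveExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // placeholder topic
                .subscriptionName("my-subscription")     // placeholder subscription
                .batchReceivePolicy(BatchReceivePolicy.builder()
                        .maxNumMessages(100)                  // at most 100 messages per call
                        .timeout(200, TimeUnit.MILLISECONDS) // or whatever arrived within 200 ms
                        .build())
                .subscribe();

        // One call returns a whole batch instead of a single message.
        Messages<byte[]> messages = consumer.batchReceive();
        for (Message<byte[]> msg : messages) {
            // process msg ...
        }
        consumer.acknowledge(messages); // acknowledge the batch in one call

        consumer.close();
        client.close();
    }
}
```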
2020-04-09 16:27:54 UTC - Ryan Slominski: How is the work divided? Say for a custom source with multiple "channels" to listen for changes on, how do I specify which channels go with which Pulsar Source Connector instance? ----
2020-04-09 16:32:41 UTC - Sijie Guo: it mostly depends on the connector implementation. For a Pulsar source connector, the work is divided based on the subscription type (the subscription type is inferred from the ordering guarantee). For other connectors, like I said, it depends on the connector implementation. ----
2020-04-09 16:33:14 UTC - Sijie Guo: :+1: ----
2020-04-09 16:33:31 UTC - Ebere Abanonu: Got it to work. The underlying issue was the ZooKeeper endpoint. Using the ZooKeeper service name as the endpoint got it working. I have also changed to pulsar-all in the presto-pulsar docker image. I have sent a PR for review :100: Sijie Guo ----
2020-04-09 16:33:43 UTC - Sijie Guo: correct. we need to fix that interface. Are you going to fix that? Or shall we do that? ----
2020-04-09 16:34:03 UTC - Sijie Guo: Cool ----
2020-04-09 16:37:38 UTC - Ryan Slominski: It sounds like Pulsar and Kafka are very different here, as Kafka creates "tasks" based on your configuration, and it sounds like with Pulsar the Source Connector must spawn its own threads? ----
2020-04-09 16:39:20 UTC - Ryan Slominski: I have a working connector in Kafka in which I specify, via configuration, a list of "channels" from my custom in-house pub/sub system and configure how many tasks to spread the load out on, and Kafka handles it from there. The taskConfigs looks like:
```
public List<Map<String, String>> taskConfigs(int maxTasks) {
    List<Map<String, String>> configs = new ArrayList<>();

    // Default case is same number of channels as tasks, so each task has one
    int channelsPerTask = 1;
    int remainder = 0;

    if (channels.size() > maxTasks) {
        channelsPerTask = channels.size() / maxTasks;
        remainder = channels.size() % maxTasks;
    } else if (channels.size() < maxTasks) {
        maxTasks = channels.size(); // Reduce number of tasks as not enough work to go around!
    }

    List<String> all = new ArrayList<>(channels);

    int fromIndex = 0;
    // Always at least one - maxTasks ignored if < 1; also the first task takes the remainder
    int toIndex = channelsPerTask + remainder;

    if (toIndex > 0) {
        appendSubsetList(configs, all, fromIndex, toIndex);
    }

    fromIndex = toIndex;
    toIndex = toIndex + channelsPerTask;

    for (int i = 1; i < maxTasks; i++) {
        appendSubsetList(configs, all, fromIndex, toIndex);
        fromIndex = toIndex;
        toIndex = toIndex + channelsPerTask;
    }

    return configs;
}
``` ----
2020-04-09 16:55:27 UTC - Sijie Guo: No. Pulsar also creates instances based on "parallelism". But how the actual work is divided is up to the connector implementation. The connector implementation uses `getNumInstances` to divide the work. ----
2020-04-09 16:56:16 UTC - Sijie Guo: So Pulsar and Kafka are the same here. ----
2020-04-09 16:57:32 UTC - Sijie Guo: You can assign the tasks when implementing the `open` method in a Pulsar source or sink. ----
2020-04-09 16:57:35 UTC - Sijie Guo: ```
/**
 * Open connector with configuration.
 *
 * @param config initialization config
 * @param sourceContext environment where the source connector is running
 * @throws Exception IO type exceptions when opening a connector
 */
void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;
``` ----
2020-04-09 16:58:10 UTC - Sijie Guo: When opening a connector, you can get the number of instances from the source context and calculate the portion of the work your instance is responsible for. ----
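As an illustration, a minimal sketch of such an `open` implementation follows; the `ChannelSource` class, the `channels` config key, and the round-robin split are assumptions made for the example, not part of the Pulsar API:
```
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Source;
import org.apache.pulsar.io.core.SourceContext;

// Hypothetical source that watches a set of "channels" from an external system.
public class ChannelSource implements Source<byte[]> {

    private List<String> myChannels;

    @Override
    @SuppressWarnings("unchecked")
    public void open(Map<String, Object> config, SourceContext ctx) throws Exception {
        // Assumed config key carrying the full channel list.
        List<String> allChannels = (List<String>) config.get("channels");

        int numInstances = ctx.getNumInstances(); // comes from --parallelism
        int instanceId = ctx.getInstanceId();     // 0 .. numInstances - 1

        // Round-robin split: this instance takes every numInstances-th channel.
        myChannels = IntStream.range(0, allChannels.size())
                .filter(i -> i % numInstances == instanceId)
                .mapToObj(allChannels::get)
                .collect(Collectors.toList());

        // ... subscribe to myChannels with the in-house client here ...
    }

    @Override
    public Record<byte[]> read() throws Exception {
        // ... block until one of myChannels delivers a message, then wrap it ...
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public void close() {
    }
}
```
Every instance created by `--parallelism` runs the same `open` with the same config, so together they cover the whole channel list without any extra coordination.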
2020-04-09 17:03:09 UTC - Matteo Merli: Please go ahead ----
2020-04-09 18:05:53 UTC - Alexander Ursu: Hi, I was wondering how I could make each ZooKeeper instance use less memory, as I've been having issues running it on a cloud VM with only 2 GB. I've been getting this error: ```Native memory allocation (mmap) failed to map 2147483648 bytes for committing reserved memory``` So it's trying to allocate a little over 2 GB; is there any way to bring it down to 1.5 GB or less? ----
2020-04-09 18:06:32 UTC - Matteo Merli: Take a look at `conf/pulsar_env.sh`. The variables to set the JVM heap size are in there ----
2020-04-09 18:12:06 UTC - Alexander Ursu: thanks! ----
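For reference, the heap is controlled by `PULSAR_MEM` in that file; the 2147483648-byte mmap failure likely corresponds to the JVM reserving a 2 GB heap up front. A sketch for squeezing under a 2 GB VM (these values are illustrative, not the shipped defaults):
```
# conf/pulsar_env.sh -- illustrative values for a small VM, not the defaults
PULSAR_MEM=${PULSAR_MEM:-"-Xms512m -Xmx1g -XX:MaxDirectMemorySize=256m"}
```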
2020-04-09 20:32:33 UTC - Sijie Guo: @Matteo Merli - <https://github.com/apache/pulsar/issues/6702> created the issue. ----
2020-04-09 20:32:49 UTC - Matteo Merli: :ok_hand: ----
2020-04-09 21:13:44 UTC - Alexander Ursu: I'm having trouble using the admin REST API through the proxy. I keep getting ```
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /brokers/configuration. Reason:
<pre>    Not Found</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.20.v20190813</a><hr/>
</body>
</html>
``` for just about every endpoint except `/metrics/` ----
2020-04-09 21:43:24 UTC - Sijie Guo: How are you setting up the proxy? ----
2020-04-09 21:48:07 UTC - Khubaib Shabbir: @Khubaib Shabbir has joined the channel ----
2020-04-09 22:10:07 UTC - Alexander Ursu: currently as a docker container; I've only set the few necessary env vars, like so: ```
brokerServiceURL: pulsar://pulsar-broker:6650
brokerWebServiceURL: http://pulsar-broker:8080
functionWorkerWebServiceURL: http://pulsar-broker:8080
clusterName: pulsar-cluster-1
``` ----
2020-04-09 22:11:33 UTC - Alexander Ursu: the `pulsar-broker` name does resolve correctly. The proxy appears to work for regular pub-sub interactions ----
2020-04-09 22:37:29 UTC - Sijie Guo: is the broker container running in the same environment as the proxy container? ----
2020-04-09 22:40:08 UTC - Ermir Zaimi: @Ermir Zaimi has joined the channel ----
2020-04-09 22:59:22 UTC - Alexander Ursu: yes ----
2020-04-09 23:53:11 UTC - roberth: @roberth has joined the channel ----
2020-04-10 01:37:02 UTC - michael.colson: Hi, I'm starting to read about Pulsar, and `multi-tiered storage` is what brings me here. Indeed, I would like to have some kind of "unbounded" lookup repository for my tracking events (being able to follow actions and find the previous action of a user, a session, etc.). Pulsar SQL and all the "CEP-like" articles make me feel that it is suitable for my use case. In a few words: can I use Pulsar as a store that I can query to enrich my main stream, leveraging a tuned storage policy between the bookies and S3? Let me know if this is not the right place to post that kind of message. Thanks ----
2020-04-10 03:40:23 UTC - Sijie Guo: Is the "query" a streaming query, a scan-like query, or a point-lookup query? ----
2020-04-10 07:47:50 UTC - hugues DESLANDES: Hi, I am trying to use Pulsar SQL. Is there a way to query a compacted topic? It seems the "reader" for SQL reads all messages, not only the compacted ones. I was not able to find any documentation about parameters or options such as the reader's `.readCompacted(true)`. Any suggestions? Thanks in advance ----
2020-04-10 07:48:05 UTC - jk: @jk has joined the channel ----
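For context, `readCompacted` is a client-side reader/consumer option; a minimal sketch of the behavior being referred to is below (the service URL and topic are placeholders). Whether Pulsar SQL exposes an equivalent switch is exactly the open question above:
```
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class CompactedReadExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")          // placeholder
                .build();

        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")  // placeholder
                .startMessageId(MessageId.earliest)
                // Only the latest value per key from the compacted horizon,
                // plus the not-yet-compacted tail of the topic.
                .readCompacted(true)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }
        reader.close();
        client.close();
    }
}
```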