2020-04-09 14:09:29 UTC - Matteo Merli: In the AuthZ provider, you're also being passed the authenticationData, which lets you access the original credentials ----
2020-04-09 14:20:15 UTC - slouie: @slouie has joined the channel ----
2020-04-09 14:29:07 UTC - Subash Kunjupillai: @Subash Kunjupillai has joined the channel ----
2020-04-09 14:38:37 UTC - Ryan Slominski: Hi, I'm comparing the Kafka Source Connector API vs the Pulsar IO Source Connector. In Kafka there is a method (in class org.apache.kafka.connect.source.SourceConnector) that is used to divide up work for parallelism: ```public List<Map<String, String>> taskConfigs(int maxTasks)``` Is there something equivalent for Pulsar? I see the flag "--parallelism" for the "pulsar-admin sources create" command, but how do I specify how the work should be divided up? ----
2020-04-09 15:07:54 UTC - simon gabet: @simon gabet has joined the channel :wave: Damien ----
2020-04-09 15:09:28 UTC - michael.colson: @michael.colson has joined the channel ----
2020-04-09 15:50:06 UTC - Subash Kunjupillai: Hi, I'm new to pub-sub systems and am currently comparing the features of Kafka and Pulsar. I've started with a simple producer and consumer, and I notice a key difference in the consumer API. • Kafka's consumer poll, `ConsumerRecords<Integer, byte[]> records = consumer.poll(0);`, gets all the unacknowledged messages available in the topic at that point in time. • Whereas Pulsar's consumer receive, `msg = consumer.receive(0, TimeUnit.SECONDS);`, gets only one unacknowledged message from the topic. I'm wondering whether there is any advantage for Pulsar in returning only one message at a time. Can someone please help clarify?
_P.S.: I'm using the default subscription mode in the consumer (Exclusive)_ ----
2020-04-09 15:55:03 UTC - Sijie Guo: @Matteo Merli I think `isSuperUser` is an exception ----
2020-04-09 15:55:24 UTC - Sijie Guo: we should make that consistent with the other methods ----
2020-04-09 15:58:29 UTC - Matteo Merli: Correct. We recently fixed `isTenantAdmin()`, but `isSuperUser()` still isn't taking the authentication data ----
2020-04-09 15:58:46 UTC - Sijie Guo: SourceContext and SinkContext provide `getNumInstances`. It tells you how many instances will be used for running the connector. ----
2020-04-09 16:16:44 UTC - Raman Gupta: Firstly, that Kafka call doesn't get *all* unread messages; it gets an undefined number of messages, which might be all of them or might be fewer (there are a bunch of consumer properties that control this, like `fetch.max.bytes` and `max.poll.records`). Secondly, AIUI the Pulsar consumer does the same thing in the background but presents a simpler one-receive-per-message API to you, the user. See `receiverQueueSize` for <https://pulsar.apache.org/docs/en/client-libraries-java/#configure-consumer|configuring> this behavior. If you really want to receive a batch of messages all at once, you can use a <https://pulsar.apache.org/docs/en/client-libraries-java/#batch-receive|batching receive>. ----
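For reference, a batching receive looks roughly like the sketch below; the service URL, topic, and subscription names are placeholders, and the policy values are illustrative:
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.BatchReceivePolicy;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Messages;
import org.apache.pulsar.client.api.PulsarClient;

public class BatchReceiveExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // placeholder topic
                .subscriptionName("my-subscription")     // placeholder subscription
                .batchReceivePolicy(BatchReceivePolicy.builder()
                        .maxNumMessages(100)                  // at most 100 messages per call
                        .timeout(200, TimeUnit.MILLISECONDS) // or whatever arrived within 200 ms
                        .build())
                .subscribe();

        // One call returns a whole batch instead of a single message.
        Messages<byte[]> messages = consumer.batchReceive();
        for (Message<byte[]> msg : messages) {
            // process msg ...
        }
        consumer.acknowledge(messages); // acknowledge the batch in one call

        consumer.close();
        client.close();
    }
}
```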
2020-04-09 16:27:54 UTC - Ryan Slominski: How is the work divided? Say for a custom source with multiple "channels" to listen for changes on, how do I specify which channels go with which Pulsar Source Connector instance? ----
2020-04-09 16:32:41 UTC - Sijie Guo: it mostly depends on the connector implementation. For a Pulsar source connector, the work is divided based on the subscription type (the subscription type is inferred from the ordering guarantee). For other connectors, like I said, it depends on the connector implementation. ----
2020-04-09 16:33:14 UTC - Sijie Guo: :+1: ----
2020-04-09 16:33:31 UTC - Ebere Abanonu: Got it to work. The underlying issue was the ZooKeeper endpoint. Using the ZooKeeper service name as the endpoint got it working. I have also changed to pulsar-all in the presto-pulsar docker image. I have sent a PR for review :100: Sijie Guo ----
2020-04-09 16:33:43 UTC - Sijie Guo: correct. we need to fix that interface. Are you going to fix that? Or shall we do that? ----
2020-04-09 16:34:03 UTC - Sijie Guo: Cool ----
2020-04-09 16:37:38 UTC - Ryan Slominski: It sounds like Pulsar and Kafka are very different here, as Kafka creates "tasks" based on your configuration, and it sounds like with Pulsar the Source Connector must spawn its own threads? ----
2020-04-09 16:39:20 UTC - Ryan Slominski: I have a working connector in Kafka in which I specify, via configuration, a list of "channels" from my custom in-house pub/sub system and configure how many tasks to spread the load out on, and Kafka handles it from there. The taskConfigs looks like:
```
public List<Map<String, String>> taskConfigs(int maxTasks) {
    List<Map<String, String>> configs = new ArrayList<>();

    // Default case is same number of channels as tasks, so each task has one
    int channelsPerTask = 1;
    int remainder = 0;

    if (channels.size() > maxTasks) {
        channelsPerTask = channels.size() / maxTasks;
        remainder = channels.size() % maxTasks;
    } else if (channels.size() < maxTasks) {
        maxTasks = channels.size(); // Reduce number of tasks as not enough work to go around!
    }

    List<String> all = new ArrayList<>(channels);

    int fromIndex = 0;
    // Always at least one - maxTasks ignored if < 1; also the first task takes the remainder
    int toIndex = channelsPerTask + remainder;

    if (toIndex > 0) {
        appendSubsetList(configs, all, fromIndex, toIndex);
    }

    fromIndex = toIndex;
    toIndex = toIndex + channelsPerTask;

    for (int i = 1; i < maxTasks; i++) {
        appendSubsetList(configs, all, fromIndex, toIndex);
        fromIndex = toIndex;
        toIndex = toIndex + channelsPerTask;
    }

    return configs;
}
``` ----
2020-04-09 16:55:27 UTC - Sijie Guo: No. Pulsar also creates instances based on "parallelism". But how the actual work is divided is up to the connector implementation. The connector implementation uses `getNumInstances` to divide the work. ----
2020-04-09 16:56:16 UTC - Sijie Guo: So Pulsar and Kafka are the same here. ----
2020-04-09 16:57:32 UTC - Sijie Guo: You can assign the tasks when implementing the `open` method in a Pulsar source or sink. ----
2020-04-09 16:57:35 UTC - Sijie Guo: ```
/**
 * Open connector with configuration.
 *
 * @param config initialization config
 * @param sourceContext environment where the source connector is running
 * @throws Exception IO type exceptions when opening a connector
 */
void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;
``` ----
2020-04-09 16:58:10 UTC - Sijie Guo: When opening a connector, you can get the number of instances from the source context and calculate the portion of the work your instance is responsible for. ----
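As an illustration, a minimal sketch of such an `open` implementation follows; the `ChannelSource` class, the `channels` config key, and the round-robin split are assumptions made for the example, not part of the Pulsar API:
```
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Source;
import org.apache.pulsar.io.core.SourceContext;

// Hypothetical source that watches a set of "channels" from an external system.
public class ChannelSource implements Source<byte[]> {

    private List<String> myChannels;

    @Override
    @SuppressWarnings("unchecked")
    public void open(Map<String, Object> config, SourceContext ctx) throws Exception {
        // Assumed config key carrying the full channel list.
        List<String> allChannels = (List<String>) config.get("channels");

        int numInstances = ctx.getNumInstances(); // comes from --parallelism
        int instanceId = ctx.getInstanceId();     // 0 .. numInstances - 1

        // Round-robin split: this instance takes every numInstances-th channel.
        myChannels = IntStream.range(0, allChannels.size())
                .filter(i -> i % numInstances == instanceId)
                .mapToObj(allChannels::get)
                .collect(Collectors.toList());

        // ... subscribe to myChannels with the in-house client here ...
    }

    @Override
    public Record<byte[]> read() throws Exception {
        // ... block until one of myChannels delivers a message, then wrap it ...
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public void close() {
    }
}
```
Every instance created by `--parallelism` runs the same `open` with the same config, so together they cover the whole channel list without any extra coordination.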
2020-04-09 17:03:09 UTC - Matteo Merli: Please go ahead ----
2020-04-09 18:05:53 UTC - Alexander Ursu: Hi, I was wondering how I could make each ZooKeeper instance use less memory, as I've been having issues running it on a cloud VM with only 2 GB. I've been getting this error: ```Native memory allocation (mmap) failed to map 2147483648 bytes for committing reserved memory``` So it's trying to allocate a little over 2 GB; is there any way to bring it down to 1.5 GB or less? ----
2020-04-09 18:06:32 UTC - Matteo Merli: Take a look at `conf/pulsar_env.sh`. The variables to set the JVM heap size are in there ----
2020-04-09 18:12:06 UTC - Alexander Ursu: thanks! ----
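For reference, the heap is controlled by `PULSAR_MEM` in that file; the 2147483648-byte mmap failure likely corresponds to the JVM reserving a 2 GB heap up front. A sketch for squeezing under a 2 GB VM (these values are illustrative, not the shipped defaults):
```
# conf/pulsar_env.sh -- illustrative values for a small VM, not the defaults
PULSAR_MEM=${PULSAR_MEM:-"-Xms512m -Xmx1g -XX:MaxDirectMemorySize=256m"}
```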
2020-04-09 20:32:33 UTC - Sijie Guo: @Matteo Merli - <https://github.com/apache/pulsar/issues/6702> created the issue. ----
2020-04-09 20:32:49 UTC - Matteo Merli: :ok_hand: ----
2020-04-09 21:13:44 UTC - Alexander Ursu: I'm having trouble using the admin REST API through the proxy. I keep getting ```
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /brokers/configuration. Reason:
<pre>    Not Found</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.20.v20190813</a><hr/>
</body>
</html>
``` for just about every endpoint except `/metrics/` ----
2020-04-09 21:43:24 UTC - Sijie Guo: How are you setting up the proxy? ----
2020-04-09 21:48:07 UTC - Khubaib Shabbir: @Khubaib Shabbir has joined the channel ----
2020-04-09 22:10:07 UTC - Alexander Ursu: currently as a docker container; I've only set the few necessary env vars, like so: ```
brokerServiceURL: pulsar://pulsar-broker:6650
brokerWebServiceURL: http://pulsar-broker:8080
functionWorkerWebServiceURL: http://pulsar-broker:8080
clusterName: pulsar-cluster-1
``` ----
2020-04-09 22:11:33 UTC - Alexander Ursu: the `pulsar-broker` name does resolve correctly. The proxy appears to work for regular pub-sub interactions ----
2020-04-09 22:37:29 UTC - Sijie Guo: is the broker container running in the same environment as the proxy container? ----
2020-04-09 22:40:08 UTC - Ermir Zaimi: @Ermir Zaimi has joined the channel ----
2020-04-09 22:59:22 UTC - Alexander Ursu: yes ----
2020-04-09 23:53:11 UTC - roberth: @roberth has joined the channel ----
2020-04-10 01:37:02 UTC - michael.colson: Hi, I'm starting to read about Pulsar, and `multi-tiered storage` is what brings me here. Indeed, I would like to have some kind of "unbounded" lookup repository for my tracking events (being able to follow actions and find the previous action of a user, a session, etc.). Pulsar SQL and all the "CEP-like" articles make me feel that it is suitable for my use case. In a few words: can I use Pulsar as a store that I can query to enrich my main stream, leveraging a tuned storage policy between the bookies and S3? Let me know if this is not the right place to post that kind of message. Thanks ----
2020-04-10 03:40:23 UTC - Sijie Guo: Is the "query" a streaming query, a scan-like query, or a point-lookup query? ----
2020-04-10 07:47:50 UTC - hugues DESLANDES: Hi, I am trying to use Pulsar SQL. Is there a way to query a compacted topic? It seems the "reader" for SQL reads all messages, not only the compacted ones. I was not able to find any documentation about parameters or options such as the reader's `.readCompacted(true)`. Any suggestions? Thanks in advance ----
2020-04-10 07:48:05 UTC - jk: @jk has joined the channel ----
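For context, `readCompacted` is a client-side reader/consumer option; a minimal sketch of the behavior being referred to is below (the service URL and topic are placeholders). Whether Pulsar SQL exposes an equivalent switch is exactly the open question above:
```
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class CompactedReadExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")          // placeholder
                .build();

        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")  // placeholder
                .startMessageId(MessageId.earliest)
                // Only the latest value per key from the compacted horizon,
                // plus the not-yet-compacted tail of the topic.
                .readCompacted(true)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }
        reader.close();
        client.close();
    }
}
```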