2019-02-18 09:16:22 UTC - Marc Le Labourier: @Marc Le Labourier has joined the channel ---- 2019-02-18 10:07:43 UTC - Christophe Bornet: Hi. When using a "Global namespace" with geo-replication, can we use the global ZooKeeper for all the clusters? Or is it recommended to have a local ZooKeeper cluster for each Pulsar cluster? ---- 2019-02-18 10:51:19 UTC - bossbaby: I'm having an authentication problem in Pulsar. When using TLS, pulsar-client-cpp keeps reconnecting when a large number of messages are sent. Is this a bug? ---- 2019-02-18 14:09:10 UTC - Sijie Guo: “global zookeeper” here is just a configuration store. It technically is not required.
Although you can still use a global zookeeper for all clusters with a proper chroot, in that case the global zookeeper becomes a single point of failure for all clusters, so it is not recommended. ---- 2019-02-18 14:10:09 UTC - Sijie Guo: Can you describe the setup and how this happened, so it can help the community understand the problem and reproduce it if possible? ---- 2019-02-18 14:27:23 UTC - Laurent Chriqui: So I tried again this morning. Here’s what I’m doing to create the function: ---- 2019-02-18 14:27:37 UTC - Laurent Chriqui: Here are the logs I get: ---- 2019-02-18 14:28:03 UTC - Laurent Chriqui: ---- 2019-02-18 14:30:21 UTC - Laurent Chriqui: When I look at the code in utils.py I see that only the directory of the file is added to sys.path (here: ‘/tmp/pulsar_functions/public/default/RoutingFunction/0’ instead of ‘/tmp/pulsar_functions/public/default/RoutingFunction/0/pulsarfunction.zip’) ---- 2019-02-18 16:04:37 UTC - Laurent Chriqui: @Ali Ahmed do you see what the problem might be? ---- 2019-02-18 16:27:16 UTC - Christophe Bornet: > it technically is not required. Do you mean that each cluster can use its own local ZK for the config store? ---- 2019-02-18 18:42:49 UTC - Boyan: @Boyan has joined the channel ---- 2019-02-18 18:46:47 UTC - Boyan: Hey guys, I stumbled upon the single DNS part on ( <https://pulsar.apache.org/docs/latest/deployment/cluster/> ). I'm a bit perplexed. Is the suggested solution really DNS load balancing, or am I interpreting what the document says incorrectly? ---- 2019-02-18 18:51:00 UTC - Boyan: To be clear, I'm a bit uncertain about how the Pulsar load balancing happens, especially in comparison to Kafka or NATS. ---- 2019-02-18 19:59:46 UTC - Matteo Merli: The DNS is just for service discovery ---- 2019-02-18 20:00:54 UTC - Matteo Merli: For load balancing info, check out <https://pulsar.apache.org/docs/en/administration-load-distribution/> ---- 2019-02-18 20:07:30 UTC - Boyan: Then I'm even more confused.
The line in the docs says: "A single DNS name covering all of the Pulsar broker hosts". I see that in the example the zk hosts are separate names, and the claim of internal load balancing on the link you mention (which is what I'd expect) means that I'm really missing the part of "why does it matter if the DNS points to 1 or to all nodes?" ---- 2019-02-18 20:07:34 UTC - Boyan: (dns failover excluded) ---- 2019-02-18 20:08:00 UTC - Boyan: Is it an internal detail, that nodes need to be assigned the same DNS name to be considered part of the same cluster, or is it something else that I am obviously missing? ---- 2019-02-18 20:09:41 UTC - Boyan: (I may have overstated, I'm definitely not more confused, the docs for load balancing are excellent. It's just the service discovery part that I'm struggling with right now) ---- 2019-02-18 22:00:28 UTC - Jacob O'Farrell: Hi, are there any examples of running Pulsar SQL (and the associated Presto parts etc) on a Kubernetes cluster? Any suggestions as to the best place to start looking? ---- 2019-02-18 22:02:47 UTC - Ali Ahmed: @Jacob O'Farrell do you have pulsar running in k8s already? ---- 2019-02-18 22:03:01 UTC - Jacob O'Farrell: Yep - I have Pulsar running on k8s ---- 2019-02-18 22:03:28 UTC - Jacob O'Farrell: Most of the docs around Pulsar SQL seem to be geared more towards non-k8s deployments (or I'm looking in the wrong place / misunderstanding) ---- 2019-02-18 22:06:14 UTC - Ali Ahmed: you are right, we are lacking some documentation in that area ---- 2019-02-18 22:06:44 UTC - Ali Ahmed: but it should be straightforward; presto is already part of the image so you just need another service ---- 2019-02-18 22:12:11 UTC - Jacob O'Farrell: Okay, so if I've understood correctly, I should be able to create a new deployment/service based off the pulsar image, and just change the start command to `bin/pulsar sql-worker run` (e.g.
changing from bin/pulsar broker) ---- 2019-02-18 22:14:35 UTC - Jacob O'Farrell: Is there a config etc that I need to modify / create? ---- 2019-02-18 22:19:28 UTC - Ali Ahmed: depends on the cluster setup but details are here <https://pulsar.apache.org/docs/en/sql-deployment-configurations/> ---- 2019-02-18 22:33:35 UTC - Jacob O'Farrell: Brokers generate/expose their config through this method (I believe) - any recommendations for how to achieve something similar with the sql-workers? <https://gyazo.com/90b22af3748678d6e6bf5aa00b76fd61> ---- 2019-02-18 22:40:01 UTC - Ali Ahmed: it should work in a similar way ---- 2019-02-18 22:40:39 UTC - Ali Ahmed: you would do something like this ```bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties``` ---- 2019-02-18 22:43:52 UTC - Chris Martis: @Chris Martis has joined the channel ---- 2019-02-18 22:43:59 UTC - Jacob O'Farrell: thanks @Ali Ahmed! @Sijie Guo you mentioned yesterday that in 2.3 the kubernetes deployments/workers were getting some love - any changes on the horizon for the SQL related workers as well? (Additionally - any insight/additions as to the best way to get this up and going?) ---- 2019-02-18 22:50:15 UTC - Alexandre DUVAL: @Alexandre DUVAL has joined the channel ---- 2019-02-18 23:03:34 UTC - Alexandre DUVAL: Hi, I think this is the right place to ask this kind of question (I've read a lot about pulsar, but I still have a question). What is the best solution? Having a lot of topics (and so a lot of producers/consumers, because the topic to pub/sub on is defined on the producer/consumer and not on the record)? Or one huge topic with a lot of filters? I may be wrong about this. Let me explain: the pulsar client is up when we create the producers/consumers, right? So maybe it is not a problem to have, let's say, 10 000 producers and, for each of them, 10 msg/s.
Let's say I need pulsar to collect logs from my applications, and I have a lot of applications with a lot of logs; is the best way to have one topic per application? Maybe filtering by messages' properties is a good way. I need your advice. I'm possibly globally wrong, so please do not hesitate to give me your opinion :slightly_smiling_face:. ---- 2019-02-18 23:06:27 UTC - Alexandre DUVAL: Also, can we read logs from topics without consuming them? Or maybe one topic (or partition?) to consume and one to read, so I'd have live logs by consuming and all the logs by reading? ---- 2019-02-18 23:29:22 UTC - Jacob O'Farrell: Apologies for the noise! I'm finding that in our K8s pulsar setup, we don't have any of the built-in connectors available. How would I go about installing these? Should they be installed on the brokers? I'm trying to make sense of <http://pulsar.apache.org/docs/en/io-quickstart/#installing-builtin-connectors> in the context of a K8s installation and struggling a bit. Happy to do my best to contribute back any changes to the docs that I can make off the back of the great help I've received in this channel. ---- 2019-02-18 23:30:15 UTC - Ali Ahmed: @Jacob O'Farrell are you using master or 2.2.1? ---- 2019-02-18 23:32:23 UTC - Jacob O'Farrell: Our deployments are set to use the `apachepulsar/pulsar:latest` image tag (as per the examples) - is this correct? ---- 2019-02-18 23:35:32 UTC - Jacob O'Farrell: Sorry!! Just saw this section <https://gyazo.com/b127f4d5345000ff1aa9bfecda8d3376> ---- 2019-02-18 23:35:35 UTC - Jacob O'Farrell: Would this fix my issue? ---- 2019-02-18 23:37:26 UTC - Ali Ahmed: pulsar-all should do it ---- 2019-02-18 23:41:13 UTC - Jacob O'Farrell: Awesome. Thank you @Ali Ahmed! Really appreciate the help. Sorry for all the noise! ---- 2019-02-19 01:29:08 UTC - Sijie Guo: @Jacob O'Farrell I am not aware of anyone having done that yet, but since all the startup scripts are there, it should be pretty straightforward to add one.
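[Editor's note: the sql-worker deployment discussed above could be sketched as a container spec that reuses the image's startup scripts, following the same apply-config-then-run pattern used for brokers. This is an illustration only, not a tested manifest; the container name, image tag, and the exact set of config files to apply are assumptions.]
```
# Hypothetical sql-worker container spec (names and values are illustrative)
containers:
  - name: pulsar-sql-worker
    image: apachepulsar/pulsar-all:latest
    command: ["sh", "-c"]
    args:
      - >
        bin/apply-config-from-env.py conf/presto/config.properties &&
        bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties &&
        bin/pulsar sql-worker run
```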
---- 2019-02-19 01:32:52 UTC - Sijie Guo: It is recommended to have one single DNS name or load balancer in front of the pulsar brokers, so when you configure your clients for pulsar, you don't need to configure a list of broker addresses. It doesn't have to point to all of them; you can point to one or a few. It is just acting as the entrypoint for discovering more brokers. ---- 2019-02-19 01:33:13 UTC - Jacob O'Farrell: In terms of the settings listed in <https://github.com/apache/pulsar/blob/master/conf/presto/config.properties>: do we need to set/change `node.id`, `node.environment` and `presto.version`? These seem to be set to placeholders/test values. Unsure what it is expecting here - sorry if this is a silly question ---- 2019-02-19 01:42:18 UTC - Sijie Guo: all these settings are presto related. Based on my knowledge so far, node.id is used as a UUID, so you can generate a UUID for a worker when you start it. `node.environment` is used for distinguishing your environments; you can configure it according to how you organize your clusters, e.g. staging, production. `presto.version` is used for advertising what version of presto you are running; it is also informative, so you can set it to the version you are running. ---- 2019-02-19 03:03:44 UTC - Jacob O'Farrell: ```node.id is used as a UUID. so you can generate a UUID for a worker when you start it.``` Will this be auto generated if not specified? <http://pulsar.apache.org/docs/en/next/sql-deployment-configurations/#deploying-to-a-3-node-cluster> does mention it as required in the config file, so just trying to wrap my head around it all!
Sorry ---- 2019-02-19 03:04:52 UTC - bossbaby: I installed according to the instructions at <https://pulsar.apache.org/docs/en/security-tls-transport/> In pulsar-client-cpp:
```
pulsar::ClientConfiguration config;
config.setUseTls(true);
config.setTlsTrustCertsFilePath("/Users/pro/Desktop/Apache_Pulsar/apache-pulsar-2.2.1_only_authen/my-ca/certs/ca.cert.pem");
config.setTlsAllowInsecureConnection(false);
Client client("pulsar+ssl://localhost:6651/", config);
```
And after continuously producing & consuming messages for a while, I get a "Schedule reconnection" message: ---- 2019-02-19 03:12:32 UTC - bossbaby: producer log: <https://gist.github.com/tuan6956/088eeb7cd6971453867fbe55571d9786> consumer log: <https://gist.github.com/tuan6956/ce69d9188c7ecb45e8d054603162d7f0> ---- 2019-02-19 03:14:16 UTC - bossbaby: The question is: why does pulsar not stay connected? ---- 2019-02-19 03:31:11 UTC - Jacob O'Farrell: Any suggestions as to a good starting point for PULSAR_MEM settings for the SQL workers? ---- 2019-02-19 03:46:31 UTC - Jacob O'Farrell: Answered my own question - found them here <https://github.com/apache/pulsar/blob/master/conf/presto/jvm.config> ---- 2019-02-19 05:05:11 UTC - Jacob O'Farrell: @Ali Ahmed @Sijie Guo We've got the config map set up, and have this defined in the args section
```
args:
  - >
    bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties &&
    bin/apply-config-from-env.py conf/presto/config.properties &&
    bin/pulsar sql-worker run
```
However we don't see it writing the config to the specified files. *BUT* if we exec into the pod and run the command, it writes to the files / runs as expected... Any thoughts? ---- 2019-02-19 05:07:17 UTC - Ali Ahmed: did you set the env variables correctly? ---- 2019-02-19 05:10:36 UTC - Jacob O'Farrell: I believe so? (is there a way to tell?)
If we exec into the container, we can see that `conf/presto/catalog/pulsar.properties` is left as default; however if we run `bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties` manually, I can see it correctly reflect the container's environment variables in the `pulsar.properties` file ---- 2019-02-19 05:12:20 UTC - Jacob O'Farrell: Very strange / not what I would expect ---- 2019-02-19 05:35:49 UTC - Jacob O'Farrell: Rather puzzled, if you have any suggestions I'm all ears ---- 2019-02-19 05:37:24 UTC - Ali Ahmed: can you show the whole yaml? ---- 2019-02-19 05:48:29 UTC - Jacob O'Farrell: ---- 2019-02-19 05:52:38 UTC - tianyu.xing: @tianyu.xing has joined the channel ---- 2019-02-19 07:46:27 UTC - Khoa Tran: Hey all - trying to follow the steps in <https://pulsar.apache.org/docs/en/sql-getting-started/>, however when I try to query anything beyond listing the catalogs I run into errors straight away, with the presto commands printing either `Failed to get schemas from pulsar: Unexpected end of file from server` or ```presto> show tables in pulsar."public/default"; Query is gone (server restarted?)``` Any suggestions would be appreciated! ---- 2019-02-19 07:47:25 UTC - Khoa Tran: For reference, here are some of the error logs I’ve been able to retrieve from the coordinator (currently the only presto worker/node up) ---- 2019-02-19 07:48:17 UTC - Khoa Tran: ---- 2019-02-19 07:48:24 UTC - Ali Ahmed: @Khoa Tran are you running a standalone server? ---- 2019-02-19 07:48:57 UTC - Khoa Tran: ---- 2019-02-19 07:49:48 UTC - Khoa Tran: @Ali Ahmed This is on a cluster configuration I believe - does Presto behave differently on standalone? ---- 2019-02-19 07:50:54 UTC - Ali Ahmed: I don’t know the health of the cluster ---- 2019-02-19 07:53:06 UTC - Khoa Tran: I don’t quite understand; however I can run `show catalogs` and have results returned, e.g.
---- 2019-02-19 07:53:12 UTC - Khoa Tran:
```
presto> show catalogs;
ERROR: failed to open pager: Cannot run program "less": error=2, No such file or directory
 Catalog
---------
 pulsar
 system
(2 rows)

Query 20190219_075226_00012_53qk3, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
```
---- 2019-02-19 07:53:35 UTC - Khoa Tran: I can see the coordinating node as available
```
presto> SELECT * FROM system.runtime.nodes;
ERROR: failed to open pager: Cannot run program "less": error=2, No such file or directory
 node_id | http_uri                   | node_version | coordinator | state
---------+----------------------------+--------------+-------------+--------
 1       | http://192.168.224.12:8081 | testversion  | true        | active
(1 row)
```
---- 2019-02-19 08:29:42 UTC - Jacob O'Farrell: @Ali Ahmed Do you mean that the presto cluster isn't properly talking to the Pulsar cluster? ---- 2019-02-19 08:31:04 UTC - Ali Ahmed: yes, it could be a resource issue ---- 2019-02-19 08:53:28 UTC - Jacob O'Farrell: In a K8s deployment, should the `zookeeper` deployment expose port 2181 in its service? I notice that there is mention of this port in some of the bare metal configurations etc, and I also noticed that in the aws configuration it is exposed (<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/aws/zookeeper.yaml#L148>), but in the GKE and Generic configurations, it is not (<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/google-kubernetes-engine/zookeeper.yaml#L157>) ----
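[Editor's note: for reference on the ZooKeeper port question, a Service that does expose the client port might look roughly like the fragment below. This is a sketch only; the metadata name and label selector are assumptions, not taken from the linked manifests. 2181 is ZooKeeper's standard client port, while 2888/3888 carry quorum and leader-election traffic between the ZK servers themselves.]
```
# Hypothetical ZooKeeper Service exposing the client port (2181)
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  ports:
    - name: client           # used by brokers/clients to reach ZK
      port: 2181
    - name: server           # quorum traffic between ZK servers
      port: 2888
    - name: leader-election  # leader election between ZK servers
      port: 3888
  selector:
    app: pulsar
    component: zookeeper
```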
