2019-02-18 09:16:22 UTC - Marc Le Labourier: @Marc Le Labourier has joined the channel ---- 2019-02-18 10:07:43 UTC - Christophe Bornet: Hi. When using a "Global namespace" with geo-replication, can we use the global ZooKeeper for all the clusters? Or is it recommended to have a local ZooKeeper cluster for each Pulsar cluster? ---- 2019-02-18 10:51:19 UTC - bossbaby: I'm having an authentication problem in Pulsar. When using TLS, pulsar-client-cpp keeps reconnecting when a large number of messages are sent. Is this a bug? ---- 2019-02-18 14:09:10 UTC - Sijie Guo: “global zookeeper” here is just a configuration store. It technically is not required.
Although you can still use a global zookeeper for all clusters with a proper chroot, in that case the global zookeeper becomes a single point of failure for all clusters, so it is not recommended. ---- 2019-02-18 14:10:09 UTC - Sijie Guo: Can you describe the setup and how this happened, so it can help the community understand the problem and reproduce it if possible? ---- 2019-02-18 14:27:23 UTC - Laurent Chriqui: So I tried again this morning. Here’s what I’m doing to create the function: ---- 2019-02-18 14:27:37 UTC - Laurent Chriqui: Here are the logs I get: ---- 2019-02-18 14:28:03 UTC - Laurent Chriqui: ---- 2019-02-18 14:30:21 UTC - Laurent Chriqui: When I look at the code in utils.py I see that only the directory of the file is added to sys.path (here: ‘/tmp/pulsar_functions/public/default/RoutingFunction/0’ instead of ‘/tmp/pulsar_functions/public/default/RoutingFunction/0/pulsarfunction.zip’) ---- 2019-02-18 16:04:37 UTC - Laurent Chriqui: @Ali Ahmed do you see what the problem might be? ---- 2019-02-18 16:27:16 UTC - Christophe Bornet: > it technically is not required. Do you mean that each cluster can use its own local ZK for the config store? ---- 2019-02-18 18:42:49 UTC - Boyan: @Boyan has joined the channel ---- 2019-02-18 18:46:47 UTC - Boyan: Hey guys, I stumbled upon the single DNS part on ( <https://pulsar.apache.org/docs/latest/deployment/cluster/> ). I'm a bit perplexed. Is the suggested solution really DNS load balancing, or am I interpreting what the document says incorrectly? ---- 2019-02-18 18:51:00 UTC - Boyan: To be clear, I'm a bit uncertain about how the Pulsar load balancing happens, especially in comparison to Kafka or NATS. ---- 2019-02-18 19:59:46 UTC - Matteo Merli: The DNS is just for service discovery ---- 2019-02-18 20:00:54 UTC - Matteo Merli: For load balancing info, check out <https://pulsar.apache.org/docs/en/administration-load-distribution/> ---- 2019-02-18 20:07:30 UTC - Boyan: Then I'm even more confused.
The line in the docs says: "A single DNS name covering all of the Pulsar broker hosts". I see that in the example the zk hosts are separate names, and the claim of internal load balancing on the link you mention (which is what I'd expect) means that I'm really missing the part of "why does it matter if the DNS points to 1 or to all nodes?" ---- 2019-02-18 20:07:34 UTC - Boyan: (dns failover excluded) ---- 2019-02-18 20:08:00 UTC - Boyan: Is it an internal detail, that nodes need to be assigned the same DNS name to be considered part of the same cluster, or is it something else that I am obviously missing? ---- 2019-02-18 20:09:41 UTC - Boyan: (I may have overstated, I'm definitely not more confused, the docs for load balancing are excellent. It's just the service discovery part that I'm struggling with right now) ---- 2019-02-18 22:00:28 UTC - Jacob O'Farrell: Hi, are there any examples of running Pulsar SQL (and the associated Presto parts etc) on a Kubernetes cluster? Any suggestions as to the best place to start looking? ---- 2019-02-18 22:02:47 UTC - Ali Ahmed: @Jacob O'Farrell do you have pulsar running in k8s already? ---- 2019-02-18 22:03:01 UTC - Jacob O'Farrell: Yep - I have Pulsar running on k8s ---- 2019-02-18 22:03:28 UTC - Jacob O'Farrell: Most of the docs around Pulsar SQL seem to be geared more towards non-k8s deployments (or I'm looking in the wrong place / misunderstanding) ---- 2019-02-18 22:06:14 UTC - Ali Ahmed: you are right, we are lacking some documentation in that area ---- 2019-02-18 22:06:44 UTC - Ali Ahmed: but it should be straightforward; presto is already part of the image so you just need another service ---- 2019-02-18 22:12:11 UTC - Jacob O'Farrell: Okay, so if I've understood correctly, I should be able to create a new deployment/service based off the pulsar image, and just change the start command to `bin/pulsar sql-worker run` (e.g.
changing from bin/pulsar broker) ---- 2019-02-18 22:14:35 UTC - Jacob O'Farrell: Is there a config etc that I need to modify / create? ---- 2019-02-18 22:19:28 UTC - Ali Ahmed: depends on the cluster setup but details are here <https://pulsar.apache.org/docs/en/sql-deployment-configurations/> ---- 2019-02-18 22:33:35 UTC - Jacob O'Farrell: Brokers generate/expose their config through this method (I believe) - any recommendations for how to achieve something similar with the sql-workers? <https://gyazo.com/90b22af3748678d6e6bf5aa00b76fd61> ---- 2019-02-18 22:40:01 UTC - Ali Ahmed: it should work in a similar way ---- 2019-02-18 22:40:39 UTC - Ali Ahmed: you would do something like this ```bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties``` ---- 2019-02-18 22:43:52 UTC - Chris Martis: @Chris Martis has joined the channel ---- 2019-02-18 22:43:59 UTC - Jacob O'Farrell: thanks @Ali Ahmed! @Sijie Guo you mentioned yesterday that in 2.3 the kubernetes deployments/workers were getting some love - any changes on the horizon for the SQL related workers as well? (Additionally - any insight/additions as to the best way to get this up and going?) ---- 2019-02-18 22:50:15 UTC - Alexandre DUVAL: @Alexandre DUVAL has joined the channel ---- 2019-02-18 23:03:34 UTC - Alexandre DUVAL: Hi, I think this is the right place to ask this kind of question (I've read a lot about pulsar, but I still have a question). What is the best solution? Having a lot of topics (and so a lot of producers/consumers, because the topic to pub/sub on is defined on the producer/consumer and not on the record)? Or one huge topic with a lot of filters? I may be wrong about this. Let me explain: the pulsar client is up when we create the producers/consumers, right? So maybe it is not a problem to have, let's say, 10 000 producers and, for each of them, 10 msg/s.
Let's say I need pulsar to collect logs from my applications, and I have a lot of applications with a lot of logs; is the best way to have one topic per application? Maybe filtering by messages' properties is a good way. I need your advice. I'm possibly globally wrong, so please do not hesitate to give me your opinion :slightly_smiling_face:. ---- 2019-02-18 23:06:27 UTC - Alexandre DUVAL: Also, can we read logs from topics without consuming them? Or maybe one topic (or partition?) to consume and one to read, so I'd have live logs by consuming and all the logs by reading? ---- 2019-02-18 23:29:22 UTC - Jacob O'Farrell: Apologies for the noise! I'm finding that in our K8s pulsar setup, we don't have any of the built-in connectors available. How would I go about installing these? Should they be installed on the brokers? I'm trying to make sense of <http://pulsar.apache.org/docs/en/io-quickstart/#installing-builtin-connectors> in the context of a K8s installation and struggling a bit. Happy to do my best to contribute back any changes to the docs that I can make off the back of the great help I've received in this channel. ---- 2019-02-18 23:30:15 UTC - Ali Ahmed: @Jacob O'Farrell are you using master or 2.2.1? ---- 2019-02-18 23:32:23 UTC - Jacob O'Farrell: Our deployments are set to use the `apachepulsar/pulsar:latest` image tag (as per the examples) - is this correct? ---- 2019-02-18 23:35:32 UTC - Jacob O'Farrell: Sorry!! Just saw this section <https://gyazo.com/b127f4d5345000ff1aa9bfecda8d3376> ---- 2019-02-18 23:35:35 UTC - Jacob O'Farrell: Would this fix my issue? ---- 2019-02-18 23:37:26 UTC - Ali Ahmed: pulsar-all should do it ---- 2019-02-18 23:41:13 UTC - Jacob O'Farrell: Awesome. Thank you @Ali Ahmed! Really appreciate the help. Sorry for all the noise! ---- 2019-02-19 01:29:08 UTC - Sijie Guo: @Jacob O'Farrell I am not aware of anyone having done that yet, but since all the startup scripts are there, it should be pretty straightforward to add one.
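[Editor's note: the sql-worker deployment discussed above could be sketched as a container spec that reuses the image's startup scripts, following the same apply-config-then-run pattern used for brokers. This is an illustration only, not a tested manifest; the container name, image tag, and the exact set of config files to apply are assumptions.]
```
# Hypothetical sql-worker container spec (names and values are illustrative)
containers:
  - name: pulsar-sql-worker
    image: apachepulsar/pulsar-all:latest
    command: ["sh", "-c"]
    args:
      - >
        bin/apply-config-from-env.py conf/presto/config.properties &&
        bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties &&
        bin/pulsar sql-worker run
```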
---- 2019-02-19 01:32:52 UTC - Sijie Guo: It is recommended to have one single DNS name or load balancer in front of the pulsar brokers, so when you configure your clients for pulsar, you don't need to configure a list of broker addresses. It doesn't have to point to all of them; you can point to one or a few. It is just acting as the entrypoint for discovering more brokers. ---- 2019-02-19 01:33:13 UTC - Jacob O'Farrell: In terms of the settings listed in <https://github.com/apache/pulsar/blob/master/conf/presto/config.properties>: do we need to set/change `node.id`, `node.environment` and `presto.version`? These seem to be set to placeholders/test values. Unsure what it is expecting here - sorry if this is a silly question ---- 2019-02-19 01:42:18 UTC - Sijie Guo: all these settings are presto related. Based on my knowledge so far, node.id is used as a UUID, so you can generate a UUID for a worker when you start it. `node.environment` is used for distinguishing your environments; you can configure it according to how you organize your clusters, e.g. staging, production. `presto.version` is used for advertising what version of presto you are running; it is also informative, so you can set it to the version you are running. ---- 2019-02-19 03:03:44 UTC - Jacob O'Farrell: ```node.id is used as a UUID. so you can generate a UUID for a worker when you start it.``` Will this be auto generated if not specified? <http://pulsar.apache.org/docs/en/next/sql-deployment-configurations/#deploying-to-a-3-node-cluster> does mention it as required in the config file, so just trying to wrap my head around it all!
Sorry ---- 2019-02-19 03:04:52 UTC - bossbaby: I installed according to the instructions at <https://pulsar.apache.org/docs/en/security-tls-transport/> In pulsar-client-cpp:
```
pulsar::ClientConfiguration config;
config.setUseTls(true);
config.setTlsTrustCertsFilePath("/Users/pro/Desktop/Apache_Pulsar/apache-pulsar-2.2.1_only_authen/my-ca/certs/ca.cert.pem");
config.setTlsAllowInsecureConnection(false);
Client client("pulsar+ssl://localhost:6651/", config);
```
And after continuously producing & consuming messages for a while, I get a "Schedule reconnection" message: ---- 2019-02-19 03:12:32 UTC - bossbaby: producer log: <https://gist.github.com/tuan6956/088eeb7cd6971453867fbe55571d9786> consumer log: <https://gist.github.com/tuan6956/ce69d9188c7ecb45e8d054603162d7f0> ---- 2019-02-19 03:14:16 UTC - bossbaby: The question is: why does pulsar not stay connected? ---- 2019-02-19 03:31:11 UTC - Jacob O'Farrell: Any suggestions as to a good starting point for PULSAR_MEM settings for the SQL workers? ---- 2019-02-19 03:46:31 UTC - Jacob O'Farrell: Answered my own question - found them here <https://github.com/apache/pulsar/blob/master/conf/presto/jvm.config> ---- 2019-02-19 05:05:11 UTC - Jacob O'Farrell: @Ali Ahmed @Sijie Guo We've got the config map set up, and have this defined in the args section
```
args:
  - >
    bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties &&
    bin/apply-config-from-env.py conf/presto/config.properties &&
    bin/pulsar sql-worker run
```
However we don't see it writing the config to the specified files. *BUT* if we exec into the pod and run the command, it writes to the files / runs as expected... Any thoughts? ---- 2019-02-19 05:07:17 UTC - Ali Ahmed: did you set the env variables correctly? ---- 2019-02-19 05:10:36 UTC - Jacob O'Farrell: I believe so? (is there a way to tell?)
If we exec into the container, we can see that `conf/presto/catalog/pulsar.properties` is left as default; however if we run `bin/apply-config-from-env.py conf/presto/catalog/pulsar.properties` manually, I can see it correctly reflect the container's environment variables in the `pulsar.properties` file ---- 2019-02-19 05:12:20 UTC - Jacob O'Farrell: Very strange / not what I would expect ---- 2019-02-19 05:35:49 UTC - Jacob O'Farrell: Rather puzzled, if you have any suggestions I'm all ears ---- 2019-02-19 05:37:24 UTC - Ali Ahmed: can you show the whole yaml? ---- 2019-02-19 05:48:29 UTC - Jacob O'Farrell: ---- 2019-02-19 05:52:38 UTC - tianyu.xing: @tianyu.xing has joined the channel ---- 2019-02-19 07:46:27 UTC - Khoa Tran: Hey all - trying to follow the steps in <https://pulsar.apache.org/docs/en/sql-getting-started/>, however when I try to query anything beyond listing the catalogs I run into errors straight away, with the presto commands printing either `Failed to get schemas from pulsar: Unexpected end of file from server` or ```presto> show tables in pulsar."public/default"; Query is gone (server restarted?)``` Any suggestions would be appreciated! ---- 2019-02-19 07:47:25 UTC - Khoa Tran: For reference, here are some of the error logs I’ve been able to retrieve from the coordinator (currently the only presto worker/node up) ---- 2019-02-19 07:48:17 UTC - Khoa Tran: ---- 2019-02-19 07:48:24 UTC - Ali Ahmed: @Khoa Tran are you running a standalone server? ---- 2019-02-19 07:48:57 UTC - Khoa Tran: ---- 2019-02-19 07:49:48 UTC - Khoa Tran: @Ali Ahmed This is on a cluster configuration I believe - does Presto behave differently on standalone? ---- 2019-02-19 07:50:54 UTC - Ali Ahmed: I don’t know the health of the cluster ---- 2019-02-19 07:53:06 UTC - Khoa Tran: I don’t quite understand; however I can run `show catalogs` and have results returned, e.g.
---- 2019-02-19 07:53:12 UTC - Khoa Tran:
```
presto> show catalogs;
ERROR: failed to open pager: Cannot run program "less": error=2, No such file or directory
 Catalog
---------
 pulsar
 system
(2 rows)

Query 20190219_075226_00012_53qk3, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
```
---- 2019-02-19 07:53:35 UTC - Khoa Tran: I can see the coordinating node as available
```
presto> SELECT * FROM system.runtime.nodes;
ERROR: failed to open pager: Cannot run program "less": error=2, No such file or directory
 node_id | http_uri                   | node_version | coordinator | state
---------+----------------------------+--------------+-------------+--------
 1       | http://192.168.224.12:8081 | testversion  | true        | active
(1 row)
```
---- 2019-02-19 08:29:42 UTC - Jacob O'Farrell: @Ali Ahmed Do you mean that the presto cluster isn't properly talking to the Pulsar cluster? ---- 2019-02-19 08:31:04 UTC - Ali Ahmed: yes, it could be a resource issue ---- 2019-02-19 08:53:28 UTC - Jacob O'Farrell: In a K8s deployment, should the `zookeeper` deployment expose port 2181 in its service? I notice that there is mention of this port in some of the bare metal configurations etc, and I also noticed that in the aws configuration it is exposed (<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/aws/zookeeper.yaml#L148>), but in the GKE and Generic configurations, it is not (<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/google-kubernetes-engine/zookeeper.yaml#L157>) ----
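[Editor's note: for reference on the ZooKeeper port question, a Service that does expose the client port might look roughly like the fragment below. This is a sketch only; the metadata name and label selector are assumptions, not taken from the linked manifests. 2181 is ZooKeeper's standard client port, while 2888/3888 carry quorum and leader-election traffic between the ZK servers themselves.]
```
# Hypothetical ZooKeeper Service exposing the client port (2181)
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  ports:
    - name: client           # used by brokers/clients to reach ZK
      port: 2181
    - name: server           # quorum traffic between ZK servers
      port: 2888
    - name: leader-election  # leader election between ZK servers
      port: 3888
  selector:
    app: pulsar
    component: zookeeper
```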
