2019-08-06 10:20:52 UTC - Richard Sherman: This was when creating a new 
Consumer via a pulsar-proxy. Restarting the proxy cleared the problem.
----
2019-08-06 13:30:39 UTC - Chris Bartholomew: @Addison Higham I built a Docker 
image for the broker that is 2.3.1 plus that PR. I have been using it for a 
while now. If you are using k8s, it would be easy to try it to see if it fixes 
your Datadog issues: ```broker:
    repository: kafkaesqueio/pulsar-all
    pullPolicy: IfNotPresent
    tag: 2.3.1_kesque_1
```
----
2019-08-06 13:36:54 UTC - Alexandre DUVAL: Sure, but it is not implemented in 
the pulsar-admin functions command, right?
----
2019-08-06 18:15:56 UTC - Zhenhao Li: hi there, has anyone used Pulsar SQL?
----
2019-08-06 18:16:44 UTC - Jerry Peng: Yes
----
2019-08-06 18:16:50 UTC - Zhenhao Li: I read this page 
<https://pulsar.apache.org/docs/en/sql-overview/> and have a few questions.
are Presto workers part of Pulsar?
----
2019-08-06 18:17:26 UTC - Zhenhao Li: and do you have performance benchmarking 
results?
----
2019-08-06 18:18:17 UTC - Jerry Peng: @Zhenhao Li Presto is packaged with pulsar for ease of use, so you can use `./bin/pulsar` to start a Presto cluster. Alternatively, you can also just deploy your own Presto cluster using a Presto distribution
----
2019-08-06 18:19:11 UTC - Zhenhao Li: I see. Is it possible to use Spark to 
query data in Pulsar via a Spark connector?
----
2019-08-06 18:19:38 UTC - Zhenhao Li: if so what is the difference from Presto?
----
2019-08-06 18:20:59 UTC - Jerry Peng: The Presto connector reads data directly from the storage layer to maximize throughput, while the Spark connector, I believe, reads data with the consumer/reader API, which is optimized for latency.
----
2019-08-06 18:22:01 UTC - Jerry Peng: We could do a similar implementation for the Spark connector to read directly from the storage layer, but we have yet to do that.
----
2019-08-06 18:22:25 UTC - Zhenhao Li: ok. thanks!
----
2019-08-06 18:22:55 UTC - Zhenhao Li: I actually have a real use case in mind.
how fast can it be to replay all historical events for a given key?
----
2019-08-06 18:23:29 UTC - Zhenhao Li: I am thinking of using Pulsar as the 
persistence layer for event sourcing
----
2019-08-06 18:25:17 UTC - Zhenhao Li: in Kafka there will be no bound because each partition can have a growing number of keys
----
2019-08-06 18:25:43 UTC - Jerry Peng: Because Pulsar SQL (presto connector) 
reads data directly from the storage layer, it can read the data within topics 
in parallel from multiple replicas.  The more replicas you have of the data, 
the higher the potential read throughput
----
2019-08-06 18:25:56 UTC - Zhenhao Li: I wonder if Pulsar can provide a time 
bound on queries per key
----
2019-08-06 18:27:29 UTC - Jerry Peng: A hard time bound cannot be guaranteed, but throughput will be much higher than querying Kafka with Presto, since in that case data is read out through a consumer and is limited by the number of partitions
----
2019-08-06 18:28:40 UTC - Zhenhao Li: I see
----
2019-08-06 18:29:50 UTC - Jerry Peng: Messages of a key will only be in a particular partition, so if you know which partition the messages for a key reside in, you can just query that partition
----
2019-08-06 18:30:03 UTC - Jerry Peng: to minimize the amount of data you will 
need to filter
----
2019-08-06 18:30:46 UTC - Jerry Peng: There is no HARD time bound, but as a soft bound the query time is proportional to the number of messages in the topic
----
2019-08-06 18:31:39 UTC - Jerry Peng: Pulsar, like any big data system, is not a hard real-time system and thus does not have hard guarantees on completion time
----
2019-08-06 18:31:42 UTC - Zhenhao Li: it makes sense to use a hashing function to map keys to partitions, so both producers and query makers know the mapping
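For illustration, a minimal sketch of such a stable mapping (an assumption for the example, not Pulsar's exact routing internals, where the hashing scheme is configurable):
```
public class KeyToPartition {
    // A stable hash maps each key to a fixed partition, so producers and
    // query clients agree on where a key's messages live. The double-mod
    // keeps the result non-negative even when hashCode() is negative.
    static int partitionForKey(String key, int numPartitions) {
        return ((key.hashCode() % numPartitions) + numPartitions) % numPartitions;
    }

    public static void main(String[] args) {
        // "customer-42" always lands on the same partition out of 16
        System.out.println(partitionForKey("customer-42", 16));
    }
}
```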
----
2019-08-06 18:33:46 UTC - Jerry Peng: It's also interesting to note that read throughput is not bounded by the number of partitions in Pulsar SQL. Users can simply increase the number of write replicas
----
2019-08-06 18:33:50 UTC - Zhenhao Li: but it would be great if there were some benchmarking results, say against HBase, Cassandra, etc.
----
2019-08-06 18:35:00 UTC - Jerry Peng: Though Pulsar and Pulsar SQL have different use cases compared to HBase and Cassandra, which are NoSQL KV stores
----
2019-08-06 18:35:08 UTC - Zhenhao Li: sorry, do you mean read replicas?
----
2019-08-06 18:35:33 UTC - Jerry Peng: Pulsar SQL also works with tiered storage and is able to query data offloaded to object stores like S3
----
2019-08-06 18:40:15 UTC - Jerry Peng: In pulsar terminology, when a message is published there is a write quorum and an ack quorum.  For example, if my cluster is set up with a write quorum of 3 and an ack quorum of 2, this means that for every message written to the storage layer nodes, 2 out of 3 acks need to be received from the storage layer nodes before the message is deemed successfully published.  However, data can be read from all 3 replicas.  So when I say write replicas, I mean the write quorum
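These quorums can be tuned per namespace via the admin API's persistence policies; a minimal sketch, assuming a local admin endpoint and the `public/default` namespace:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetQuorums {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();
        // PersistencePolicies(ensemble, writeQuorum, ackQuorum, markDeleteRate):
        // write each entry to 3 bookies and wait for 2 acks before the
        // publish is considered successful, as described above.
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(3, 3, 2, 0.0));
        admin.close();
    }
}
```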
----
2019-08-06 18:43:58 UTC - Zhenhao Li: I know it is not a fair question. Sorry, I was thinking in my context. The "problem" I face is basically fine-grained recovery in an event-sourcing or streaming system.
Stream processing frameworks like Flink have only stream-level replay on failures, which means the whole topic will be replayed from the last checkpoint.
Akka Actors offers actor-level recovery, but has to use something like Cassandra for the persistence layer
----
2019-08-06 18:45:40 UTC - Zhenhao Li: I see. so the data replication happens in the background, asynchronously?
----
2019-08-06 18:46:33 UTC - Zhenhao Li: to increase read throughput, can I set a write quorum of 10 and an ack quorum of 2?
----
2019-08-06 18:48:25 UTC - Zhenhao Li: here is a crazier idea. is it possible to have one partition per key and dynamically grow the number of partitions? surely that is not possible with Kafka
----
2019-08-06 18:52:20 UTC - Ambud Sharma: thanks @David Kjerrumgaard
----
2019-08-06 20:03:07 UTC - Vahid Hashemian: @Vahid Hashemian has joined the 
channel
----
2019-08-06 20:09:40 UTC - Yu Yang: @Yu Yang has joined the channel
----
2019-08-06 21:22:08 UTC - Jerry Peng: replication happens in the background
----
2019-08-06 21:27:33 UTC - Jerry Peng: > to increase read throughput, can I set a write quorum of 10 and an ack quorum of 2?

correct
----
2019-08-06 21:27:51 UTC - Jerry Peng: data can also be striped across multiple nodes
----
2019-08-06 21:28:06 UTC - Jerry Peng: in that configuration you wouldn’t even 
need to increase the write quorum
----
2019-08-06 21:29:07 UTC - Jerry Peng: Depends on how many keys.  Pulsar can 
have millions of topics
----
2019-08-06 21:29:15 UTC - Jerry Peng: partitions can be increased on the fly
----
2019-08-06 21:29:44 UTC - Jerry Peng: > is it possible to have one partition per key?
It's possible, but not a typical architecture
----
2019-08-06 22:07:14 UTC - Addison Higham: hrm... so the `/metrics` endpoint in 
the broker is not authenticated so any client can hit it, but in the proxy, it 
is authenticated, is that intended?
----
2019-08-06 23:30:32 UTC - Devin G. Bost: For a Pulsar function (like `public 
String process(String input, Context context) throws Exception { . . . `)
is there a way to skip a message?
There are certain cases for one of our functions that we want to ignore (and 
not send downstream).
----
2019-08-06 23:33:24 UTC - Victor Siu: I usually just return null from my 
function but I’m not sure if that’s the recommended way.
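Returning null does work for this: when a function returns null, nothing is published to the output topic for that input message. A minimal sketch (the filter condition and the transformation are placeholder examples):
```
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class FilteringFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        if (input == null || input.isEmpty()) {
            return null; // skip: nothing is sent downstream for this message
        }
        return input.toUpperCase(); // otherwise the result is published as usual
    }
}
```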
----
2019-08-06 23:42:10 UTC - Victor Li: Is there a document describing all topic 
level metrics in details? things like rate_in/out, throughput_in/out? I am not 
sure what they mean specifically. Thanks!
----
2019-08-06 23:45:48 UTC - Devin G. Bost: haha thanks Victor.
----
2019-08-07 02:36:24 UTC - Sijie Guo: for pulsar + spark integration, check out 
this spark connector: <https://github.com/streamnative/pulsar>

it supports Spark Structured Streaming and Spark SQL.

@yijie also wrote a great blog post about it: 
<https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017>
----
2019-08-07 02:37:56 UTC - Sijie Guo: I am working on a page for it. a pull 
request will be out today
----
2019-08-07 02:39:01 UTC - Sijie Guo: I don’t think it is intended. I guess the authentication was unfortunately applied because the http endpoints in the proxy are just forwarding requests to the broker.
----
2019-08-07 02:39:43 UTC - Sijie Guo: @Chris Bartholomew can you rebase that 
pull request? I will pick up the review for it.
----
2019-08-07 02:47:40 UTC - yijie: <https://github.com/streamnative/pulsar-spark>
----
2019-08-07 03:08:24 UTC - Sijie Guo: (sorry I pasted the wrong link)
----
2019-08-07 03:22:41 UTC - Rui Fu: hi all, how can we set function runtime’s 
python version under localrun mode?
----
2019-08-07 03:23:52 UTC - Rui Fu: such as we have "python" -> python 2.7 and "python3" -> python 3.x; we want python3 to run the function, but pulsar in localrun mode always uses python
----
2019-08-07 03:27:07 UTC - Ali Ahmed: @Rui Fu you can use venv
```mkdir test
cd test
python3 -m venv env
source env/bin/activate
python -V
```
----
2019-08-07 03:27:49 UTC - Rui Fu: @Ali Ahmed i will try, thanks :wink:
----
2019-08-07 03:31:34 UTC - Ali Ahmed: you will also need ```pip install 
pulsar-client``` before you run the function in local mode
----
2019-08-07 06:30:16 UTC - divyasree: @Sijie Guo I am trying to enable authentication using TLS in two clusters (each cluster is in a different DC) with a proxy; geo-replication is enabled in this setup.
I am trying TLS to make authentication and authorization work with geo-replication.
----
2019-08-07 06:30:35 UTC - divyasree: I have configured the instance as below
----
2019-08-07 06:31:05 UTC - divyasree: ```Broker Configuration

brokerServicePortTls=6651
webServicePortTls=8443
tlsEnabled=true
tlsCertificateFilePath=/opt/apache-pulsar-2.3.2/broker.cert.pem
tlsKeyFilePath=/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsProtocols=TLSv1.2,TLSv1.1
tlsCiphers=TLS_DH_RSA_WITH_AES_256_GCM_SHA384,TLS_DH_RSA_WITH_AES_256_CBC_SHA
authenticationEnabled=true
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls
brokerClientTlsEnabled=true
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem

Proxy Configuration

brokerServiceURLTLS=pulsar+ssl://pulsar.ttc.ole.prd.target.com:6651
brokerWebServiceURLTLS=https://pulsar.ttc.ole.prd.target.com:8443
authenticationEnabled=true
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsEnabledWithBroker=true

Client Configuration

webServiceUrl=https://localhost:8443/
brokerServiceUrl=pulsar+ssl://localhost:6651/
authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
authParams=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem
tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem
tlsAllowInsecureConnection=false```
----
2019-08-07 06:31:30 UTC - divyasree: the broker is not getting started with the above changes..
----
2019-08-07 06:31:46 UTC - divyasree: i don't see any error logs in the log file..
----
2019-08-07 06:32:17 UTC - divyasree: the broker is getting stopped after a few mins, i guess.
----
2019-08-07 06:32:25 UTC - divyasree: don't know what i am missing here
----
2019-08-07 06:32:33 UTC - divyasree: Can you help me out on this?
----
2019-08-07 06:38:59 UTC - Sijie Guo: how did you start the pulsar broker: in the foreground or in the background (via pulsar-daemon)?
----
2019-08-07 06:45:27 UTC - divyasree: via pulsar-daemon
----
2019-08-07 06:46:47 UTC - divyasree: I have a doubt about the broker.cert.pem file... When creating this file i have given the common name as the wildcard "pulsar-prd-*.target.com"
----
2019-08-07 06:47:47 UTC - divyasree: i have enabled the proxy, which has the dns name "pulsar.ttc........."
----
2019-08-07 06:48:12 UTC - Sijie Guo: change conf/log4j.yaml to set immediateFlush to true, then try to start the broker again and you will see the logs

```
    RollingFile:
      name: RollingFile
      fileName: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}"
      filePattern: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}-%d{MM-dd-yyyy}-%i.log.gz"
      immediateFlush: true   # changed from the default of false
```
----
2019-08-07 06:48:46 UTC - divyasree: so do i need to create a separate cert.pem file and configure the broker with broker.cert.pem and the proxy with proxy.cert.pem?
----
2019-08-07 06:52:44 UTC - divyasree: i am getting an error: Caused by: java.lang.IllegalArgumentException: unsupported cipher suite: TLS_DH_RSA_WITH_AES_256_GCM_SHA384(DH-RSA-AES256-GCM-SHA384)
----
2019-08-07 06:53:03 UTC - divyasree: meaning the cipher mentioned in the conf file is not supported, right?
----
2019-08-07 06:53:28 UTC - divyasree: i hope if i leave it blank, it will support all types of ciphers
----
2019-08-07 07:05:51 UTC - Sijie Guo: correct, you can remove the configured ciphers and leave it blank
----
2019-08-07 07:12:20 UTC - divyasree: ok it works, the broker is running now..
----
2019-08-07 07:15:18 UTC - divyasree: but when producing messages through the client, i am getting the below error ```Search domain query failed. Original hostname: 'pulsar-prd-bk1.target.com' failed to resolve 'pulsar-prd-bk1.target.com.stores.target.com' after 7 queries
        at org.apache.pulsar.shade.io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:848)
```
----
2019-08-07 07:16:06 UTC - divyasree: "stores.target.com" is getting appended at the end of the hostname
----
2019-08-07 07:16:11 UTC - divyasree: what does it mean?
----
2019-08-07 07:18:14 UTC - Kim Christian Gaarder: Q: In the Java client, will a 
call like “consumer.receive(0, TimeUnit.SECONDS)” always return a message if 
one is available in the topic, or will it time out if it cannot deliver the 
message immediately? i.e. I want to know if I can rely on this method to know 
for sure whether I have found the last message in the topic.
----
2019-08-07 07:18:31 UTC - Sijie Guo: I think it attempts to query your DNS server, and your DNS server returns the name 'pulsar-prd-bk1.target.com.stores.target.com'
2019-08-07 07:18:46 UTC - Sijie Guo: how did you configure the DNS and LB?
----
2019-08-07 07:19:06 UTC - divyasree: we have VMs in openstack
----
2019-08-07 07:19:19 UTC - divyasree: so i configured it via openstack
----
2019-08-07 07:19:37 UTC - divyasree: and added the brokers to the pool of the LB listener
----
2019-08-07 07:20:10 UTC - divyasree: but the same configuration works with token authentication for me
----
2019-08-07 07:20:36 UTC - divyasree: i have configured TCP for token 
authentication with port 6650
----
2019-08-07 07:20:51 UTC - divyasree: do i need to change it to HTTPS with 6651?
----
2019-08-07 07:25:10 UTC - Sijie Guo: if there is a message already prefetched in the consumer queue, it will return that message. However, a timeout doesn’t mean you are at the end of the topic.

if you want to know that, use hasMessageAvailable.
----
2019-08-07 07:31:07 UTC - Kim Christian Gaarder: There is no such method hasMessageAvailable
----
2019-08-07 07:31:19 UTC - Kim Christian Gaarder: Not in the “Consumer” 
interface anyway
----
2019-08-07 07:34:41 UTC - Sijie Guo: I see, oh, it is only exposed in the `Reader` interface for now. But the underlying mechanism is there for Consumer too, because the fundamentals are the same. You can consider downcasting to ConsumerImpl for now; we can look into exposing it on Consumer as well.
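For the Reader path, a minimal sketch of draining a topic to its current end with `hasMessageAvailable` (service URL and topic name are placeholders):
```
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class DrainTopic {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Reader<String> reader = client.newReader(Schema.STRING)
                .topic("my-topic")
                .startMessageId(MessageId.earliest)
                .create();
        // hasMessageAvailable() reports whether the reader has reached the
        // current end of the topic, so no marker message is needed.
        while (reader.hasMessageAvailable()) {
            Message<String> msg = reader.readNext();
            System.out.println(msg.getValue());
        }
        reader.close();
        client.close();
    }
}
```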
----
2019-08-07 07:34:58 UTC - Kim Christian Gaarder: ok, thanks 
:slightly_smiling_face:
----
2019-08-07 07:35:04 UTC - Kim Christian Gaarder: I will attempt the downcast
----
2019-08-07 07:36:19 UTC - Kim Christian Gaarder: My problem right now is that 
the only reliable way I have to determine whether I have reached the last 
message is by producing a marker message and consuming until I get that 
message, which is obviously not ideal as I have to write to the topic in order 
to reliably know whether I have read all messages.
----
2019-08-07 07:36:59 UTC - Sijie Guo: @divyasree hmm I am not sure whether it is 
related to the common name or not.
----
2019-08-07 07:37:07 UTC - Kim Christian Gaarder: but hasMessagesAvailable could 
potentially solve that problem!
+1 : Sijie Guo
----
2019-08-07 07:37:42 UTC - Sijie Guo: did you try connecting with the ip?
----
2019-08-07 07:40:05 UTC - divyasree: ok.. any suggestions on this part?
``` I have a doubt about the broker.cert.pem file... When creating this file i have given the common name as the wildcard "pulsar-prd-*.target.com"

so do i need to create a separate cert.pem file and configure the broker with broker.cert.pem and the proxy with proxy.cert.pem?

And also i saw the documentation mentioning the cert as "my-roles.cert.pem". How can we define the role in the pem?

i am confused by this part ```
----
2019-08-07 07:41:09 UTC - divyasree: trying with the IP or the proxy name didn't work.. I guess since i have given a wildcard hostname which matches the broker hostname alone, it didn't work
----
2019-08-07 07:41:44 UTC - divyasree: that's why i am asking: do we need to generate a separate cert for the proxy?
----
2019-08-07 07:42:07 UTC - divyasree: if so, which one should i use where? And what to use in the client?
----
2019-08-07 07:42:48 UTC - divyasree: And also, which key should i use to create the authorization token, and how do i specify the role in it..
----
2019-08-07 07:44:43 UTC - divyasree: Can you provide detailed docs on TLS with the proxy?
----
2019-08-07 07:45:07 UTC - Sijie Guo: since you are configuring the proxy to connect to brokers with a tls cert, I would suggest you configure two proxy cert pairs:

- a client/server cert pair for configuring the broker and the `brokerClient` part of the proxy
- a client/server cert pair for configuring the proxy and the client, because your client is talking to the proxy
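On the client side, a minimal sketch of connecting to the proxy over TLS with cert-based authentication (all file paths and the service URL are placeholders for your own):
```
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.PulsarClient;

public class TlsClient {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar+ssl://localhost:6651") // proxy TLS endpoint
                .tlsTrustCertsFilePath("/path/to/ca.cert.pem")
                .authentication(AuthenticationFactory.TLS(
                        "/path/to/client.cert.pem",
                        "/path/to/client.key-pk8.pem"))
                .allowTlsInsecureConnection(false)
                .build();
        // ... create producers/consumers as usual ...
        client.close();
    }
}
```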
----
2019-08-07 08:12:56 UTC - Zhenhao Li: thanks!
----
2019-08-07 08:16:33 UTC - Zhenhao Li: what do you mean by "striped across multiple nodes"?
----
2019-08-07 08:16:58 UTC - Zhenhao Li: isn't the write quorum across multiple 
nodes?
----
2019-08-07 08:32:35 UTC - Kim Christian Gaarder: Q: Assuming we don’t know any pulsar message-ids externally, is there any way to ask Pulsar for the content of the last message currently published on a topic without scanning through all messages from the beginning?

Even if I can get the message-id of the last message in the topic, I have not found a way to read that message without scanning from the start (or by resuming from a previous subscription position). This is why the hasMessageAvailable method was helpful, but it still does not provide a good way to read the last message without first scanning through all messages or using a separate subscription to track this.
----
2019-08-07 08:34:28 UTC - boldrick: @boldrick has joined the channel
----
2019-08-07 08:34:48 UTC - Sijie Guo: I think there was a similar request before to return the last message (id). The mechanism is actually ready and is being used for `hasMessageAvailable`. @jia zhai was creating an issue for this new API request.
----
2019-08-07 08:35:12 UTC - Kim Christian Gaarder: As I understand it, if we knew 
the message-id of the message before the last message, then we could seek to 
that and then we would be reading the last message next.
----
2019-08-07 08:35:37 UTC - Kim Christian Gaarder: but seek will not allow you to 
read the message of the message-id that you seek to, only the next message.
----
2019-08-07 08:36:13 UTC - Matteo Merli: Take a look at 
`ReaderBuilder.startMessageIdInclusive(true)`
----
2019-08-07 08:36:44 UTC - Matteo Merli: creating a reader starting on 
`MessageId.Latest` inclusive
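Putting this together, a minimal sketch of reading just the last published message (topic name and service URL are placeholders; in released clients the inclusive option is the no-argument `startMessageIdInclusive()`):
```
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;
import org.apache.pulsar.client.api.Schema;

public class ReadLastMessage {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Reader<String> reader = client.newReader(Schema.STRING)
                .topic("my-topic")
                .startMessageId(MessageId.latest)
                .startMessageIdInclusive() // start ON the last message, not after it
                .create();
        if (reader.hasMessageAvailable()) {
            Message<String> last = reader.readNext();
            System.out.println("last message: " + last.getValue());
        }
        reader.close();
        client.close();
    }
}
```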
----
2019-08-07 08:36:57 UTC - Kim Christian Gaarder: ahh … that will solve it! 
thank you!
----
2019-08-07 08:37:55 UTC - Kim Christian Gaarder: @Matteo Merli can that only be done when creating a new reader, or is there a way to re-use an existing reader to repeat this operation?
----
2019-08-07 08:39:02 UTC - Kim Christian Gaarder: I suppose in my use-case it’s 
good enough to only do this when creating a new reader, as I only need this for 
continuation after failure (or intentional stoppage) which is rare anyway.
----
2019-08-07 08:39:28 UTC - Matteo Merli: Yes, it’s done on the reader creation 
only
----
2019-08-07 08:39:51 UTC - boldrick: hi guys, I wanted to ask if there is an easy way to build a local docker image of pulsar. I found the Dockerfile that builds the C++ client, but I wanted to build pulsar-all and thought it was pulsar/docker/pulsar/Dockerfile; however, it wants to copy everything from /apache-pulsar and there is no such directory in the project
----
2019-08-07 08:42:36 UTC - Kim Christian Gaarder: @Matteo Merli Actually the 
reader interface javadoc says: “This configuration option also applies for any 
cursor reset operation like Reader.seek(MessageId).” which means this will also 
apply to seek :slightly_smiling_face:
----
2019-08-07 08:46:49 UTC - Matteo Merli: Ok, then it should work 
:slightly_smiling_face:
----
2019-08-07 08:52:10 UTC - Kim Christian Gaarder: This is very good news. I felt like some functionality was missing here for building basic applications on top of Pulsar; this really does solve all of that!
+1 : Sijie Guo, Matteo Merli
----
2019-08-07 09:06:04 UTC - Sijie Guo: 
<https://github.com/apache/pulsar/pull/4910>
----
2019-08-07 09:06:07 UTC - Sijie Guo: here is the pull request
----
2019-08-07 09:08:18 UTC - Matteo Merli: @boldrick You can build the docker 
image (and everything that’s required) from scratch with `mvn install -Pdocker 
-DskipTests`
----
