2019-05-16 09:47:18 UTC - jia zhai: @Shivji Kumar Jha Here is the link for S3
config:
<https://pulsar.apache.org/docs/en/cookbooks-tiered-storage/#configuring-the-offload-driver>
It seems roles are not supported yet.
----
2019-05-16 09:49:34 UTC - jia zhai: @eric.olympe Is there a detailed
requirement for the “fanout” pattern?
----
2019-05-16 09:57:44 UTC - Shivji Kumar Jha: @jia zhai The documentation does
not say explicitly, so I assumed it does… Keys are actually considered bad
security practice, so we prefer roles. Do you know if this is a limitation of
Pulsar or of the jclouds library that Pulsar uses for offloading?
----
2019-05-16 10:18:06 UTC - jia zhai: Right, currently we leverage jclouds and
follow the jclouds way.
----
2019-05-16 11:01:08 UTC - eric.olympe: @jia zhai I mean broadcast messages to
all consumers of a topic: all consumers receive the same messages, no load
balancing.
----
2019-05-16 11:26:51 UTC - jia zhai: @eric.olympe then you could create several
subscriptions, each with a consumer connected
----
2019-05-16 11:28:19 UTC - jia zhai: A topic can have several subscriptions,
and each subscription will receive all the messages in the topic
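A minimal sketch of this fan-out semantics, as a plain-Java model rather than real Pulsar client code (the class and subscription names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of fan-out via multiple subscriptions: every subscription
// on a topic gets its own copy of each published message.
public class FanOutSketch {
    // subscription name -> that subscription's message queue
    private final Map<String, List<String>> subscriptions = new LinkedHashMap<>();

    public void subscribe(String name) {
        subscriptions.put(name, new ArrayList<>());
    }

    // Publishing delivers the message to every subscription.
    public void publish(String msg) {
        for (List<String> queue : subscriptions.values()) {
            queue.add(msg);
        }
    }

    public List<String> messagesFor(String name) {
        return subscriptions.get(name);
    }

    public static void main(String[] args) {
        FanOutSketch topic = new FanOutSketch();
        topic.subscribe("consumer-a");   // one subscription per consumer
        topic.subscribe("consumer-b");
        topic.publish("m1");
        topic.publish("m2");
        // Both subscriptions see every message -- no load balancing.
        System.out.println(topic.messagesFor("consumer-a")); // [m1, m2]
        System.out.println(topic.messagesFor("consumer-b")); // [m1, m2]
    }
}
```

With a real Pulsar client you would instead attach each consumer under its own subscription name; the point is that delivery is per subscription, not shared across subscriptions.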
----
2019-05-16 12:08:27 UTC - eric.olympe: @jia zhai Does Pulsar have push or pull mode?
----
2019-05-16 12:21:03 UTC - jia zhai: @eric.olympe It is push + pull.
----
2019-05-16 12:23:09 UTC - eric.olympe: you mean push for producing, pull for
consuming ?
----
2019-05-16 12:24:50 UTC - jia zhai: Once the consumer is connected, it sends a
CommandFlow to the broker. CommandFlow contains the number of messages the
consumer can cache in its queue, and the broker responds to this command by
pushing that many messages to the consumer.
----
2019-05-16 12:26:19 UTC - jia zhai: But if the broker currently has no
available message, then when the consumer calls consumer.receive(), it waits
to pull a message from the broker
----
2019-05-16 12:26:22 UTC - jia zhai: @eric.olympe
----
2019-05-16 12:29:40 UTC - jia zhai: So, if there are enough messages on the
broker side and the consumer has room in its receive queue to cache messages,
the messages are pushed from broker to consumer.
----
2019-05-16 12:32:00 UTC - jia zhai: Otherwise, the consumer pulls one message
at a time from the broker.
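The push/pull behavior described above can be sketched as a toy model, not Pulsar's actual implementation (class and method names here are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of Pulsar-style flow control: the consumer grants the broker
// a number of permits (CommandFlow); the broker pushes messages while
// permits remain, otherwise receive() falls back to pulling one message.
public class FlowControlSketch {
    private int permits;                        // granted via CommandFlow
    private final Deque<String> receiverQueue = new ArrayDeque<>();
    private final Deque<String> brokerBacklog = new ArrayDeque<>();

    void grantPermits(int n) {                  // consumer -> broker: CommandFlow
        permits += n;
        drain();
    }

    void brokerEnqueue(String msg) {            // a new message arrives at the broker
        brokerBacklog.add(msg);
        drain();
    }

    // Broker pushes while the consumer still has permits and messages exist.
    private void drain() {
        while (permits > 0 && !brokerBacklog.isEmpty()) {
            receiverQueue.add(brokerBacklog.poll());
            permits--;
        }
    }

    // receive(): served from the local queue if possible, else pull one message.
    String receive() {
        if (!receiverQueue.isEmpty()) {
            return receiverQueue.poll();        // pushed path
        }
        return brokerBacklog.poll();            // pulled path (null if none available)
    }

    public static void main(String[] args) {
        FlowControlSketch c = new FlowControlSketch();
        c.grantPermits(2);
        c.brokerEnqueue("m1");
        c.brokerEnqueue("m2");
        c.brokerEnqueue("m3");                  // no permit left: stays at the broker
        System.out.println(c.receive());        // m1 (was pushed)
        System.out.println(c.receive());        // m2 (was pushed)
        System.out.println(c.receive());        // m3 (pulled on demand)
    }
}
```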
----
2019-05-16 12:34:48 UTC - eric.olympe: @jia zhai Thanks a lot.
----
2019-05-16 12:37:42 UTC - jia zhai: welcome
----
2019-05-16 12:38:00 UTC - Brian Doran: Question re: Avro: is it possible to use
an avro.GenericRecord in a producer.send()? I am having an issue whereby,
having debugged to a point, I am seeing in
`org.apache.pulsar.shade.org.apache.avro.reflect.ReflectData`
```
private FieldAccessor[] createAccessorsFor(Schema schema) {
    List<org.apache.pulsar.shade.org.apache.avro.Schema.Field> avroFields = schema.getFields();
    FieldAccessor[] result = new FieldAccessor[avroFields.size()];
    org.apache.pulsar.shade.org.apache.avro.Schema.Field avroField;
    for (Iterator i$ = schema.getFields().iterator(); i$.hasNext();
         result[avroField.pos()] = (FieldAccessor) this.byName.get(avroField.name())) {
        avroField = (org.apache.pulsar.shade.org.apache.avro.Schema.Field) i$.next();
    }
    return result;
}
```
that even though the fields exist in the schema, I am getting a FieldAccessor[] of
null values.
----
2019-05-16 12:55:51 UTC - tuteng: You can refer to this code
<https://github.com/apache/pulsar/blob/78502a3cfae0d789cea667c4829830487517b7ea/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/SchemaBuilderTest.java>
----
2019-05-16 12:58:06 UTC - tuteng: It seems it's not possible to use
avro.GenericRecord directly; Pulsar wraps the avro.GenericRecord class
----
2019-05-16 12:59:06 UTC - Brian Doran: That's what I thought @tuteng Thanks for
taking a look.
----
2019-05-16 13:08:18 UTC - Brian Doran: Actually @tuteng that is more
specifically for the generation of the Avro schema; this can be done with an
Avro schema via
```
Schema.AVRO(
    SchemaDefinition.builder()
        .withJsonDef(schema)
        .build());
```
----
2019-05-16 14:58:58 UTC - Addison Higham: hrm... that's unfortunate... just
digging into jclouds and they don't even wrap the AWS SDKs, where usually
you could just rely on the credential provider chain even if it wasn't plumbed
all the way through :confused:
----
2019-05-16 16:36:26 UTC - Byron: At the Kafka summit there was a key slide in
one of the keynotes. “overprovision or outage: pick one” the argument seems
moot in the Pulsar context based on my understanding. If storage is an issue,
more bookies are added. If CPU is an issue, more brokers are added. Throughput
is dependent on the topic structure and semantics (i.e. is it partitioned and
are there corresponding consumers of those partitions)
white_check_mark : Ali Ahmed, Joe Francis
----
2019-05-16 16:44:05 UTC - David Kjerrumgaard: @Byron I am not familiar with the
exact issue the presenter was addressing, but in Pulsar each layer (storage
& serving) can scale independently. This design decision was specifically
made to address the scenario you mention.
----
2019-05-16 16:45:27 UTC - Byron: Thanks. There wasn’t an issue per se, but
rather the ability to support elastic scaling. As you call out, the scaling
would happen at different layers.
----
2019-05-16 16:46:31 UTC - Byron: This has been a main attraction of Pulsar
----
2019-05-16 16:46:57 UTC - David Kjerrumgaard: @Byron The other key feature to
scalability is the stateless nature of the Pulsar Brokers and the Pulsar Proxy,
which allows newly added Broker nodes to serve old data.
----
2019-05-16 16:48:04 UTC - David Kjerrumgaard: In Kafka if you lose a serving
node, then only one of the 2 remaining replicas can serve data reads. A newly
added node to Kafka can't.
----
2019-05-16 16:48:46 UTC - Byron: Right
----
2019-05-16 16:53:08 UTC - Byron: I was an early Kafka “fanboy”, but after I
learned about its architecture I started questioning it from an operational
standpoint. Then I learned about Pulsar and the arch is much more well suited
for elastic changes.
----
2019-05-16 16:57:07 UTC - Byron: Whenever there is Kafka news and heavy
marketing, one only needs to step back and re-evaluate what is true about
the tech. Unfortunately the marketing can win out more often than not. I am
not at all suggesting Kafka is not good software, but there are fundamental
arch decisions that are limiting for operators.
----
2019-05-16 17:02:55 UTC - Byron: In any case, thanks for entertaining my
question
----
2019-05-16 17:04:12 UTC - David Kjerrumgaard: @Byron We are in the same boat in
that regard. I did a lot of consulting for a big Kafka provider and implemented
a lot of streaming solutions around Kafka :smiley:
+1 : Byron
----
2019-05-16 17:07:42 UTC - David Kjerrumgaard: @Byron Have you considered
writing a blog post about some of your concerns with Kafka? We need all the
community support we can get :smiley:
----
2019-05-16 17:11:34 UTC - Byron: No, but I should
----
2019-05-16 17:13:26 UTC - Byron: This is a bit of an odd use case, but my
initial attraction to it was the “infinite storage” support with tiered storage
with a disabled retention policy for event sourcing use cases.
----
2019-05-16 17:15:11 UTC - Byron: Building an application layer around that to
support causal writes with a single produce to a given topic, for an
actor-based layer to parallelize writes for different entities across multiple
producers (and/or partitions).
----
2019-05-16 17:15:44 UTC - Byron: From there I have had the need for supporting
queues and actual streaming use cases like metrics from user actions on the
client side
----
2019-05-16 17:17:12 UTC - David Kjerrumgaard: Those requirements align well
with Pulsar.
----
2019-05-16 17:18:06 UTC - David Kjerrumgaard: If you need help during your
evaluation, you can always post your questions here! Great community of users
to help
----
2019-05-16 17:20:45 UTC - Byron: Thanks. I have posted here several times and
helped with the initial Go client (last year?). This was more of a reflection
kind of comment :wink:
----
2019-05-16 17:23:04 UTC - Byron: But I should write a blog post
+1 : Matteo Merli, Karthik Ramasamy
----
2019-05-16 17:28:38 UTC - David Kjerrumgaard: FWIW, we are working on an
improved Go client, due to increased demand. So stay tuned for that
----
2019-05-16 17:35:50 UTC - David Kjerrumgaard:
<https://github.com/apache/pulsar-client-go/pull/1>
100 : Byron
----
2019-05-16 17:39:38 UTC - Addison Higham: I imagine some streamlio people are
here... I filled out the form but haven't had anyone reach out. We are in the
evaluation phase with Pulsar and would be curious to hear about the hosted
offering
----
2019-05-16 17:40:55 UTC - Addison Higham: @David Kjerrumgaard Not sure if you
can put me in touch with someone?
----
2019-05-16 17:42:51 UTC - Addison Higham: and the few other things I am getting
questions about from my org:
- a ruby client
- exactly-once flink consumer
- bulk reads of segments out of tiered storage (AFAIK, there is an API for
seeing where the segments are stored and relevant order information, but
curious what the longer term ideas around that are)
----
2019-05-16 17:43:02 UTC - Addison Higham: wrong addison (doesn't happen very
often!) but cool!
----
2019-05-16 17:43:31 UTC - David Kjerrumgaard: @Addison Higham We are finalizing
the service offering now and conducting some security testing, etc before we
roll it out to the general public.
----
2019-05-16 17:43:42 UTC - David Kjerrumgaard: Sorry, auto-complete error
:smiley:
----
2019-05-16 17:43:43 UTC - Jon Bock: Hi @Addison Higham, we e-mailed you a
couple of times last week, but maybe those e-mails didn't reach you. Can you
DM me the best address to use?
----
2019-05-16 17:44:23 UTC - Jon Bock: (Sorry Addison Bair, I also used the wrong
Addison).
----
2019-05-16 18:42:14 UTC - Kevin Brown: I have a question about the Pulsar
Functions API. Is there a way to publish to a topic and specify a key? I don’t
see a method in the Context interface that will accomplish this. Is there some
other way or workaround?
----
2019-05-16 18:54:27 UTC - Devin G. Bost: What do you mean by key? That could
mean a lot of things. What's your use case?
----
2019-05-16 18:56:29 UTC - Devin G. Bost: I keep getting:
`Reason: javax.ws.rs.ProcessingException: Connection refused:
localhost/127.0.0.1:8080`
when trying to connect to Pulsar. It's happening with the Pulsar docker
containers, and after I added the Kafka connectors to my local installation, I
started getting it with my local Kafka setup as well.
----
2019-05-16 18:56:33 UTC - Devin G. Bost: Any ideas?
----
2019-05-16 19:00:40 UTC - David Kjerrumgaard: @Devin G. Bost Assuming you have
done the basic diagnostics to ensure the process is up and listening on the
port, etc
----
2019-05-16 19:02:02 UTC - David Kjerrumgaard: @Devin G. Bost And you installed
them via these steps?
<http://pulsar.apache.org/docs/en/io-quickstart/#installing-builtin-connectors>
----
2019-05-16 19:10:01 UTC - Devin G. Bost: @David Kjerrumgaard The last command
never responds. (I edited to put the message in a snippet.)
----
2019-05-16 19:10:13 UTC - Devin G. Bost: That's what I'm getting.
----
2019-05-16 19:13:57 UTC - Devin G. Bost: That's using `docker run -it -p
6650:6650 -p 8080:8080 -p 2181:2181 -v $PWD/data:/pulsar/data -v
/data/provisioning:/data/provisioning apachepulsar/pulsar-all bin/pulsar
standalone`
----
2019-05-16 19:14:23 UTC - David Kjerrumgaard: I had to ask the obvious
first.... :smiley:
----
2019-05-16 19:14:27 UTC - Devin G. Bost: :slightly_smiling_face:
----
2019-05-16 19:14:44 UTC - David Kjerrumgaard: Which tag on the docker image?
latest?
----
2019-05-16 19:14:51 UTC - Devin G. Bost: I assume it's latest.
----
2019-05-16 19:14:54 UTC - Devin G. Bost: I could try 2.3.1
----
2019-05-16 19:15:21 UTC - David Kjerrumgaard: does telnet also hang when
connecting to port 8080?
----
2019-05-16 19:15:39 UTC - Devin G. Bost: I need to install telnet to check. One
moment.
----
2019-05-16 19:18:44 UTC - Devin G. Bost: @David Kjerrumgaard It's just hanging
at:
```
OCPC-LM31977:bin dbost$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.```
and
```
OCPC-LM31977:bin dbost$ telnet localhost 6650
Trying ::1...
Connected to localhost.
Escape character is '^]'.
```
----
2019-05-16 19:19:53 UTC - David Kjerrumgaard: Telnet hangs for both ports?
----
2019-05-16 19:20:05 UTC - David Kjerrumgaard: or just 8080?
----
2019-05-16 19:21:29 UTC - Devin G. Bost: Both.
----
2019-05-16 19:23:50 UTC - David Kjerrumgaard: I noticed that both of those
ports have connections stuck in CLOSE_WAIT status, which means there is unread
data left in the stream and the previous client didn't close the connection by
sending a FIN
----
2019-05-16 19:24:10 UTC - David Kjerrumgaard: not sure if that is blocking it
from establishing new connections or not
----
2019-05-16 19:25:59 UTC - David Kjerrumgaard: Did you hard kill a client that
was connected to the docker container?
----
2019-05-16 19:27:04 UTC - Devin G. Bost: Very good observation. I probably did.
----
2019-05-16 19:29:57 UTC - Devin G. Bost: Hmm, but I think I hard killed a
client that was connected to a previous container. I don't think I hard killed
a client with this one. If I did, I think it was already after I was getting
this error.
----
2019-05-16 19:32:10 UTC - David Kjerrumgaard: ok, can you see if the CLOSE_WAIT
connections are still in the netstat output?
----
2019-05-16 19:41:23 UTC - Devin G. Bost: Sure thing.
----
2019-05-16 19:42:24 UTC - Devin G. Bost: ```
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 8080
tcp6 9 0 ::1.8080 ::1.64330 CLOSE_WAIT
407795 146808 29851 0 0x0122 0x0000010c
tcp6 0 0 ::1.64330 ::1.8080 FIN_WAIT_2
260992 146808 44876 0 0x2131 0x00000100
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 6650
tcp6 0 0 ::1.6650 ::1.64338 CLOSE_WAIT
407795 146808 29851 0 0x0122 0x0000010c
tcp6 0 0 ::1.64338 ::1.6650 FIN_WAIT_2
260992 146808 44890 0 0x2131 0x00000100
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 2181
tcp4 0 0 127.0.0.1.2181 127.0.0.1.64489 CLOSE_WAIT
408251 146988 29851 0 0x0122 0x0000010c
tcp4 0 0 127.0.0.1.64489 127.0.0.1.2181 FIN_WAIT_2
261312 146988 30490 0 0x2131 0x00000000
```
That's after I killed my docker container.
----
2019-05-16 19:49:17 UTC - Kevin Brown: Example:
org.apache.pulsar.client.api.Producer can publish a key along with the message:
```
Producer<String> producer = ...;
producer.newMessage().key("mykey").value("myvalue").send();
```
In Pulsar Functions we use Context to publish, which does not have a way to
specify a key. See the current API documentation below; it is not possible
through Context.
Java Context class
<https://pulsar.apache.org/api/pulsar-functions/>
Python Context class
<http://pulsar.apache.org/api/python/functions/context.m.html>
----
2019-05-16 19:49:46 UTC - David Kjerrumgaard: very interesting
----
2019-05-16 19:50:03 UTC - Devin G. Bost: It gets more interesting. I'll post
more of what I found.
grinning : David Kjerrumgaard
----
2019-05-16 19:50:59 UTC - Devin G. Bost: ```
OCPC-LM31977:bin dbost$ lsof -t -i :6650
29851
OCPC-LM31977:bin dbost$ lsof -t -i :8080
29851
OCPC-LM31977:bin dbost$ lsof -t -i :2181
29851
OCPC-LM31977:bin dbost$ ps -ef | grep 29851
502 29851 29849 0 12:02PM ?? 0:03.12 com.docker.vpnkit --ethernet
fd:3 --port vpnkit.port.sock --port hyperkit://:62373/./vms/0 --diagnostics
fd:4 --pcap fd:5 --vsock-path vms/0/connect --host-names
host.docker.internal,docker.for.mac.host.internal,docker.for.mac.localhost
--gateway-names
gateway.docker.internal,docker.for.mac.gateway.internal,docker.for.mac.http.internal
--vm-names docker-for-desktop --listen-backlog 32 --mtu 1500
--allowed-bind-addresses 0.0.0.0 --http /Users/dbost/Library/Group
Containers/group.com.docker/http_proxy.json --dhcp /Users/dbost/Library/Group
Containers/group.com.docker/dhcp.json --port-max-idle-time 300
--max-connections 2000 --gateway-ip 192.168.65.1 --host-ip 192.168.65.2
--lowest-ip 192.168.65.3 --highest-ip 192.168.65.254 --log-destination asl
--udpv4-forwards 123:127.0.0.1:55622 --gc-compact-interval 1800
502 29855 29851 0 12:02PM ?? 0:00.00 (uname)
502 46540 2358 0 1:50PM ttys000 0:00.00 grep 29851
```
----
2019-05-16 19:51:04 UTC - Devin G. Bost: Evil docker.
smile : David Kjerrumgaard
----
2019-05-16 19:52:14 UTC - David Kjerrumgaard: So Docker is holding on to those
connections and blocking new clients....
----
2019-05-16 19:52:35 UTC - David Kjerrumgaard: What happens if you do a `docker
rm $(docker ps -aq)` ?
----
2019-05-16 19:55:41 UTC - Devin G. Bost: `docker ps -aq`
gives me:
`Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the
docker daemon running?`
I needed to kill the process manually.
----
2019-05-16 19:56:18 UTC - Devin G. Bost: (Note that I killed the process before
running `docker ps -aq`, but when I ran `docker ps --all` before killing the
process, none of the docker containers were running.)
----
2019-05-16 19:58:34 UTC - David Kjerrumgaard: wow.
----
2019-05-16 19:58:45 UTC - Devin G. Bost: Yeah.
----
2019-05-16 19:58:54 UTC - David Kjerrumgaard: Let's try this again with a fresh
Docker daemon and container :smiley:
----
2019-05-16 19:59:29 UTC - Devin G. Bost: I'm afraid.
----
2019-05-16 19:59:32 UTC - Devin G. Bost: haha
----
2019-05-16 19:59:48 UTC - David Kjerrumgaard: fear leads to the dark side
----
2019-05-16 20:00:43 UTC - Devin G. Bost: haha good point.
I think I'm going to continue developing on my local Pulsar for a while because
I have some pressure for a deadline. I'll revisit this afterwards and message
you when I'm ready to continue.
----
2019-05-16 20:01:11 UTC - Devin G. Bost: If you don't mind.
----
2019-05-16 20:01:53 UTC - David Kjerrumgaard: sounds good.
----
2019-05-16 20:01:56 UTC - David Kjerrumgaard: good luck
----
2019-05-16 20:02:21 UTC - Devin G. Bost: Thanks.
----
2019-05-16 21:31:19 UTC - Sanjeev Kulkarni: 2.4 has a new method of publishing
from context that should support this use case
----
2019-05-16 23:12:26 UTC - Kevin Brown: @Sanjeev do you know when 2.4 will be
released?
----
2019-05-17 00:00:34 UTC - Sanjeev Kulkarni: @Matteo Merli might know the answer
for that
----