2020-02-27 09:55:26 UTC - Rolf Arne Corneliussen: Update: the above setup 
should work. I downloaded the latest Streamnative Weekly Build (have used the 
2.5.0 release up to now), and then compaction worked as expected. Will raise a 
github bug.
----
2020-02-27 10:56:38 UTC - Thiago: @Thiago has joined the channel
----
2020-02-27 12:27:19 UTC - riconsch: @riconsch has joined the channel
----
2020-02-27 12:34:48 UTC - riconsch: Hello everyone :slightly_smiling_face:
I have the following question/topic. I want the following job to be done by 
Pulsar instead of Kafka. My data process looks like this (at the moment 
with one Kafka broker, due to Confluent licensing restrictions):
json data => serialized with avro => taking schema from schema registry 
=> producing in kafka topic => stored via HDFS 3 Sink => Hive to query 
data (metastore updated via schema information)

Is this workflow possible for Pulsar to execute like Kafka does in my process? 
We are evaluating switching from Kafka to Pulsar. Any thoughts / help would be 
greatly appreciated, as I am a Pulsar novice.

Best,
riconsch
----
2020-02-27 15:13:34 UTC - Sijie Guo: :+1:
----
2020-02-27 15:15:46 UTC - Sijie Guo: @riconsch yes, that’s doable. Pulsar 
provides built-in schema management and connectors to achieve the same 
data pipeline as you described
----
2020-02-27 15:36:44 UTC - Pedro Cardoso: Hello,
Does Pulsar support incremental cooperative rebalancing akin to Kafka? See 
<https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/>
----
2020-02-27 16:05:52 UTC - eilonk: Hi :slightly_smiling_face: I'm using helm to 
deploy pulsar on a kubernetes cluster. I can't find any documentation regarding 
helm and pulsar with tls enabled. Many values, mostly in configmaps, require 
file paths (like tlsTrustCertsFilePath=/opt/pulsar/ca.cert.pem, for example).
On the other hand, the chart requires a tls-secret, which I did configure, but 
it's not possible to reference a secret in a configmap. How can I work around 
these values? Has anyone encountered this problem? Thanks!
----
2020-02-27 16:21:39 UTC - John Duffie: we figured out the problem.  If the 
SpecificRecord’s class is passed to the producer, then the schema is generated 
on the fly from the fields in the class.  What we needed was for the schema 
string to be the content of the SpecificRecord’s pregenerated schema (eg. 
SpecificRecordBase.getSchema())

So:
 When you construct the producer, instead of doing this:
```newProducer(Schema.AVRO(MyAvroClassThatExtendsSpecificRecord.class))```
you need to do this:
```newProducer(Schema.AVRO(schemaDefBuilder.build()))```
Where schemaDefBuilder is of the form:
```SchemaDefinitionBuilderImpl<MyAvroClassThatExtendsSpecificRecord> schemaDefBuilder =
        new SchemaDefinitionBuilderImpl<>();
schemaDefBuilder.withJsonDef(MyAvroClassThatExtendsSpecificRecord.getClassSchema().toString());```
----
2020-02-27 16:43:08 UTC - matt_innerspace.io: if it's a function, subscribing 
to thousands of topics via regex, and it's configured for parallelism, would 
that help?
----
2020-02-27 16:49:05 UTC - Chris DiGiovanni: I have a pulsar environment where 
someone created a namespace where the name contains a "/" in it.  I cannot 
figure out a way now to delete this namespace.  I've tried using the 
pulsar-admin utility which complains as well as hitting the pulsar admin API 
directly w/o any luck.  I've also tried urlencoding the namespace name and that 
also does not work.  Any help would be appreciated.
----
2020-02-27 16:54:28 UTC - Chris Bartholomew: Hi @eilonk, we have a Pulsar Helm 
chart that supports TLS configuration. You can find it here: 
<https://helm.kafkaesque.io>. If you have trouble with it, let me know.
----
2020-02-27 17:26:18 UTC - Ryan Slominski: Can you have multiple schemas 
associated with a single topic? For example, an addUser schema and a deleteUser 
schema on a "user" topic.
----
2020-02-27 17:28:28 UTC - John Duffie: we have liberty of defining new schema 
for our particular flow - so, we have an AVRO that supports union of embedded 
event objects
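For illustration, a wrapper record whose single field is a union of the embedded event types might look like this in Avro JSON (all type names here are hypothetical, not from John's actual schema):

```json
{
  "type": "record",
  "name": "EventEnvelope",
  "namespace": "example.events",
  "fields": [
    { "name": "event",
      "type": [ "example.events.AddUser", "example.events.DeleteUser" ] }
  ]
}
```

Consumers then branch on the concrete type of the `event` field after deserialization.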
----
2020-02-27 17:28:57 UTC - John Duffie: that’s how we got around the problem you 
mention about 1 topic with multiple schemas
----
2020-02-27 17:29:55 UTC - Meyappan Ramasamy: @Meyappan Ramasamy has joined the 
channel
----
2020-02-27 17:30:02 UTC - Joshua Dunham: Hi @John Duffie, I was *just* going to 
ask about nested schemas (complex types) in pulsar!
----
2020-02-27 17:30:26 UTC - Joshua Dunham: How do you manage them outside of 
pulsar?
----
2020-02-27 17:31:37 UTC - Joshua Dunham: I was looking into a service that can 
sync to something like HWX Schema Registry for a UI (short of support in 
pulsar-manager).
----
2020-02-27 17:34:14 UTC - Meyappan Ramasamy: hi team, this is Meyappan here. I 
am running Pulsar in a Docker container on Windows. I find Pulsar is supported 
only for Apple macOS and Linux; I was unable to do a `pip install 
pulsar-client==2.5.0` to get the Pulsar Python client installed. I want to know 
how I can access the Pulsar CLI commands like pulsar-admin and pulsar-client 
when running Pulsar as a Docker image in a container on Windows, or can I 
access the Pulsar CLI commands only when Pulsar is installed on Apple macOS or 
Linux?
----
2020-02-27 17:35:59 UTC - Tanner Nilsson: I would think you could docker exec 
into your docker container and run `bin/pulsar-admin` or `bin/pulsar-client` 
from there? (haven't been working with it on windows)
----
2020-02-27 17:49:07 UTC - John Duffie: Could you elaborate on your management 
question. Is the goal to ensure backups of the schema, or use of the schema in 
non-Pulsar flows like http/grpc?

Regarding the latter, years back, I wrote my registry and support library that 
was decoupled from the broker technology - transports included http, kafka, jmx 
:slightly_smiling_face: .
The approach used a message consisting of a common record with fields 
containing the schema version and a byte[] of the inner, serialized AVRO. That 
allowed a common solution to work over different technologies.

Now, I want to find an off the shelf solution for a very specific use case.
----
2020-02-27 18:16:01 UTC - Joshua Dunham: Definitely the latter, but my goal is 
really to make something that is very human focused.
----
2020-02-27 18:16:24 UTC - Joshua Dunham: Unfortunately the inertia is towards 
siloed apps and siloed schemas.
----
2020-02-27 18:16:46 UTC - Joshua Dunham: I want a schema manager that (supports 
multiple transports) but also acts as a hub to get things cleaned up.
----
2020-02-27 18:34:29 UTC - John Duffie: I so wish there was an off the shelf 
solution.  I do not like the tight coupling between the broker technology and 
the schema registry that is becoming widespread so that things are “easier”
----
2020-02-27 18:36:15 UTC - John Duffie: I don’t know of a FOSS solution.  I’m 
looking at the Hortonworks one you mention.  Current project is not based on 
HDP.  Is that a requirement ?
----
2020-02-27 18:41:49 UTC - Ryan Slominski: Is having both a changes 
(add/remove/update) entity topic and a snapshot (entire table in a message) 
topic a good strategy?   Some consumers want just what has changed, others want 
entire list of all entities.  Single producer.   Some consumers might benefit 
from both to speed up replay - might try a "change ID" for consumers who want 
both and correlate change messages to snapshot messages.   Seems like Event 
Sourcing and CQRS are a minefield.
----
2020-02-27 18:43:10 UTC - John Duffie: the update case is the one that has me 
perplexed
----
2020-02-27 18:43:52 UTC - Ryan Slominski: update as in someone changes one of 
the fields, like the lastname in a user entity record
----
2020-02-27 18:45:15 UTC - John Duffie: luckily, my use case is simpler.  99.99% 
of traffic is time series IOT data.  updates only occur for some metadata
----
2020-02-27 18:45:53 UTC - John Duffie: yes
----
2020-02-27 18:47:18 UTC - Ryan Slominski: Yeah, I'm attempting to store 
metadata in Pulsar too.  My primary data is simple event data (notifications).
----
2020-02-27 18:47:40 UTC - John Duffie: sending entire object instead of just 
changed fields is not a deal breaker for me
----
2020-02-27 18:48:45 UTC - Ryan Slominski: Update isn't a deal breaker - I could 
just support add and remove and when something is "updated", just remove and 
add new
----
2020-02-27 18:51:01 UTC - Ryan Slominski: Question is - is having effectively a 
materialized view stored in Pulsar in a separate topic a good strategy? If 
stored outside in an RDBMS or something it might be easier to query, but that 
requires two different interfaces for clients.
----
2020-02-27 19:02:06 UTC - Joshua Dunham: Not at all
----
2020-02-27 19:02:10 UTC - Joshua Dunham: It's all API driven
----
2020-02-27 19:02:29 UTC - Joshua Dunham: feasible to pull the code, add in a 
module that syncs to Pulsar at its API endpoint.
----
2020-02-27 19:06:34 UTC - John Duffie: want a snapshot in time of the state of 
the table ?
----
2020-02-27 19:09:12 UTC - Ryan Slominski: yeah, instead of storing the table in 
an RDBMS, and instead of having to replay the change log topic to reconstruct 
state, I was thinking of having an additional topic, published by the same 
producer as the change topic, that was a snapshot of the full table
----
2020-02-27 19:10:41 UTC - John Duffie: If the goal is to have a way to track 
table changes, then it sounds ok.

But if the goal is to avoid having 2 different clients - 1 to Pulsar and 1 to 
an RDBMS - then I'm more skeptical about the benefit
----
2020-02-27 19:12:52 UTC - Ryan Slominski: Often same client needs not only 
record changes, but when first connecting /re-connecting must obtain full table 
of records.
----
2020-02-27 19:13:53 UTC - John Duffie: ok, that is interesting use case if 
there is some form of local cache that needs to be warmed up
----
2020-02-27 19:16:26 UTC - Ryan Slominski: yeah, kind of a shared config file 
that has pub/sub notifications when changes occur.
----
2020-02-27 19:26:28 UTC - Tobias Macey: @Tobias Macey has joined the channel
----
2020-02-27 19:33:55 UTC - Tobias Macey: This may belong in 
<#CJLPGAZBM|dev-ruby> but I'm working on deploying a Pulsar cluster with the 
intent of using it as a buffer for log messages coming from FluentD agents, 
which will then have those topics subscribed to by a pair of FluentD 
aggregators. Unfortunately, the lack of Ruby support in terms of Pulsar clients 
makes this a bit challenging. Would it be viable to use one of the Kafka API 
compatibility layers to allow for FluentD to treat Pulsar as a Kafka broker, or 
are those abstractions only available for Java clients that can swap out the 
class object?
----
2020-02-27 19:35:45 UTC - Tobias Macey: Alternatively, I'm looking at using the 
Netty IO client for publishing from FluentD or something like 
<https://github.com/kafkaesque-io/pulsar-beam/>
----
2020-02-27 19:36:03 UTC - Tobias Macey: Has anyone else gone down this path and 
have any advice?
----
2020-02-27 21:06:55 UTC - Ryan Slominski: Looks like a bug in documentation - 
clicking next button at bottom of this page 
<https://pulsar.apache.org/docs/en/functions-deploy/> results in 404
----
2020-02-27 21:10:32 UTC - Ryan Slominski: Do Pulsar Schemas work with Pulsar 
Functions? Also, what happens if I have two input topics into a Pulsar 
function with different schemas?
----
2020-02-27 21:26:37 UTC - Joe Francis: This is a solution to a problem. Unless 
the same problem exists in Pulsar,  I am not clear on why Pulsar requires  
something like this.
----
2020-02-27 21:29:03 UTC - Pedro Cardoso: I am simply looking to understand 
whether a user can write custom logic for partition assignment within 
Pulsar and perform rebalancing logic in a gradual way (ideally without 
partition-consumption downtime)
----
2020-02-27 21:29:15 UTC - Joe Francis: Consumption-side scaling is automatic in 
Pulsar. Use a shared consumer, and scale the number of consumers up and down 
as needed.
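As a minimal sketch of what that looks like with the Java client (topic, subscription name, and service URL below are made up), every instance started with the same subscription name joins the same Shared subscription and the broker spreads messages across them:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Start more copies of this process to scale consumption up;
        // stop them to scale back down. No repartitioning needed.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-shared-sub")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);
        client.close();
    }
}
```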
----
2020-02-27 21:31:22 UTC - Pedro Cardoso: Is there documentation on how this 
consumption side scaling is performed? Can I introduce custom logic into it?
----
2020-02-27 21:32:07 UTC - Pedro Cardoso: Do cluster rebalancements imply 
downtime or can consumers continue to consume?
----
2020-02-27 21:32:09 UTC - Joe Francis: None of that repartitioning/regrouping 
is required. You will need to kind of get cured of it, if you move from K to 
Pulsar.
----
2020-02-27 21:33:03 UTC - Joe Francis: It's a radically new, simpler way of 
doing things. None of those concepts in the K world apply
----
2020-02-27 21:34:36 UTC - Pedro Cardoso: Clearly there is some information I am 
missing,  I will need to investigate some more, thank you for your time.
----
2020-02-27 21:36:18 UTC - Joe Francis: 
<https://pulsar.apache.org/docs/v1.19.0-incubating/getting-started/ConceptsAndArchitecture/>
----
2020-02-27 21:36:59 UTC - Joe Francis: 
<https://streaml.io/blog/pulsar-segment-based-architecture>
----
2020-02-27 21:37:18 UTC - Ryan Slominski: Created pull request for docs 
issue...  <https://github.com/apache/pulsar/pull/6434>
----
2020-02-27 21:43:13 UTC - Joe Francis: Simply stated, if the load on your topic 
increases, you spin up a few more instances of your consumer to scale up the 
consumption rate, and shut them down when it's done. It sounds like snake-oil, 
but that is how it works.
----
2020-02-27 21:46:36 UTC - Pedro Cardoso: My particular use-case is for stateful 
consumers, simply spinning up more instances is not enough. Think stream 
processing in x nodes, need to scale up to x2 nodes, state must be copied over, 
ideally you would want new nodes to be physically co-located.
----
2020-02-27 21:47:36 UTC - Pedro Cardoso: Perhaps the stream is partitioned in 
some way, and you want rebalancing for a specific partition
----
2020-02-28 00:31:25 UTC - Eugen: When consuming like `Message<byte[]> msg 
= consumer.receive()`, is there a way to get the partition of the `msg`?
----
2020-02-28 03:30:09 UTC - Sijie Guo: Yes. You can get the topic name and use 
TopicPartition class to get the partition id 
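Since the partition shows up as a suffix on the internal topic name, a minimal sketch of recovering it by string parsing (assuming the `-partition-N` naming convention, rather than any particular client helper class) could be:

```java
public class PartitionIndex {
    // Extracts N from a fully qualified topic name ending in "-partition-N";
    // returns -1 if the name carries no partition suffix.
    static int partitionIndex(String topicName) {
        int i = topicName.lastIndexOf("-partition-");
        if (i < 0) {
            return -1; // not a partitioned-topic name
        }
        return Integer.parseInt(topicName.substring(i + "-partition-".length()));
    }

    public static void main(String[] args) {
        // Prints 4 for the internal name shown below
        System.out.println(partitionIndex("persistent://public/default/topic-0-partition-4"));
    }
}
```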
----
2020-02-28 03:46:15 UTC - Eugen: Thanks! I was under the impression that 
`message.getTopicName()` returns the name as given by the producer, rather than 
the internal name,  which looks like 
`topic:<persistent://public/default/topic-0-partition-4>`
----
2020-02-28 05:20:22 UTC - Sijie Guo: no it doesn't help based on the current 
implementation
----
2020-02-28 07:20:47 UTC - Sijie Guo: the kafka API compatibility layer is a 
Java API wrapper. it is not the protocol compatibility layer. we have developed 
a kafka protocol handler for pulsar. it will be open sourced in one week or so.
----
2020-02-28 07:21:31 UTC - Sijie Guo: @Tobias Gustafsson you can use websocket 
interface to publish to pulsar.
----
2020-02-28 07:40:41 UTC - Sijie Guo: > but it’s not possible to reference a 
secret in a configmap
@eilonk you can mount the secret to the file path that you specify in 
configmaps.
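For example (the names below are illustrative, not taken from any specific chart), the pod spec can mount the TLS secret at the directory that the configmap values point to:

```yaml
# Illustrative fragment of a broker pod spec:
spec:
  volumes:
    - name: pulsar-tls
      secret:
        secretName: pulsar-tls-secret   # the tls-secret created for the chart
  containers:
    - name: broker
      volumeMounts:
        - name: pulsar-tls
          mountPath: /opt/pulsar        # so tlsTrustCertsFilePath=/opt/pulsar/ca.cert.pem resolves
          readOnly: true
```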
----
2020-02-28 07:44:48 UTC - Sijie Guo: @Pedro Cardoso - for autoscaled stream 
processing, you can use key range consumers/readers. You don’t need to 
rebalance partitions.

You can divide the key space into multiple key ranges, then assign key ranges 
to your stateful processors. You open a consumer/reader to read the events for 
a key range and persist the state by key range.

When you scale up your stream processing, you can copy the key-range state to 
the new processing node and then resume reading events from that key range.

You can also split a key range if it becomes a bottleneck.

In this way, you control the scale-up of your state, but you don't need to 
do any partition rebalancing.
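One way to express this with the client API is a Key_Shared consumer pinned to a sticky hash range; the range boundaries, topic, and subscription below are illustrative, and each processor instance would own a different slice:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.KeySharedPolicy;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Range;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeyRangeConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // This processor owns the lower half of the 0..65535 key hash space;
        // a second instance would take Range.of(32768, 65535). Splitting a hot
        // range means starting new consumers with narrower sub-ranges.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/events")
                .subscriptionName("stateful-processors")
                .subscriptionType(SubscriptionType.Key_Shared)
                .keySharedPolicy(KeySharedPolicy.stickyHashRange()
                        .ranges(Range.of(0, 32767)))
                .subscribe();

        // ... read events and persist state keyed by range ...
        consumer.close();
        client.close();
    }
}
```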
----
2020-02-28 07:45:14 UTC - Sijie Guo: you can also scale up the stateful 
processing beyond the number of partitions.
----
2020-02-28 08:13:59 UTC - Manuel Mueller: I would second Tanner's suggestion.
1. run `docker ps` while pulsar is up
2. find either the ID or the name
3. use `docker exec -it [yourID/name] bash` to get into the container and be 
able to run pulsar-admin commands
----
2020-02-28 09:04:13 UTC - Rolf Arne Corneliussen: Have you considered topic 
compaction, to reduce the time used to replay the change log?
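Compaction can be triggered from the admin CLI; for example (the topic name below is made up):

```shell
# Trigger compaction on the change-log topic
bin/pulsar-admin topics compact persistent://public/default/user-changes

# Check how the compaction run is going
bin/pulsar-admin topics compaction-status persistent://public/default/user-changes
```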
----
