Slack digest for #general - 2018-12-04

Apache Pulsar Slack Tue, 04 Dec 2018 01:11:37 -0800

2018-12-03 09:12:37 UTC - Ivan Kelly: could you give an example of what a table 
topic and domain topic would be?
----
2018-12-03 09:44:30 UTC - Olivier Chicha: Sure.
Let's say that each table in our DB has a field named "domainName".
Let's say our database has a table named UserContact; then our table topic 
"table/UserContact" would receive an event ("propertyChangeEvent") for each 
change performed in the table (i.e. if we update 2 fields of a row of the table 
we generate 2 events).
Now let's say that we have an enterprise Acme; then our topic "domain/Acme" would 
receive an event each time a row for which domainName = acme is modified.
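
The fan-out described above, where one change lands on both a table topic and a domain topic, could be sketched roughly as follows. The `PropertyChangeEvent` fields and the `topics_for` helper are hypothetical names chosen for illustration; only the `table/...` and `domain/...` topic shapes come from the message.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PropertyChangeEvent:
    """One changed field of one row (hypothetical shape, not from the thread)."""
    change_id: int
    table: str
    domain_name: str
    field: str
    new_value: str


def topics_for(event: PropertyChangeEvent) -> List[str]:
    """Return the table topic and the domain topic for a single change."""
    return [f"table/{event.table}", f"domain/{event.domain_name}"]


# Updating one field of a UserContact row owned by Acme yields one event
# that would be published to both topics.
event = PropertyChangeEvent(1, "UserContact", "Acme", "email", "a@acme.example")
print(topics_for(event))
```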
----
2018-12-03 10:43:48 UTC - Chris Miller: I've seen similar errors when the 
namespace doesn't exist. What does `pulsar-admin namespaces list diagnostics` 
return? If the namespace is there, what does `pulsar-admin namespaces policies 
diagnostics/local.diagnostics.guestnamespace` return?
----
2018-12-03 11:02:12 UTC - Christophe Bornet: Hi all, can Pulsar brokers be 
aware of the rack placement of Bookies to perform reads on the closest Bookie ?
----
2018-12-03 11:10:06 UTC - jia zhai: @Christophe Bornet This is not supported 
yet.
----
2018-12-03 11:11:07 UTC - Christophe Bornet: Does this mean you intend to 
support it someday ?
----
2018-12-03 12:31:07 UTC - Ivan Kelly: how are you generating events from the 
table? tailing the journal or something?
----
2018-12-03 12:48:13 UTC - Olivier Chicha: We have a kind of in house proxy that 
allows us to control all the change requests on the DB
----
2018-12-03 14:05:33 UTC - Bogdan BUNECI: Hi ! Just a small question: While 
trying to use json schema from 
<http://json-schema.org/draft-07/schema|json-schema.org/draft-07/schema>  we 
received invalid schema. What is the meaning of “type”: [“JSON”,“AVRO”] ?
----
2018-12-03 14:21:53 UTC - Ivan Kelly: Does this come with some sort of 
monotonically increasing number?
----
2018-12-03 14:22:38 UTC - Ivan Kelly: it seems to me like you should create two 
events for each change. the only issue is what happens in failure cases
----
2018-12-03 14:22:56 UTC - Ivan Kelly: I assume clients are consuming either a 
domain topic or a table topic, but not both?
----
2018-12-03 14:24:12 UTC - Ivan Kelly: AVRO is a type of schema. how are you 
passing in your json schema?
----
2018-12-03 14:27:15 UTC - Bogdan BUNECI: I’ve tested with a simple AVRO schema 
and it is working. I’m trying to use JSON schema.
----
2018-12-03 14:27:35 UTC - Bogdan BUNECI: Schema is uploaded with pulsar-admin 
schemas upload …
----
2018-12-03 14:31:18 UTC - Bogdan BUNECI: one json per line
----
2018-12-03 14:56:44 UTC - Bogdan BUNECI: I guess I should test with some 
records not from the console :slightly_smiling_face:
----
2018-12-03 15:18:24 UTC - Bogdan BUNECI: working very well with AVRO. Records 
produced with Apache NiFi.
----
2018-12-03 15:18:56 UTC - Bogdan BUNECI: Thanks !
----
2018-12-03 15:24:27 UTC - Yifan: Hi, I am here with another general question :) 
My system is currently very light: I may be processing only a few thousand 
messages per topic per hour, and I have probably 10-20 topics. Is there any 
suggested configuration for this setup? I don’t want to use more resources than 
needed. Currently I can see in values.yaml (deployment/kubernetes/helm) that memory 
for zookeeper is 15G, for example, and 4G for grafana, which to me is a lot. Are 
there general guidelines for memory and CPU configuration for a Pulsar cluster?
----
2018-12-03 16:31:16 UTC - 东东: @东东 has joined the channel
----
2018-12-03 17:01:39 UTC - Christophe Bornet: I'm trying to understand 
the purpose of the "discovery service". It seems its role is to get the list of 
active brokers from ZK that clients can look up. But it seems to me that 
connecting a client directly to a broker gives about the same functionality. 
What am I missing?
----
2018-12-03 17:45:13 UTC - Matteo Merli: It’s not very useful in practice.


When you expose a Pulsar service, you need to just expose 1 single hostname/IP 
to clients. There are several ways to do that. eg.
 * DNS cname with multiple IPs
 * VIP / Load balancer
 * Scheduler specific discovery service (eg: service DNS in Kubernetes)
 * …

This was an attempt to create a simple discovery service module, but the above 
alternatives are preferable.
----
2018-12-03 17:53:38 UTC - Matteo Merli: @Yifan You can certainly scale down the 
memory settings a lot.

For CPU, you can check the usage when running your particular workload and 
plan accordingly.

For memory, at your rate, you can reduce a lot from the defaults. 1GB (or even 
512MB) should be enough for any of the components.

The only things to be careful about are the sizes of the caches, in broker and 
bookies. There are a few settings that need to be scaled down with the configured 
memory. All of them are relative to direct memory.
 * `broker.conf`:
    - `managedLedgerCacheSizeMB=1024` -> 64

 * `bookkeeper.conf`:
    - `dbStorage_writeCacheMaxSizeMb=512` -> 16
    - `dbStorage_readAheadCacheMaxSizeMb=256` -> 0 (disable read cache)
    - `dbStorage_rocksDB_blockCacheSize=268435456` -> 16777216 (16MB)

These config options will be automated in the next release, 2.3, by having
default values tied to the -Xmx and max direct memory configured in the JVM.
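
One way to apply the scaled-down values listed above to existing config files is a small rewrite helper. This is only a sketch: the `apply_overrides` function and the two dictionary names are hypothetical, and it assumes the files use plain `key=value` lines. The keys and target values are the ones quoted in the message.

```python
# Scaled-down values quoted in the message above.
BROKER_OVERRIDES = {"managedLedgerCacheSizeMB": "64"}
BOOKIE_OVERRIDES = {
    "dbStorage_writeCacheMaxSizeMb": "16",
    "dbStorage_readAheadCacheMaxSizeMb": "0",
    "dbStorage_rocksDB_blockCacheSize": "16777216",  # 16 * 1024 * 1024 bytes
}


def apply_overrides(conf_text, overrides):
    """Rewrite `key=value` lines whose key is overridden; append missing keys."""
    remaining = dict(overrides)
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)  # comments and unrelated settings pass through
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


print(apply_overrides("managedLedgerCacheSizeMB=1024\n", BROKER_OVERRIDES))
```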
----
2018-12-03 18:21:03 UTC - Christophe Bornet: Thanks. That's also my conclusion: 
as the discovery service itself would be a SPOF, you would need to have a 
failover mechanism in front anyway, which would probably be some kind of 
VIP/LB/DNS, etc.
----
2018-12-03 18:21:38 UTC - Karthik Palanivelu: Team, I am trying to find a doc 
on Synchronous Replication. I ended up here - 
<http://pulsar.apache.org/docs/en/administration-geo.html#docsNav>. Can you 
please help me find where I should look?
----
2018-12-03 18:24:58 UTC - Christophe Bornet: For a VIP, what would be the best 
healthcheck in your opinion ?
----
2018-12-03 18:26:58 UTC - Christophe Bornet: for failover
----
2018-12-03 18:28:12 UTC - Matteo Merli: The check on the VIP health should be 
frequent and lightweight, in general.

Eg: you could hit <http://broker:8080/metrics>
----
2018-12-03 18:28:31 UTC - Matteo Merli: There is also a handler called 
`/status.html`
----
2018-12-03 18:28:48 UTC - Matteo Merli: (that’s what was used by Yahoo’s 
hardware VIPs)
----
2018-12-03 18:30:28 UTC - Matteo Merli: This handler will respond with either 200 or 
404 depending on whether a file exists on the broker disk.

The path of that file is configured in `broker.conf` with :

```
statusFilePath=/xxx
```

This can be used to take a broker out of VIP rotation while the process is 
running
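
A health probe built on the `/status.html` behaviour described above might look roughly like this. The function names, the timeout, and the `http://broker:8080` base URL in the usage note are assumptions for illustration; the only facts taken from the thread are that the handler answers 200 or 404 depending on whether the status file exists.

```python
import urllib.error
import urllib.request


def fetch_status_code(base_url, timeout=2.0):
    """Return the HTTP status of GET <base_url>/status.html, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/status.html", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the status file has been removed
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout


def in_rotation(status_code):
    """Keep a broker behind the VIP only when the handler answered 200."""
    return status_code == 200
```

A load balancer script could then call `in_rotation(fetch_status_code("http://broker:8080"))` on each probe; removing the file configured in `statusFilePath` would make the probe fail while the broker process keeps running.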
----
2018-12-03 18:52:18 UTC - Yifan: Thanks. I am not a Java person. What should I 
use for Xmx and Max direct memory in JVM configuration?  something like 64MB?
----
2018-12-03 18:59:02 UTC - Karthik Palanivelu: Team, I have clusters A and 
B trying to use the same ZooKeeper. It is failing on creating metadata with the 
error below. I am trying to test Synchronous Replication with the same ZooKeeper 
shared across two clusters. I am trying to use the same ZK instance as local 
and global ZK. If I start a global ZK on the same instance, I get the 
"wrong number of arguments" error. Please advise on how to achieve it.
```
Exception in thread "main" org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /namespace
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:122)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:792)
        at org.apache.pulsar.PulsarClusterMetadataSetup.main(PulsarClusterMetadataSetup.java:156)
```
----
2018-12-03 19:01:16 UTC - Matteo Merli: I’d start with 256M or 512M and then 
check the mem usage
----
2018-12-03 19:04:47 UTC - Yifan: Okay, thanks.
----
2018-12-03 19:24:12 UTC - Olivier Chicha: There is indeed an incremental 
change ID.
The events are only sent in case of success; otherwise it means that no changes were 
committed to the DB.
Yes, clients are consuming one or the other but not both.
For now I agree with you that staying with the 2 events seems to be the best 
option for the first version.
----
2018-12-03 19:30:25 UTC - Ivan Kelly: ok. the reason I asked about the 
monotonically increasing number is so that you can use idempotent publish for 
failure scenarios where the producer crashes
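
The idea behind idempotent publish can be illustrated with a toy model: if every event carries a monotonically increasing number, the receiving side can drop anything it has already accepted when a restarted producer resends. The `DedupLog` class and the `db-proxy` producer name below are hypothetical and stand in for the topic; this is not the Pulsar client API.

```python
class DedupLog:
    """Toy stand-in for a topic that drops replays by sequence number."""

    def __init__(self):
        self.messages = []
        self._last_seq = {}  # producer name -> highest sequence id accepted

    def publish(self, producer, seq_id, payload):
        """Append payload unless seq_id was already accepted; return True if appended."""
        if seq_id <= self._last_seq.get(producer, -1):
            return False  # duplicate from a retry, ignore it
        self._last_seq[producer] = seq_id
        self.messages.append(payload)
        return True


log = DedupLog()
log.publish("db-proxy", 41, "change 41")
log.publish("db-proxy", 42, "change 42")
# The producer crashes before seeing the ack for 42, restarts, and resends it.
log.publish("db-proxy", 42, "change 42")
print(log.messages)
```

Using the DB's incremental change ID as the sequence number means the producer does not need to persist any extra state to resume safely after a crash.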
----
2018-12-04 07:15:23 UTC - fvelement: @fvelement has joined the channel
----
2018-12-04 08:25:59 UTC - Christophe Bornet: OK. I'll try that. About failing 
over to another region, how do I do that ? I would like to be able to failover 
the producers and consumers independently. They could be pointing to distinct 
VIPs but it seems that the VIP is only used for the first connection and after 
that the clients communicate directly with the broker. So how would I force the 
producers and consumers to reconnect and use the new VIP ?
----
