2019-12-16 10:28:45 UTC - Prasad: @Prasad has joined the channel
----
2019-12-16 12:09:51 UTC - Vladimir Shchur: Hi! I've caught a regression with the 2.4.2 cluster upgrade in my tests. When using a partitioned topic (two partitions) with 2 consumers on a shared subscription, in about 50% of cases all messages from one of the partitions are received (by both consumers) while only the 1st message from the other partition is received, in about 10% only the first messages from both partitions are received, and in the remaining 40% everything works fine. Has anyone encountered that? I haven't seen such behavior in 2.4.1. Tested with the 2.4.1 client, but from what I've found, the server just doesn't send messages from those partitions in the first two cases.
----
2019-12-16 12:51:39 UTC - Daniel Ferreira Jorge: @Jared Mackey This is happening because you have to set a name for the producer. Otherwise, every time the producer starts, it will be given a random name and, consequently, the sequence number will reset. The sequence numbers are held per producer name.
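Roughly like this with the Java client (the service URL, topic, and producer name below are just placeholders):
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class NamedProducerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // An explicit producerName lets the broker keep tracking the same
        // sequence ids across restarts; a random name starts over each time.
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/my-topic") // placeholder topic
                .producerName("orders-service-producer")       // stable, explicit name
                .create();

        System.out.println("Last sequence id the broker has seen: "
                + producer.getLastSequenceId());

        producer.close();
        client.close();
    }
}
```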
----
2019-12-16 14:16:32 UTC - Jared Mackey: @Daniel Ferreira Jorge I was setting the name of the producer and was still getting the issue. I tried letting it randomly generate the name and got the same result (like you said). It is a partitioned topic; do the names need to be unique per partition? Right now it is using the same name for all the partitions.
----
2019-12-16 14:17:25 UTC - Rajitha: @Rajitha has joined the channel
----
2019-12-16 14:25:34 UTC - Daniel Ferreira Jorge: @Jared Mackey It has to be 
unique. Let's say you have 10 partitions. Then, 10 producers with the same name 
will be created. Hence, resetting to -1.
----
2019-12-16 14:26:03 UTC - Daniel Ferreira Jorge: name your producers according 
to your partitions
----
2019-12-16 14:26:37 UTC - Jared Mackey: Got it. That’s the bug. Thank you. 
----
2019-12-16 14:30:54 UTC - Daniel Ferreira Jorge: If you are creating partitions 
because you need many consumers and you need to maintain ordering, take a look 
at the Key_Shared subscription type. You won't need partitions and will be able 
to add consumers at will AND maintain message ordering per key.
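Something along these lines with the Java client (URL, topic, and subscription name are placeholders):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Any number of consumers can attach to the same Key_Shared
        // subscription; messages with the same key are always routed to the
        // same consumer, which preserves per-key ordering without partitions.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/my-topic") // placeholder topic
                .subscriptionName("my-subscription")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        Message<String> msg = consumer.receive();
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```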
----
2019-12-16 14:57:04 UTC - Jared Mackey: @Daniel Ferreira Jorge another question. How do producer names work when the producers are scaled horizontally? Say in Kubernetes.
----
2019-12-16 14:57:23 UTC - Jared Mackey:  Do sequence IDs just not work in that 
scenario? 
----
2019-12-16 15:04:21 UTC - Daniel Ferreira Jorge: I don't see why it wouldn't... As long as all the producer names are unique
----
2019-12-16 15:04:59 UTC - Jared Mackey: I see. So the code would need to somehow generate a unique name based on some sort of env var. Got it.
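Something like this maybe, assuming the pod name is injected through the Kubernetes downward API into an env var like POD_NAME (the var name, topic, and prefix here are hypothetical):
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class PodNamedProducer {
    public static void main(String[] args) throws Exception {
        // Hypothetical: POD_NAME is injected via the Kubernetes downward API,
        // so every replica ends up with a stable, unique producer name.
        String podName = System.getenv().getOrDefault("POD_NAME", "local-dev");

        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://public/default/my-topic") // placeholder topic
                .producerName("orders-service-" + podName)
                .create();

        producer.send("hello");
        producer.close();
        client.close();
    }
}
```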
----
2019-12-16 15:47:08 UTC - Roman Popenov: I am running the `terraform init` command to set up my AWS cluster to initialize a Pulsar cluster, and I am getting the following error:
```>terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "random" (hashicorp/random) 2.2.1...

Provider "registry.terraform.io/-/aws" v1.5.0 is not compatible with Terraform 0.12.18.

Provider version 2.7.0 is the earliest compatible version. Select it with
the following version constraint:

        version = "~> 2.7"

Terraform checked all of the plugin versions matching the given constraint:
    1.5

Consult the documentation for this provider for more information on
compatibility between provider and Terraform versions.


Error: incompatible provider version```
----
2019-12-16 15:53:11 UTC - Roman Popenov: Will there be an effort to support 
later versions of terraform?
----
2019-12-16 16:03:07 UTC - Brian Doran: @Sijie Guo I have all the info further up the thread: 13 partitioned topics with 10 partitions. Although I have moved to just 3 topics now.
The test is run through our normal performance test environment with our own data. We are not using pulsar-perf.
----
2019-12-16 16:53:15 UTC - Roman Popenov: Did anyone try to rewrite the `.tf` 
files using `terraform 0.12upgrade` command?
----
2019-12-16 16:58:18 UTC - Sijie Guo: Can you please create a github issue for 
this error you have seen? We can triage the issue and fix the error.
+1 : Roman Popenov
----
2019-12-16 16:59:38 UTC - Sijie Guo: do you have any program or tests to 
reproduce this behavior? Can you file an issue for that?
----
2019-12-16 17:28:22 UTC - Vladimir Shchur: I just wanted to check if someone is already using shared subscriptions and can verify it. The test is here <https://github.com/fsharplang-ru/pulsar-client-dotnet/blob/develop/tests/IntegrationTests/Partitions.fs#L66-L153>, I will try to reproduce it with the Java client before filing an issue.
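A rough sketch of what that Java reproduction could look like (topic name and counts are arbitrary; the topic is assumed to already exist with two partitions):
```
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageRoutingMode;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedSubscriptionRepro {
    public static void main(String[] args) throws Exception {
        // Assumed to be pre-created with 2 partitions.
        String topic = "persistent://public/default/two-partition-topic";
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Two consumers on the same Shared subscription.
        Consumer<String> c1 = client.newConsumer(Schema.STRING).topic(topic)
                .subscriptionName("shared-sub")
                .subscriptionType(SubscriptionType.Shared).subscribe();
        Consumer<String> c2 = client.newConsumer(Schema.STRING).topic(topic)
                .subscriptionName("shared-sub")
                .subscriptionType(SubscriptionType.Shared).subscribe();

        // Spread messages across both partitions.
        Producer<String> producer = client.newProducer(Schema.STRING).topic(topic)
                .messageRoutingMode(MessageRoutingMode.RoundRobinPartition).create();
        for (int i = 0; i < 100; i++) {
            producer.send("msg-" + i);
        }

        // Count what actually arrives; the regression would show up as far
        // fewer than 100 messages delivered across the two consumers.
        int received = 0;
        for (Consumer<String> consumer : Arrays.asList(c1, c2)) {
            Message<String> msg;
            while ((msg = consumer.receive(2, TimeUnit.SECONDS)) != null) {
                consumer.acknowledge(msg);
                received++;
            }
        }
        System.out.println("received " + received + " of 100");

        producer.close();
        c1.close();
        c2.close();
        client.close();
    }
}
```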
----
2019-12-16 17:34:17 UTC - Vladimir Shchur: This test always worked on 2.4.1, then I upgraded the cluster to 2.4.2 and it started failing, then I downgraded to 2.4.1 and it started working again every time
----
2019-12-16 17:53:13 UTC - Roman Popenov: 
<https://github.com/apache/pulsar/issues/5869> - please let me know if this is 
up to standard
----
2019-12-16 17:53:54 UTC - Sijie Guo: thank you Roman
----
2019-12-16 18:16:59 UTC - ec: @ec has joined the channel
----
2019-12-16 18:18:32 UTC - ec: Can you run Pulsar on Windows? The docs say it's on Linux and Mac, but there is a merged GitHub PR that `compiles`?
----
2019-12-16 18:21:19 UTC - Addison Higham: someone was trying a month or so 
back, searching this slack might reveal details (or it might be out of slack 
history). I think it is in the state of "theoretically it should work, but not 
supported or tested"
----
2019-12-16 18:24:11 UTC - ec: Thx, I wondered why it wouldn't, since it's based on ZooKeeper and developed in Java. What kind of API does it use to tie itself to POSIX?
----
2019-12-16 18:33:49 UTC - Ryan: We would like to investigate if we can replace 
Bookkeeper with CephFS. Anyone have any thoughts on why that would not be 
possible, a rough estimate on level of effort (LoE) and any pointers/tips in 
that regard? Thanks!
----
2019-12-16 18:36:05 UTC - Addison Higham: entirely? or just for offloaded 
segments?
----
2019-12-16 18:39:34 UTC - Ryan: Entirely. We are exploring an architecture 
where we use CephFS as a globally available shared filesystem, much like 
Lustre, but still need the higher-level functionality and API of Pulsar for 
security, pub/sub, etc. It seems that since Bookkeeper is essentially just a 
distributed object storage layer, we could replace it with CephFS.
----
2019-12-16 19:03:27 UTC - Jerry Peng: @Ryan I wouldn't describe Bookkeeper as an object store; it offers an additional/different set of capabilities that are not found in a typical object store. For example, Bookkeeper allows you to do tailing writes, which is usually not possible in an object store. Bookkeeper can also be described as an LSM (Log-Structured Merge) tree that keeps writes durable while also reducing latency, and it supports quorum writes to reduce latency spread. These are some examples of why Bookkeeper is used as the storage layer of Pulsar. While it's not impossible to write directly to an object store or even a filesystem, experiments done in the past have indicated that performance would not be very good for a number of reasons.
----
2019-12-16 19:05:50 UTC - Addison Higham: I have a question of my own about compaction. I understand the details of compaction, that it essentially creates a new topic, which is nifty, but say that I do want the original topic to have a retention policy of 6 months but to keep the latest version of each record indefinitely. I assume that I would need to make sure compaction happens often enough to guarantee that the compacted topic never has records that risk getting to be 6 months old?
----
2019-12-16 19:06:19 UTC - Jerry Peng: The better solution would be, as @Addison Higham mentioned, to offload data into an Object Store/Filesystem in the background
----
2019-12-16 19:08:35 UTC - Addison Higham: or you could run bookies backed by 
ceph volumes and do some interesting things that way...
----
2019-12-16 19:10:27 UTC - Jerry Peng: We are exploring the possibility of writing entries directly to an Object Store when they are being flushed from the write cache in BK.
----
2019-12-16 19:17:58 UTC - Jerry Peng: BTW Pulsar already supports offloading 
data to Object Storage
----
2019-12-16 19:18:37 UTC - Jerry Peng: 
<https://pulsar.apache.org/docs/en/2.4.2/concepts-tiered-storage/>
----
2019-12-16 19:21:15 UTC - Ryan: @Jerry Peng I appreciate the clarification on 
Bookkeeper vs. an object store, very informative, thank you. CephFS is a 
distributed shared filesystem and is almost 100% POSIX compliant (and has none 
of the limitations of NFS v3/4 thankfully). We already use it for massively 
parallel data storage and access, so we know it works. In addition to being a 
shared file system, Ceph also takes care of distributed block/file replication, 
fail-over, etc.; it seems many of the benefits of a Bookkeeper layer could be 
replaced. Do you have any technical details of the previous 
attempts/experiments at replacing Bookkeeper with a distributed shared 
filesystem?
----
2019-12-16 19:25:02 UTC - Addison Higham: @Ryan so one thing about Pulsar and 
the way it uses bookkeeper, you can tune the replication factor, how many nodes 
it writes to before it considers a write confirmed, etc. So as I mentioned, you 
could run bookie nodes backed by ceph volumes and set up Pulsar to only ever 
write to a single bookie.
+1 : Ryan
----
2019-12-16 19:25:58 UTC - Addison Higham: it is probably much easier to make bookkeeper work on top of ceph in a nice way than to rip out BK and rewrite large parts of Pulsar
----
2019-12-16 19:27:28 UTC - Jerry Peng: @Ryan

> Do you have any technical details of the previous attempts/experiments at replacing Bookkeeper with a distributed shared filesystem?
That is a long discussion worthy of a ten-page research paper. I would recommend first looking at how Bookkeeper works and how Pulsar uses Bookkeeper. But previous attempts at creating a pub/sub system that writes directly to S3 or HDFS have not worked out that well
----
2019-12-16 19:30:25 UTC - Jerry Peng: Systems like S3 and HDFS are designed for bulk reads and writes, i.e. they put the emphasis on throughput; however, in the streaming/messaging world, latency becomes an important dimension as well
----
2019-12-16 19:30:31 UTC - Ryan: @Addison Higham I see. Considering we are also 
using Rook-Ceph in our k8s cluster, this could be an excellent path forward. 
Ceph is managing replication, etc. so much of that logic would not need to be 
used/duplicated in Bookkeeper.
----
2019-12-16 19:31:07 UTC - Greg Hoover: Using pulsar-standalone:latest Docker 
container, trying to get data to display in dashboard via pulsar-grafana:latest 
Docker container with 
PROMETHEUS_URL=<http://pulsar-stand-alone:8080/metrics>
Logged into grafana as admin/admin. All the pulsar specific dashboards display 
with no data. Error message says “Cannot read property ‘result’ of undefined” 
for all. In Chrome debug tools I see data actually being returned to the 
browser in a text file format which looks like Prometheus format (not sure tho, 
this is still new for me). When I look in both the standalone and grafana 
containers I don’t find Prometheus, and there is nothing listening on port 
9090. I have tried the pulsar docs on this without success. Can someone point 
me in the right direction to displaying actual data in grafana? Thanks so much. 
----
2019-12-16 19:32:31 UTC - Jerry Peng: @Ryan @Addison Higham while the approach of using one BK instance and using a volume backed by Ceph can work, now you are limiting the read and write bandwidth to one node.
----
2019-12-16 19:32:44 UTC - Addison Higham: one detail about the way pulsar uses 
BK, it isn't just for failover, it can also be configured in some ways to 
increase write/read throughput (based on how many bookies you allow in your 
quorum) by striping data across disks.
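For example, a rough sketch with the Java admin client (the namespace and the numbers are purely illustrative, not a recommendation):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class PersistenceTuningExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build();

        // ensemble=3: stripe each ledger across 3 bookies
        // writeQuorum=2: every entry is written to 2 of them
        // ackQuorum=1: a write is confirmed after 1 bookie acks it
        // last argument: managed-ledger max mark-delete rate
        admin.namespaces().setPersistence("my-tenant/my-namespace",
                new PersistencePolicies(3, 2, 1, 0));

        admin.close();
    }
}
```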
----
2019-12-16 19:32:46 UTC - Addison Higham: agree
----
2019-12-16 19:32:47 UTC - Ryan: @Jerry Peng Thank you. CephFS and HDFS are not the same and I do understand the limitations of HDFS. We have actually replaced HDFS with Ceph in one large Hadoop cluster with excellent results (and significantly fewer issues).
----
2019-12-16 19:34:02 UTC - Jerry Peng: BTW Bookkeeper has been used in production at companies such as Yahoo and Twitter for 6+ years, so it's a fairly mature system
----
2019-12-16 19:39:12 UTC - Addison Higham: okay, back to compaction... it seems like it would be nice to also have a time-based trigger for compaction instead of just backlog size. If I have a low-volume topic (that might eventually stop getting writes altogether) and a retention policy, eventually the compaction won't trigger and records in the compacted topic would fall off. I assume right now the best way to do that is to manually trigger compaction via some automated process (i.e. a k8s cron job)
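e.g. something the cron job could run through the Java admin client (topic and URL are placeholders; `pulsar-admin topics compact` would do the same thing from the CLI):
```
import org.apache.pulsar.client.admin.LongRunningProcessStatus;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class CompactionTrigger {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/my-compacted-topic"; // placeholder topic

        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build();

        // Kick off compaction and poll until the broker reports it finished.
        admin.topics().triggerCompaction(topic);
        while (admin.topics().compactionStatus(topic).status
                == LongRunningProcessStatus.Status.RUNNING) {
            Thread.sleep(1000);
        }

        admin.close();
    }
}
```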
----
2019-12-16 19:39:32 UTC - Ryan: @Jerry Peng We are looking at multiple 
considerations in our architecture/platform, including a consolidation of 
technologies and technology stacks. Not dedicating nodes to Bookkeeper, etc. 
when Ceph/CephFS is available (and possibly replicating much of its 
functionality) is something worth investigating. If Bookkeeper is extremely 
mature, then this could be a fruitful endeavor.
----
2019-12-16 19:40:21 UTC - Addison Higham: @Ryan I would suggest this article as 
a good primer on Pulsar and BK 
<https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works>
----
2019-12-16 19:43:10 UTC - Jerry Peng: @Addison Higham for compaction that is correct. Though it wouldn't be that hard to add time-based triggers. Currently, you would need to manually trigger it periodically
----
2019-12-16 19:44:56 UTC - Addison Higham: :thumbsup: thanks, just wanted to make sure I understood the details. We have some use cases where we are doing versioning that would result in topics in the same namespace not receiving writes after a version change, but we still want to retain the old data.
----
2019-12-16 22:27:39 UTC - Addison Higham: just want to make sure this is still correct: offloaded segments are never cleaned up even once the retention policy for that segment has passed, correct?
----
2019-12-16 23:03:32 UTC - Sijie Guo: that is the behavior now. there is a 
github issue for adding a feature to clean them up.
----
2019-12-16 23:13:17 UTC - Addison Higham: not in 2.5.0 yet, right? Could that be added in a 2.5.x patch release or does that fall into a new feature only for 2.6.0?
----
2019-12-16 23:29:53 UTC - Addison Higham: oh hrm... this is surprising, is there no way to set a cluster-wide default for the offload threshold? I could have sworn I had seen that, but now I'm not finding how to do it...
----
2019-12-16 23:30:04 UTC - Addison Higham: I can set deletion lag, but not 
offload threshold
----
2019-12-16 23:32:11 UTC - Sijie Guo: that can be in 2.5.1.
----
2019-12-16 23:32:24 UTC - David Kjerrumgaard: @Addison Higham Offloading is 
configured at the namespace level. Not cluster wide.
----
2019-12-16 23:32:28 UTC - David Kjerrumgaard: `$ bin/pulsar-admin namespaces 
set-offload-threshold --size 10M my-tenant/my-namespace`
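If you are scripting it rather than shelling out to the CLI, a rough Java admin client equivalent would be (the values just mirror the command above; the deletion-lag call is optional):
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class OffloadPolicyExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build();

        // Automatically offload a topic's ledgers to tiered storage once it
        // has more than ~10 MB stored in BookKeeper (same as the CLI above).
        admin.namespaces().setOffloadThreshold("my-tenant/my-namespace", 10 * 1024 * 1024);

        // Optionally, delete the BookKeeper copy 4 hours after offloading.
        admin.namespaces().setOffloadDeleteLag("my-tenant/my-namespace", 4, TimeUnit.HOURS);

        admin.close();
    }
}
```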
----
2019-12-16 23:33:37 UTC - Addison Higham: right, but most of the options (like ttl, backlog quota, max producers/consumers) all have a way to set a default for any namespaces that don't set one themselves
----
2019-12-16 23:34:07 UTC - David Kjerrumgaard: Ah, so you want a "namespace level" default?
----
2019-12-16 23:35:19 UTC - David Kjerrumgaard: I am not aware of any such 
configuration option.
----
2019-12-16 23:35:31 UTC - Addison Higham: correct, that way, I can let other 
users create namespaces as tenant admins and know that at least it has my 
default in place. I am wondering if there is a reason that isn't implemented... 
because offload deletion lag is implemented
----
2019-12-16 23:36:01 UTC - Roman Popenov: I think I also read that it can be set 
for the entire namespace
----
2019-12-16 23:36:22 UTC - Roman Popenov: Or maybe it was just TTL
----
2019-12-16 23:37:12 UTC - Sijie Guo: @Addison Higham I think it was probably 
just forgotten
----
2019-12-16 23:37:27 UTC - Sijie Guo: I don’t recall there is a reason not to do 
so.
----
2019-12-16 23:38:41 UTC - Addison Higham: okay, that is what it seems like, maybe I will see if I can get a patch out real quick like... maybe have it make it into 2.5.1 ( :cry: was looking forward to not being forked anymore :wink: )
100 : Sijie Guo
+1 : Sijie Guo
----
2019-12-17 00:13:40 UTC - Addison Higham: @Sijie Guo 
<https://github.com/apache/pulsar/pull/5872> wasn't too bad...
----
2019-12-17 08:28:05 UTC - LaxChan: @LaxChan has joined the channel
----
2019-12-17 09:09:20 UTC - Vladimir Shchur: Reproduced it with Java client as 
well
----
