2019-03-11 09:35:37 UTC - Ali Ahmed: 
<https://github.com/apache/pulsar-client-node/pull/1>
----
2019-03-11 10:16:36 UTC - Darragh: Is there anyone who could take a look at our Grafana dashboard and recommend any actions we can take to get rid of our tail latencies? (I can PM you the URL.) It's hosted on EC2 and our average latencies are a solid 1ms, but the 99th-percentile latencies are troublesome (reaching into the 200ms range)
----
2019-03-11 10:18:53 UTC - bhagesharora: @jia zhai yeah, it's started
----
2019-03-11 11:23:24 UTC - jia zhai: @bhagesharora are all the checks in this link OK?
<http://pulsar.apache.org/docs/en/io-quickstart/#start-pulsar-service>
----
2019-03-11 11:55:50 UTC - bhagesharora: @jia zhai
```
bhagesharora93@pulsar-setup-2-3-0-n1-standalone:~$ curl -s http://localhost:8080/admin/v2/functions/connectors
[]
```
----
2019-03-11 11:56:00 UTC - bhagesharora: See, the connectors are not present; it's empty
----
2019-03-11 11:56:41 UTC - jia zhai: That means the NAR files need to be placed correctly.
----
2019-03-11 12:05:57 UTC - bhagesharora: $ wget 
<https://archive.apache.org/dist/pulsar/pulsar-2.3.0/apache-pulsar-io-connectors-2.3.0-bin.tar.gz>
----
2019-03-11 12:06:03 UTC - bhagesharora: this is giving a 404 error
----
2019-03-11 12:14:56 UTC - jia zhai: try 
<https://archive.apache.org/dist/pulsar/pulsar-2.3.0/connectors>
----
2019-03-11 12:15:08 UTC - jia zhai: the path has changed, and the doc was not updated
----
2019-03-11 12:54:08 UTC - bhagesharora: @jia zhai What is the right path for <https://archive.apache.org/dist/pulsar/pulsar-2.3.0/apache-pulsar-io-connectors-2.3.0-bin.tar.gz>?
----
2019-03-11 12:54:14 UTC - bhagesharora: actually I need the tar file
----
2019-03-11 13:00:30 UTC - bhagesharora: 
<https://archive.apache.org/dist/pulsar/pulsar-2.3.0/apache-pulsar-2.3.0-bin.tar.gz>
----
2019-03-11 13:00:39 UTC - bhagesharora: this is correct, I guess
----
2019-03-11 13:04:58 UTC - jia zhai: 
<https://archive.apache.org/dist/pulsar/pulsar-2.3.0/connectors>
----
2019-03-11 13:05:07 UTC - jia zhai: this contains all the NARs
----
2019-03-11 13:05:21 UTC - jia zhai: you just need to download the one that you 
want
----
2019-03-11 13:05:39 UTC - jia zhai: It is a dir
----
2019-03-11 13:30:39 UTC - Darragh: I've also noticed that the journal queue and sync latencies are in the 300ms range, somewhat mimicking the add-entry latencies. Is this a possible cause, or rather a consequence of the higher entry latencies?
----
2019-03-11 13:59:07 UTC - jia zhai: @Darragh What type of disk is used for 
journal?
----
2019-03-11 14:00:29 UTC - jia zhai: By default, data is fsynced to the journal before acknowledgment, so it is better to have a disk with low flush latency.
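For reference, a quick way to sanity-check the flush latency of a journal volume is an fio run that issues an fdatasync after every write; a minimal sketch, assuming the journal is mounted at /mnt/journal as mentioned later in this thread:
```
# Write 4k blocks and fdatasync after each one; the reported sync
# latency percentiles approximate what the bookie journal will see.
fio --name=journal-fsync-test --directory=/mnt/journal \
    --rw=write --bs=4k --size=256m --fdatasync=1
```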
----
2019-03-11 14:01:35 UTC - Darragh: @jia zhai, io1's with 64k iops
----
2019-03-11 14:05:40 UTC - jia zhai: That should be good
----
2019-03-11 14:07:05 UTC - jia zhai: But from your description, it should be related to journal sync.
> I’ve also noticed the journal queue and sync latencies to be in the 300ms range
----
2019-03-11 14:07:59 UTC - Darragh: (Grafana screenshot attached)
----
2019-03-11 14:08:10 UTC - Darragh: (Grafana screenshot attached)
----
2019-03-11 14:08:45 UTC - Darragh: my primary concern is the 'add entry latencies' being pretty high at the 99th percentile and up
----
2019-03-11 14:09:15 UTC - Darragh: and I simply noticed that the journal 
latency also seems to have these higher values.  I don't know however whether 
those are an issue
----
2019-03-11 14:09:28 UTC - Darragh: and if they are, whether they would be the 
cause or a consequence of the entry latencies
----
2019-03-11 14:14:52 UTC - jia zhai: journal-queue-latency is the time between when an entry is enqueued and when it has been flushed. It is directly related to the disk flush.
----
2019-03-11 14:15:28 UTC - Darragh: and is there anything in particular I can do to improve the latencies shown above?
----
2019-03-11 14:15:32 UTC - jia zhai: and queue length
----
2019-03-11 14:15:54 UTC - Darragh: or something you can think of that might be 
misconfigured
----
2019-03-11 14:18:38 UTC - jia zhai: In the pic from 12:00 to 13:30, it seems there are a lot of requests coming in, and they seem to reach the limit of the journal disk.
----
2019-03-11 14:19:03 UTC - jia zhai: Currently we can configure several disks for the journal
----
2019-03-11 14:19:18 UTC - jia zhai: It may help in this case.
----
2019-03-11 14:21:38 UTC - Darragh: so giving each of our BookKeeper instances 2 disks instead of one for journal IO?
----
2019-03-11 14:22:53 UTC - jia zhai: I think it may help. @Sijie Guo to confirm this.
----
2019-03-11 15:01:35 UTC - Maarten Tielemans: Hi @jia zhai, if the issue is related to disk speed, wouldn't it be caused by throughput? We just changed our perf test to send 1,000 (instead of 10,000) msg/sec and we still see the 99.9% latency spikes
----
2019-03-11 15:12:51 UTC - David Kjerrumgaard: @bhagesharora You have to 
download and install the 'builtin' connectors as a separate step, as described 
here: <http://pulsar.apache.org/docs/en/io-quickstart/>
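A minimal sketch of that step for 2.3.0 (the Kafka connector is only an example; pick whichever NAR you need from the connectors directory linked above):
```
cd apache-pulsar-2.3.0
mkdir -p connectors
wget https://archive.apache.org/dist/pulsar/pulsar-2.3.0/connectors/pulsar-io-kafka-2.3.0.nar -P connectors/
# restart the broker/standalone, then verify the connector is listed:
curl -s http://localhost:8080/admin/v2/functions/connectors
```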
----
2019-03-11 15:31:06 UTC - Matteo Merli: @Maarten Tielemans @Darragh which EC2 
instances are you using?
----
2019-03-11 15:31:25 UTC - Matteo Merli: Are you writing on local disk or EBS?
----
2019-03-11 15:31:41 UTC - Alexandre DUVAL: (attachment)
----
2019-03-11 15:31:49 UTC - Alexandre DUVAL: (attachment)
----
2019-03-11 15:31:54 UTC - Alexandre DUVAL: @Matteo Merli ^
----
2019-03-11 15:31:56 UTC - Darragh: for BookKeeper we're using c5.2xlarge instances with EBS
----
2019-03-11 15:32:09 UTC - Darragh: we currently have brokers on the same machine
----
2019-03-11 15:32:22 UTC - Darragh: these instances have 2 ebs mounted volumes
----
2019-03-11 15:32:31 UTC - Darragh: one io1 for the journal and one st1 for the 
ledgers
----
2019-03-11 15:32:33 UTC - Matteo Merli: Ok, what type and size of the EBS?
----
2019-03-11 15:32:48 UTC - Darragh: ```
  ebs_block_device {
    device_name          = "/dev/sdf"
    volume_type          = "io1"
    volume_size          = "1280"
    iops                 = "64000"
  }

  ebs_block_device {
    device_name          = "/dev/sdg"
    volume_type          = "st1"
    volume_size          = "1280"
  }```
----
2019-03-11 15:34:38 UTC - Matteo Merli: Size is in GB?
----
2019-03-11 15:35:27 UTC - Darragh: yes
----
2019-03-11 15:35:28 UTC - Maarten Tielemans: ```
c5.2xlarge (x3)
        Broker
        Bookie
        
        EBS - /mnt/journal
        io1
        1280 GB
        64000 IOPS
        
        EBS - /mnt/ledger
        st1
        1280 GB
        
        ELB connected to these 3 instances
        
t2.small (x3)
        Zookeeper
        
c5.2xlarge
        Proxy (not used)
        Used for running the publish/subscribe performance tests
```
----
2019-03-11 15:35:29 UTC - Darragh: ```NAME          MAJ:MIN RM  SIZE RO TYPE 
MOUNTPOINT
nvme1n1       259:0    0  1.3T  0 disk /mnt/journal
nvme2n1       259:1    0  1.3T  0 disk /mnt/storage
nvme0n1       259:2    0    8G  0 disk 
├─nvme0n1p1   259:3    0    8G  0 part /
└─nvme0n1p128 259:4    0    1M  0 part ```
----
2019-03-11 15:35:49 UTC - Maarten Tielemans: (So the bookie and broker are 
running on the same machine atm)
----
2019-03-11 15:37:03 UTC - Maarten Tielemans: and these instances are deployed 
in a placement group to ensure sufficient network speed and low latency 
networking
----
2019-03-11 16:41:56 UTC - Lak Tuttagunta: @Lak Tuttagunta has joined the channel
----
2019-03-11 17:37:06 UTC - Matteo Merli: Can you attach the pcap file? :slightly_smiling_face:
----
2019-03-11 17:51:37 UTC - Matteo Merli: @Maarten Tielemans The interesting part 
is that the fsync latency 99pct is good, while the max goes up to 300ms
----
2019-03-11 17:52:10 UTC - Matteo Merli: One quick way to address this and keep 
the latency low is to write 3 copies and wait for 2 acks:
----
2019-03-11 17:54:27 UTC - Matteo Merli: ```
bin/pulsar-admin namespaces set-persistence $NAMESPACE \
   --bookkeeper-ensemble 3 \
   --bookkeeper-write-quorum 3 \
   --bookkeeper-ack-quorum 2 \
   --ml-mark-delete-max-rate 1
```
----
2019-03-11 17:55:24 UTC - Matteo Merli: (and use `pulsar-admin namespaces unload $NAMESPACE` to immediately apply the change; otherwise it will be applied at the next ledger rollover)
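A sketch of applying and verifying the change, with `public/default` standing in for the actual namespace:
```
bin/pulsar-admin namespaces unload public/default
bin/pulsar-admin namespaces get-persistence public/default
```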
----
2019-03-11 18:22:21 UTC - Eugene Mitskevich: Guys, please check this case when you have a moment: <https://github.com/apache/pulsar/issues/3799>
(In the Java client) it looks like the `org.bouncycastle` dependency and its `META-INF` files somehow break the manifest of the whole project.
----
2019-03-11 18:53:06 UTC - Chris DiGiovanni: Having an issue with topic offload on a newly configured cluster. Just a heads up, I had a bad config when the brokers first started, as I forgot to pass the AWS_SECRET_ACCESS_KEY to the environment. I have since restarted each of the brokers after I fixed the secret key issue.

Below are the values I configured:
**I starred out the top-level domain for privacy:
```
managedLedgerOffloadDriver=S3
s3ManagedLedgerOffloadRegion=us
s3ManagedLedgerOffloadBucket=fio-dev-pulsar-topic-offload
s3ManagedLedgerOffloadServiceEndpoint=<https://us-chhq.ceph>.*****
```

I then see messages like this:

```
16:53:31.930 [main] INFO  org.apache.bookkeeper.mledger.offload.OffloaderUtils 
- Found offloader OffloaderDefinition(name=jcloud, description=JCloud based 
offloader implementation, 
offloaderFactoryClass=org.apache.bookkeeper.mledger.offload.jcloud.JCloudLedgerOffloaderFactory)
 from /pulsar/./offloaders/tiered-storage-jcloud-2.2.1.nar
16:53:32.094 [load-factory-class 
org.apache.bookkeeper.mledger.offload.jcloud.JCloudLedgerOffloaderFactory] INFO 
 org.apache.bookkeeper.mledger.offload.OffloaderUtils - Loading offloader 
factory class 
org.apache.bookkeeper.mledger.offload.jcloud.JCloudLedgerOffloaderFactory using 
class loader 
org.apache.pulsar.common.nar.NarClassLoader[/tmp/pulsar-nar/tiered-storage-jcloud-2.2.1.nar-unpacked]
16:53:32.133 [main] INFO  
org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader
 - Constructor offload driver: aws-s3, host: <https://us-chhq.ceph>.*****, 
container: fio-dev-pulsar-topic-offload, region: us

16:53:33.565 [main] INFO  
org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader
 - Connect to blobstore : driver: aws-s3, region: us, endpoint: 
<https://us-chhq.ceph>.*****
17:04:35.723 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO  
org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader
 - Connect to blobstore : driver: aws-s3, region: , endpoint: 
<https://us-chhq.ceph>.*****
17:05:30.681 [offloader-OrderedScheduler-1-0] ERROR 
org.jclouds.http.handlers.BackoffLimitedRetryHandler - Cannot retry after 
server error, command has exceeded retry limit 100: 
[method=org.jclouds.aws.s3.AWSS3Client.public abstract java.lang.String 
org.jclouds.s3.S3Client.getBucketLocation(java.lang.String)[fio-dev-pulsar-topic-offload],
 request=GET <https://s3.amazonaws.com/fio-dev-pulsar-topic-offload?location> 
HTTP/1.1]
```
----
2019-03-11 18:53:24 UTC - Chris DiGiovanni: Why is it trying to go to Amazon? `<https://s3.amazonaws.com/fio-dev-pulsar-topic-offload>`
----
2019-03-11 21:13:28 UTC - vinay Parekar: Hi everyone. I need some help creating a Kafka source on a token-authenticated Pulsar server
----
2019-03-11 21:14:26 UTC - vinay Parekar: I am successfully able to consume data using the Kafka source locally from kerberized Kafka brokers
----
2019-03-11 21:15:08 UTC - vinay Parekar: but when I try to use the same configuration with token-authenticated Pulsar, I am not able to get any data
----
2019-03-11 21:15:26 UTC - vinay Parekar: Here is what I did:
----
2019-03-11 21:15:53 UTC - vinay Parekar: 1. changed my client.conf file to point to the token-authenticated Pulsar server
----
2019-03-11 21:16:23 UTC - vinay Parekar: 2. created a .yml file pointing to the Kafka server
----
2019-03-11 21:20:02 UTC - vinay Parekar: 3. `./bin/pulsar-admin source create --tenant public --namespace default --name campaign-kafka-source --destination-topic-name campaign-manager-bidding --source-type kafka --source-config-file campaign-kafka-source.yml`
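For reference, minimal sketches of steps 1 and 2; every host, token, and Kafka setting below is a placeholder:
```
# client.conf (step 1): point the CLI tooling at the
# token-authenticated cluster
webServiceUrl=http://pulsar-broker:8080
brokerServiceUrl=pulsar://pulsar-broker:6650
authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationToken
authParams=token:<your-jwt>

# campaign-kafka-source.yml (step 2): passed via --source-config-file;
# key names follow the built-in Kafka source connector
configs:
  bootstrapServers: "kafka-broker:9092"
  groupId: "pulsar-io-kafka-group"
  topic: "source-kafka-topic"
  sessionTimeoutMs: "10000"
  autoCommitEnabled: "false"
```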
----
2019-03-11 21:32:27 UTC - Sanjeev Kulkarni: @vinay Parekar what are you seeing in the logs?
----
2019-03-11 21:33:04 UTC - vinay Parekar: I am able to see that it creates the 
kafka source with no error logs
----
2019-03-11 21:33:13 UTC - Ali Ahmed: @Chris DiGiovanni you are using the S3 offloader, so it will end up looking in S3
----
2019-03-11 21:33:50 UTC - Ali Ahmed: are you trying to use an alternate S3-compatible service instead?
----
2019-03-11 21:47:52 UTC - Sanjeev Kulkarni: and then what happens?
----
2019-03-11 21:48:01 UTC - Sanjeev Kulkarni: any error messages?
----
2019-03-11 21:54:44 UTC - vinay Parekar: no errors. When I try to consume the topic, I do not see any data
----
2019-03-11 21:55:38 UTC - vinay Parekar: the same thing, when I try it on my local standalone, works like a charm
----
2019-03-11 23:08:11 UTC - Maarten Tielemans: I agree the results are peculiar. I will have to double-check, but I think we set 3 3 2 as the default on the namespace (not the ml mark-delete max rate). I will check tomorrow morning. Any other settings you could advise we look into?
----
2019-03-12 00:01:53 UTC - Matteo Merli: The other option is to disable fsync (`journalSyncData=false` in `bookkeeper.conf`). That would relax the durability (to the same level as what Kafka provides)
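A minimal sketch of that change:
```
# bookkeeper.conf: acknowledge writes once they hit the OS page cache
# instead of waiting for fsync -- lower latency, but weaker durability
# on power loss.
journalSyncData=false
```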
----
2019-03-12 03:13:51 UTC - jia zhai: @Chris DiGiovanni There seems to be some issue; I will check the code.
----
2019-03-12 03:46:54 UTC - Jennifer Huang: @Jennifer Huang has joined the channel
----
2019-03-12 04:19:11 UTC - Jerry Peng: @vinay Parekar how have you configured 
token auth in your pulsar cluster?
----
2019-03-12 04:19:38 UTC - Jerry Peng: Would you mind sharing the configs
----
2019-03-12 07:05:39 UTC - jia zhai: It seems that in this line we should not convert s3 into aws-s3:
<https://github.com/apache/pulsar/blob/master/tiered-storage/jcloud/src/main/java/org/apache/bookkeeper/mledger/offload/jcloud/impl/BlobStoreManagedLedgerOffloader.java#L189>
----
2019-03-12 07:27:13 UTC - jia zhai: Build a new NAR with the fix, which un-comments the lines
----
