2018-09-21 18:29:38 UTC - Grant Wu: ```
2018-09-21T18:22:23.046775688Z 18:22:23.046 
[bookkeeper-ml-workers-OrderedExecutor-15-0] WARN  
org.apache.bookkeeper.mledger.impl.OpReadEntry - 
[public/default/persistent/symphony][be645d03-95dc-4c78-acec-7ee0801c489f] read 
failed from ledger at position:4231:4789 : Bookie operation timeout
2018-09-21T18:22:23.046796482Z 18:22:23.046 
[broker-topic-workers-OrderedScheduler-4-0] ERROR 
org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer
 - [<persistent://public/default/symphony> / 
be645d03-95dc-4c78-acec-7ee0801c489f-Consumer{subscription=PersistentSubscription{topic=<persistent://public/default/symphony>,
 name=be645d03-95dc-4c78-acec-7ee0801c489f}, consumerId=3727, 
consumerName=02d87, address=/10.244.1.14:40032}] Error reading entries at 
4231:4789 : Bookie operation timeout - Retrying to read in 15.0 seconds
```
this sounds like disk error, correct?
----
2018-09-21 18:31:53 UTC - Matteo Merli: read timeout
----
2018-09-21 18:32:04 UTC - Matteo Merli: broker reading from bookies
----
2018-09-21 18:32:20 UTC - Grant Wu: Well yes
----
2018-09-21 18:32:28 UTC - Grant Wu: Could the read timeout be caused by disk 
error?
----
2018-09-21 18:32:31 UTC - Grant Wu: I think this is standalone
----
2018-09-21 18:33:24 UTC - Grant Wu: I am also getting a ton of
```
18:32:48.046 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR 
org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.NullPointerException: null
18:32:49.046 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR 
org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.NullPointerException: null
18:32:50.046 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR 
org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.NullPointerException: null
18:32:51.046 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR 
org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.NullPointerException: null
18:32:52.046 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR 
org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.NullPointerException: null
```
----
2018-09-21 18:35:14 UTC - Grant Wu: This is standalone deployment, I believe
----
2018-09-21 18:35:40 UTC - Sijie Guo: @Grant Wu what is the version?
----
2018-09-21 18:36:25 UTC - Matteo Merli: can you scroll up in the logs? JVM 
optimizes out the repeated exception, but the first time it happens there 
should be the full stack trace
----
2018-09-21 18:37:23 UTC - Grant Wu: that I can get you, hold on
----
2018-09-21 18:44:10 UTC - Grant Wu: 
----
2018-09-21 18:44:26 UTC - Grant Wu: @Matteo Merli sorry about the formatting.  
If you’d like, I can try to  get rid of some of the noise
----
2018-09-21 18:44:53 UTC - Grant Wu: It’s `2.1.0`
----
2018-09-21 18:49:49 UTC - Matteo Merli: @Grant Wu This looks like the issue 
reported in <https://github.com/apache/bookkeeper/pull/1618> . This was fixed 
in BK-4.7.2 and picked up in Pulsar 2.1.1
----
2018-09-21 18:50:40 UTC - Grant Wu: Okay, great.  I will tell us to move to 
that ASAP.
----
2018-09-21 18:50:52 UTC - Grant Wu: Any thoughts on the bookie issues, though?
----
2018-09-21 18:51:23 UTC - Matteo Merli: That was the bookie issue, because of 
the NPE the read operation doesn’t complete and gets timedout
----
2018-09-21 18:51:47 UTC - Grant Wu: Ah.  Fascinating…
----
2018-09-21 18:53:30 UTC - Grant Wu: Thanks a ton!
----
2018-09-21 18:54:10 UTC - Grant Wu: Could we be mis-using Pulsar to hit upon 
this bug?  The message says “but this code it only hit in the rare case where a 
client is trying to read or write the same entry from the same ledger more than 
once at the same time.”
----
2018-09-21 18:54:24 UTC - Grant Wu: So I am curious as to what we’re doing is 
strange
----
2018-09-21 18:55:04 UTC - Matteo Merli: Nothing wrong, that’s an internal read, 
it should never throw NPE
----
2018-09-21 18:55:51 UTC - David Orr: @David Orr has joined the channel
----
2018-09-21 19:04:02 UTC - Grant Wu: Ah, it appears my coworker has joined.  
:wave:
----
2018-09-21 19:04:45 UTC - Grant Wu: I am spreading the Pulsar slack’s 
reputation for quick triage, it seems
bug : Matteo Merli, Sijie Guo
+1 : David Orr
----
2018-09-21 19:34:47 UTC - Dave Southwell: I'm seeing this error `Caused by: 
org.apache.pulsar.client.api.PulsarClientException$ProducerBlockedQuotaExceededError:
 Cannot create producer on topic with backlog quota exceeded` but I don't see 
any quotas being set, nor can I see how much of the quota is used currently.  
I'm certain this is something very simple, and I'm just overlooking it, but 
I've run out of places I can think of to look.  This is my command that's 
resulting in this error.  `bin/pulsaperf produce 
<persistent://public/default/exclamation-output> --rate 50000`
----
2018-09-21 19:36:03 UTC - Grant Wu: 
<http://pulsar.incubator.apache.org/docs/en/cookbooks-retention-expiry/#backlog-quotas>
----
2018-09-21 19:36:29 UTC - Grant Wu: 
<http://pulsar.incubator.apache.org/docs/en/cookbooks-retention-expiry/#get-backlog-threshold-and-backlog-retention-policy>
 may help
----
2018-09-21 19:36:33 UTC - Grant Wu: I actually don’t know what the defaults are.
----
2018-09-21 19:36:36 UTC - Matteo Merli: @Dave Southwell Topics have a default 
quota if not specified. The default is set in the broker configuration and the 
default is 25 GB
----
2018-09-21 19:37:22 UTC - Dave Southwell: Is there a way to see that through 
the `pulsar-admin` command?
----
2018-09-21 19:37:41 UTC - Grant Wu: I believe my second link has it…
----
2018-09-21 19:37:57 UTC - Grant Wu: `pulsar-admin namespaces get-backlog-quotas 
my-tenant/my-ns`
----
2018-09-21 19:38:17 UTC - Dave Southwell: I thought the same, but I just get `{ 
}` as a result.
----
2018-09-21 19:38:30 UTC - Matteo Merli: Yes: `pulsar-admin namespaces 
set-backlog-quota $NAMESPACE --limit 2G --policy producer_request_hold`
----
2018-09-21 19:38:30 UTC - Grant Wu: Ah, well, I’ll let merlimat figure that out 
then
----
2018-09-21 19:38:43 UTC - Grant Wu: unless `{ }` means “default”?
----
2018-09-21 19:39:02 UTC - Dave Southwell: ....indeed `{  }` may just equate to 
default.
----
2018-09-21 19:39:11 UTC - Grant Wu: if true I’d say that’s a bit unfortunate
----
2018-09-21 19:40:13 UTC - Dave Southwell: I'm also curious how I can find out 
how much of the backlog quota is currently used.  I'm guessing that metric must 
exist somewhere.
----
2018-09-21 19:40:17 UTC - Matteo Merli: yes, `get-backlog-quotas` only gives 
you the namespace specific quota, but doesn’t report from the system default. 
It could definitely be improved
----
2018-09-21 19:40:40 UTC - Matteo Merli: `pulsar-admin topics stats $TOPIC` 
includes a `storage` value
----
2018-09-21 19:41:10 UTC - Matteo Merli: (it also gets exported to Prometheus, 
to setup alerts and such)
----
2018-09-21 19:51:41 UTC - Dave Southwell: This is great info.  Thanks very much!
----
2018-09-21 19:56:31 UTC - Dave Southwell: If I'm reading it correctly, my 
backlog size is lower than 25Gb, at 11.9Gb.   I would think then that I'm still 
under the default 25 Gb default quota.   ```{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "averageMsgSize" : 0.0,
  "storageSize" : 12829343430,
  "publishers" : [ ],
  "subscriptions" : {
    "my-subscription" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 0,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "unackedMessages" : 0,
      "type" : "Shared",
      "msgRateExpired" : 0.0,
      "consumers" : [ ]
    },
    "__compaction" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 114384,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "unackedMessages" : 0,
      "type" : "Exclusive",
      "msgRateExpired" : 0.0,
      "consumers" : [ ]
    },
    "my-subscription-1" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 0,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "unackedMessages" : 0,
      "type" : "Exclusive",
      "msgRateExpired" : 0.0,
      "consumers" : [ ]
    }
  },
  "replication" : { },
  "deduplicationStatus" : "Disabled"
}```
----
2018-09-21 19:57:43 UTC - Matteo Merli: actually, I was wrong on the default 
setting:

```
# Default per-topic backlog quota limit
backlogQuotaDefaultLimitGB=10
```
----
2018-09-21 19:57:57 UTC - Matteo Merli: in `broker.conf` (or `standalone.conf`)
----
2018-09-21 19:58:24 UTC - Dave Southwell: :+1:  Thanks!
----
2018-09-21 19:59:33 UTC - Matteo Merli: (and it doesn’t kick immediately, the 
check is done periodically. that’s why it goes up a bit to 11 GB)
----
2018-09-21 20:01:48 UTC - Matteo Merli: For compaction you should setup 
automatic “size-based” compaction
----
2018-09-21 20:02:28 UTC - Matteo Merli: `bin/pulsar-admin namespaces 
set-compaction-threshold --threshold 100M my-tenant/my-namespace`
----
2018-09-21 20:03:06 UTC - Dave Southwell: What changes the `storageSize` as 
compared to the number of messages in the backlog, or are those things not 
related?  I've let this thing sit over night and the `storageSize` is the same 
as it was yesterday and the backlog has been empty for ~24 hrs.
----
2018-09-21 20:03:44 UTC - Dave Southwell: Oh, maybe because I haven't set 
anything it just leaves it as it was.
----
2018-09-21 20:04:16 UTC - Matteo Merli: In your stats, the __compaction 
internal subscription is keeping the data from being deleted
----
2018-09-21 20:12:01 UTC - Dave Southwell: Oh, Ok.  Is there a way to remove the 
compaction subscription?  Also, when I run the `compaction-status` command with 
`-w` I get an error saying `Error in compaction` `null`  `Reason: Error 
compacting: java.util.concurrent.TimeoutException: Timeout`
----
2018-09-21 20:14:24 UTC - Matteo Merli: You can remove it, if you’re not going 
to use compaction (`pulsar-admin topics unsubscribe $TOPIC -s __compaction`) or 
enable automatic compaction as shown above
+1 : Dave Southwell
----
2018-09-21 22:30:00 UTC - Grant Wu: Do these bookie issues still happen on 
standalone?
----
2018-09-21 22:30:18 UTC - Grant Wu: The infra guy here is saying that Pulsar 
does not use a bookie in standalone mode…?  Not sure what he means
----
2018-09-21 22:32:02 UTC - Matteo Merli: It's embedded in same process, but it's 
the almost the same as a regular cluster 
----
2018-09-21 22:32:34 UTC - Grant Wu: So we should bump to 2.1.1 anyways right
----

Reply via email to