Hi Dave,
Some notes first:
1) The following behavior is fine: BlueStore mounts in two stages - the
first one is read-only and, among other things, it loads the allocation
map from the DB. And that's exactly the case here.
Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.703+
7f0e15b3
Dave,
please see inline
On 7/26/2021 1:57 PM, Dave Piper wrote:
Hi Igor,
> So to get more verbose but smaller logs one can set both debug-bluestore and
> debug-bluefs to 1/20. ...
More verbose logging attached. I've trimmed the file to a single restart
attempt to keep the filesize down; let me know if there's not enough here.
> It would be also great to colle
Hi Igor,
Thanks for your time looking into this.
I've attached a 5 minute window of OSD logs, which includes several restart
attempts (each one takes ~25 seconds).
When I said it looked like we were starting up in a different state, I'm
referring to how the "Recovered from manifest file" logs appear
Hi Dave,
thanks for the update.
I'm curious whether reverting back to the default allocator on the latest
release would be OK as well. Please try it if possible.
Thanks,
Igor
On 8/12/2021 2:00 PM, Dave Piper wrote:
Hi Igor,
Just to update you on our progress.
- We've not had another repro of this since switching to bitmap allocator /
upgrading to the latest octopus release. I'll try to gather the full set of
diags if we do see this again.
- I think my issues with an empty /var/lib/ceph/osd/ceph-N/ folder
Igor,
We've hit this again on ceph 15.2.13 using the default allocator. Once again,
configuring the OSDs to use the bitmap allocator has fixed up the issue.
I'm still trying to gather the full set of debug logs from the crash. I think
again the fact I'm running in containers is the issue here;
Hi Dave,
so maybe another bug in Hybrid Allocator...
Could you please dump free extents for your "broken" osd(s) by issuing
"ceph-bluestore-tool --path --command free-dump". The OSD needs to
be offline.
Preferably collect these reports after you reproduce the issue with the
hybrid allocator once again.
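For anyone following along: below is a rough sketch of how such a free-dump could be summarized once collected. This assumes the dump is JSON containing an "extents" list of hex offset/length pairs - the field names and sample values here are illustrative, not the tool's guaranteed output schema.

```python
import json

# Hypothetical sample in the rough shape of a free-dump: a list of free
# extents, each with a hex offset and length. Real field names may differ.
sample = json.loads("""
{"extents": [
  {"offset": "0x0", "length": "0x1000"},
  {"offset": "0x3000", "length": "0x2000"},
  {"offset": "0x10000", "length": "0x100000"}
]}
""")

def summarize(dump):
    lengths = sorted(int(e["length"], 16) for e in dump["extents"])
    total = sum(lengths)
    return {
        "free_bytes": total,
        "extent_count": len(lengths),
        "largest_extent": lengths[-1],
        # Share of free space sitting in extents smaller than 64 KiB:
        # lots of free space paired with a tiny largest extent is what
        # would make an "_allocate failed" report plausible.
        "small_extent_ratio": sum(l for l in lengths if l < 64 * 1024) / total,
    }

print(summarize(sample))
```

A large "free_bytes" together with a small "largest_extent" would point at fragmentation rather than genuine space exhaustion.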
I'll keep trying to repro and gather diags, but running in containers is making
it very hard to run debug commands while the ceph daemons are down. Is this a
known problem with a solution?
In the meantime, what's the impact of running with the Bitmap Allocator instead
of the Hybrid one? I'm ne
On 8/26/2021 4:18 PM, Dave Piper wrote:
> I'll keep trying to repro and gather diags, but running in containers is making
> it very hard to run debug commands while the ceph daemons are down. Is this a
> known problem with a solution?
Sorry, I'm not aware of these issues. I don't use containers though.
We've started hitting this issue again, despite having bitmap allocator
configured. The logs just before the crash look similar to before (pasted
below).
So perhaps this isn't a hybrid allocator issue after all?
I'm still struggling to collect the full set of diags / run ceph-bluestore-tool
c
Okay - I've finally got full debug logs from the flapping OSDs. The raw logs
are both 100M each - I can email them directly if necessary. (Igor I've already
sent these your way.)
Both flapping OSDs are reporting the same "bluefs _allocate failed to allocate"
errors as before. I've also noticed
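As an aside, the failure mode seen here ("_allocate failed" while the OSD still reports free space) is what fragmentation looks like to a simple allocator. A toy illustration - not Ceph's actual allocator code - assuming an allocator that needs a single contiguous extent per request:

```python
# Toy first-fit allocator over a list of free extents (offset, length).
# Shows how a large contiguous request can fail even though plenty of
# space is free in total.

def first_fit(free_extents, want):
    """Return an offset with `want` contiguous bytes, or None."""
    for off, length in free_extents:
        if length >= want:
            return off
    return None

# 40 MiB free in total, but scattered across 640 fragments of 64 KiB each.
frag = [(i * 1024 * 1024, 64 * 1024) for i in range(640)]

print(sum(l for _, l in frag))       # total free bytes: 40 MiB
print(first_fit(frag, 64 * 1024))    # a 64 KiB request succeeds (offset 0)
print(first_fit(frag, 1024 * 1024))  # a 1 MiB request fails: None
```

If BlueFS asks for allocation units larger than the biggest remaining free extent, it fails in exactly this way regardless of the total free space.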
On Mon 20 Sep 2021 at 18:02, Dave Piper wrote:
> Okay - I've finally got full debug logs from the flapping OSDs. The raw logs
> are both 100M each - I can email them directly if necessary. (Igor I've
> already sent these your way.)
> Both flapping OSDs are reporting the same "bluefs _allocate f
I still can't find a way to get ceph-bluestore-tool working in my containerized
deployment. As soon as the OSD daemon stops, the contents of
/var/lib/ceph/osd/ceph- are unreachable.
I've found this blog post that suggests changes to the container's entrypoint
are required, but the proposed fix
Hi Dave,
I think it's your disk sizing/utilization that makes your setup rather
unique and apparently causes the issue.
First of all, you're using a custom 4K min_alloc_size, which wasn't adopted
before Pacific, aren't you?
2021-09-08T10:42:02.049+ 7f705c4f2f00 1
bluestore(/var/lib/ceph/o
On 9/21/2021 10:44 AM, Dave Piper wrote:
> I still can't find a way to get ceph-bluestore-tool working in my containerized
> deployment. As soon as the OSD daemon stops, the contents of
> /var/lib/ceph/osd/ceph- are unreachable.
Some speculations on the above. /var/lib/ceph/osd/ceph- is just a
m
Some interesting updates on our end.
This cluster (condor) is in a multisite RGW zonegroup with another cluster
(albans). Albans is still on nautilus and was healthy back when we started this
thread. As a last resort, we decided to destroy condor and recreate it, putting
it back in the zonegro
On 9/30/2021 6:28 PM, Dave Piper wrote:
Thanks so much Igor, this is making a lot of sense.
> First of all, you're using a custom 4K min_alloc_size, which wasn't adopted
> before Pacific, aren't you?
We've set bluestore_min_alloc_size = 4096 because we write a lot of small
objects. Various sources recommended this as a solution to not ov
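For a sense of the trade-off being discussed: a back-of-envelope comparison of a 4096-byte min_alloc_size versus 65536 (the pre-Pacific HDD default, if memory serves) for one small object. Illustrative arithmetic only - it ignores compression, blob sharing, and metadata overhead.

```python
# Each object's data consumes a whole number of min_alloc_size units
# on disk, so small objects waste most of a large allocation unit.

def on_disk(object_bytes, min_alloc):
    units = -(-object_bytes // min_alloc)  # ceiling division
    return units * min_alloc

obj = 1500  # e.g. a ~1.5 KB object
for min_alloc in (4096, 65536):
    used = on_disk(obj, min_alloc)
    print(f"min_alloc_size={min_alloc}: {used} bytes on disk "
          f"({used / obj:.1f}x amplification)")
```

This is presumably why the small-object workload pushed toward 4K in the first place; the cost, as discussed above, is QA coverage and fragmentation behavior off the beaten path.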
Hi,
On 9/30/21 18:02, Igor Fedotov wrote:
Using a non-default min_alloc_size is generally not recommended, primarily
due to performance penalties. Some side effects (like yours) can be
observed as well. That's simple - non-default parameters generally mean
much worse QA coverage devs and les
On 9/30/ 2021 7:03 PM, Igor Fedotov wrote:
> 3) reduce main space fragmentation by using Hybrid allocator from
> scratch - OSD redeployment is required as well.
>
>> We deployed these clusters at nautilus with the default allocator, which was
>> bitmap I think? After redeploying condor on