One of the main limitations of using CephFS is the requirement to reduce the
number of active MDS daemons to one during upgrades. As far as I can tell this
has been a known problem since Luminous (~2017). This issue essentially
requires downtime during upgrades for any CephFS cluster that needs
Thanks David! This looks good now. :)
> On Jul 8, 2021, at 6:28 PM, David Galloway wrote:
>
> Done!
>
> On 7/8/21 3:51 PM, Bryan Stillwell wrote:
>> There appear to be arm64 packages built for Ubuntu Bionic, but not for
>> Focal. Any chance Focal pa
I upgraded one of my clusters to v16.2.5 today and now I'm seeing these
messages from 'ceph -W cephadm':
2021-07-08T22:01:55.356953+ mgr.excalibur.kuumco [ERR] Failed to apply
alertmanager spec AlertManagerSpec({'placement': PlacementSpec(count=1),
'service_type': 'alertmanager',
There appear to be arm64 packages built for Ubuntu Bionic, but not for Focal.
Any chance Focal packages can be built as well?
Thanks,
Bryan
> On Jul 8, 2021, at 12:20 PM, David Galloway wrote:
>
upgrades after that, which means the global container
image name was never changed.
Bryan
On Jun 1, 2021, at 9:38 AM, Bryan Stillwell wrote:
This morning I tried adding a mon node to my home Ceph cluster with the
following command:
ceph orch daemon add mon ether
This seemed to work at first, but then it decided to remove it fairly quickly
which broke the cluster because the mon. keyring was also removed:
[8,17,4,1,14,0,19,8]p8
2021-05-11T22:41:11.332885+ 2021-05-11T22:41:11.332885+
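For anyone hitting the same mon-reaping behavior: a way to avoid it (a sketch, not verified against this exact version; 'ether' is the hostname from the command above, the other hostnames are placeholders) is to pin the mon placement explicitly before adding the daemon, so the orchestrator's default mon spec doesn't remove the new one:
# Pin the mon service to an explicit host list that includes the new host:
ceph orch apply mon --placement="mon1,mon2,ether"
# Then add the daemon on the new host:
ceph orch daemon add mon ether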
I'm now considering using device classes and assigning the OSDs to either hdd1
or hdd2... Unless someone has another idea?
Thanks,
Bryan
> On May 14, 2021, at 12:35 PM, Bryan Stillwell wrote:
>
> This wor
> step chooseleaf indep 1 type osd
> step emit
>
> J.
>
> ‐‐‐ Original Message ‐‐‐
>
> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell
> wrote:
>
>> I'm trying to figure out a CRUSH rule that will spread data out across my
>> cluster as much as possib
I'm looking for help in figuring out why cephadm isn't making any progress
after I told it to redeploy an mds daemon with:
ceph orch daemon redeploy mds.cephfs.aladdin.kgokhr ceph/ceph:v15.2.12
The output from 'ceph -W cephadm' just says:
2021-05-14T16:24:46.628084+ mgr.paris.glbvov [INF]
2 mandalaybay
2 paris
...
Hopefully someone else will find this useful.
Bryan
> On May 12, 2021, at 9:58 AM, Bryan Stillwell wrote:
>
> I'm trying to figure out a CRUSH rule that will spread data out across my
> cluster as much as possible, but not more than 2 chunks per host
I'm trying to figure out a CRUSH rule that will spread data out across my
cluster as much as possible, but not more than 2 chunks per host.
If I use the default rule with an osd failure domain like this:
step take default
step choose indep 0 type osd
step emit
I get clustering of 3-4 chunks on
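For an 8-chunk EC profile, the usual recipe (a sketch based on the standard CRUSH rule pattern, not necessarily the exact rule I ended up with; the rule name and id are placeholders) is to pick hosts first and then at most 2 OSDs per host:
rule ec_two_per_host {
    id 99
    type erasure
    step take default
    step choose indep 4 type host
    step chooseleaf indep 2 type osd
    step emit
}
With 'choose indep 4 type host' followed by 'chooseleaf indep 2 type osd' this emits 4 hosts x 2 OSDs = 8 chunks, so no host ever holds more than 2.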
I tried upgrading my home cluster to 15.2.7 (from 15.2.5) today and it appears
to be entering a loop when trying to match docker images for ceph:v15.2.7:
2020-12-01T16:47:26.761950-0700 mgr.aladdin.liknom [INF] Upgrade: Checking mgr
daemons...
2020-12-01T16:47:26.769581-0700 mgr.aladdin.liknom
I have a cluster running Nautilus where the bucket instance (backups.190) has
gone missing:
# radosgw-admin metadata list bucket | grep 'backups.19[0-1]' | sort
"backups.190",
"backups.191",
# radosgw-admin metadata list bucket.instance | grep 'backups.19[0-1]' | sort
The last two days we've experienced a couple short outages shortly after
setting both 'noscrub' and 'nodeep-scrub' on one of our largest Ceph clusters
(~2,200 OSDs). This cluster is running Nautilus (14.2.6) and setting/unsetting
these flags has been done many times in the past without a problem.
On Mar 24, 2020, at 5:38 AM, Abhishek Lekshmanan wrote:
> #. Upgrade monitors by installing the new packages and restarting the
> monitor daemons. For example, on each monitor host,::
>
> # systemctl restart ceph-mon.target
>
> Once all monitors are up, verify that the monitor upgrade
Great work! Thanks to everyone involved!
One minor thing I've noticed so far with the Ubuntu Bionic build is that it's
reporting the release as an RC instead of 'stable':
$ ceph versions | grep octopus
"ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus
(rc)": 1
I just noticed that arm64 packages only exist for xenial. Is there a reason
why bionic packages aren't being built?
Thanks,
Bryan
> On Dec 20, 2019, at 4:22 PM, Bryan Stillwell wrote:
>
> I was going to try adding an OSD to my home cluster using one of the 4GB
> Raspber
I was going to try adding an OSD to my home cluster using one of the 4GB
Raspberry Pis today, but it appears that the Ubuntu Bionic arm64 repo is
missing a bunch of packages:
$ sudo grep ^Package:
/var/lib/apt/lists/download.ceph.com_debian-nautilus_dists_bionic_main_binary-arm64_Packages
On Dec 18, 2019, at 1:48 PM, e...@lapsus.org wrote:
>
> That sounds very similar to what I described there:
> https://tracker.ceph.com/issues/43364
I would agree that they're quite similar if not the same thing! Now that you
mention it I see the thread is named mgr-fin in 'top -H' as well. I
After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5 I'm
seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H').
Attaching to the thread with strace shows a lot of mmap and munmap calls.
Here's the distribution after watching it for a few minutes:
, 2019, at 10:27 AM, Sasha Litvak wrote:
Bryan,
Were you able to resolve this? If yes, can you please share with the list?
On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell wrote:
alFrameEx
0.55% [kernel] [k] _raw_spin_unlock_irqrestore
I increased mon debugging to 20 and nothing stuck out to me.
Bryan
> On Dec 12, 2019, at 4:46 PM, Bryan Stillwell wrote:
>
> On our test cluster after upgrading to 14.2.5 I'm having problems with the
>
On our test cluster after upgrading to 14.2.5 I'm having problems with the mons
pegging a CPU core while moving data around. I'm currently converting the OSDs
from FileStore to BlueStore by marking the OSDs out in multiple nodes,
destroying the OSDs, and then recreating them with ceph-volume
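The per-OSD conversion cycle looks roughly like this (a sketch; the OSD id and device are placeholders, and backfill should be allowed to finish before destroying anything):
ceph osd out 12
# ...wait for the cluster to finish draining/backfilling...
systemctl stop ceph-osd@12
ceph osd destroy 12 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --bluestore --data /dev/sdb --osd-id 12
Reusing the same id with --osd-id keeps the CRUSH position and avoids a second rebalance.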
Rich,
What's your failure domain (osd? host? chassis? rack?) and how big is each of
them?
For example I have a failure domain of type rack in one of my clusters with
mostly even rack sizes:
# ceph osd crush rule dump | jq -r '.[].steps'
[
{
"op": "take",
"item": -1,
"item_name":
On Nov 18, 2019, at 8:12 AM, Dan van der Ster wrote:
>
> On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis wrote:
>>
>> On 19/11/14 11:04AM, Gregory Farnum wrote:
>>> On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster
>>> wrote:
Hi Joao,
I might have found the reason why
a solution yet so I'll stick with disabled balancer
> for now since the current pg placement is fine.
>
> Regards,
> Eugen
>
>
> [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56994.html
> [2] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56890.h
On multiple clusters we are seeing the mgr hang frequently when the balancer is
enabled. It seems that the balancer is getting caught in some kind of infinite
loop which chews up all the CPU for the mgr which causes problems with other
modules like prometheus (we don't have the devicehealth
> On Tue, Nov 19, 2019 at 8:42 PM Bryan Stillwell
> wrote:
>>
>> Closing the loop here. I figured ou
this was to track down, maybe a check should be added before
enabling msgr2 to make sure the require-osd-release is set to nautilus?
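For anyone hitting the same thing, the manual check and fix (run against the live cluster) are:
# ceph osd dump | grep require_osd_release
# ceph osd require-osd-release nautilus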
Bryan
> On Nov 18, 2019, at 5:41 PM, Bryan Stillwell wrote:
>
> I cranked up debug_ms to 20 on two of these clusters today and I'm still not
> understand
5.979 7f917becf700 1 -- 10.0.13.2:0/3084510 learned_addr
learned my addr 10.0.13.2:0/3084510 (peer_addr_for_me v1:10.0.13.2:0/0)
The learned address is v1:10.0.13.2:0/0. What else can I do to figure out why
it's deciding to use the legacy protocol only?
Thanks,
Bryan
> On Nov 15, 2019, at
I've upgraded 7 of our clusters to Nautilus (14.2.4) and noticed that on some
of the clusters (3 out of 7) the OSDs aren't using msgr2 at all. Here's the
output for osd.0 on 2 clusters of each type:
### Cluster 1 (v1 only):
# ceph osd find 0 | jq -r '.addrs'
{
"addrvec": [
{
There are some bad links to the mailing list subscribe/unsubscribe/archives on
this page that should get updated:
https://ceph.io/resources/
The subscribe/unsubscribe/archives links point to the old lists vger and
lists.ceph.com, and not the new lists on lists.ceph.io:
ceph-devel
With FileStore you can get the number of OSD maps for an OSD by using a simple
find command:
# rpm -q ceph
ceph-12.2.12-0.el7.x86_64
# find /var/lib/ceph/osd/ceph-420/current/meta/ -name 'osdmap*' | wc -l
42486
Does anyone know of an equivalent command that can be used with BlueStore?
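One approach that should work regardless of the backend (hedged: it relies on the oldest_map/newest_map fields reported over the OSD's admin socket, and counts retained epochs rather than files) is:
# ceph daemon osd.420 status | jq '.newest_map - .oldest_map + 1'
An OSD retains every osdmap epoch from oldest_map through newest_map inclusive, so the difference plus one is the count.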
Thanks,
Thanks Casey!
Adding the following to my swiftclient put_object call caused it to start
compressing the data:
headers={'x-object-storage-class': 'STANDARD'}
I appreciate the help!
Bryan
> On Nov 7, 2019, at 9:26 AM, Casey Bodley wrote:
>
> On 11/7/19 10:35 AM, Bryan Stillw
Responding to myself to follow up with what I found.
While going over the release notes for 14.2.3/14.2.4 I found this was a known
problem that has already been fixed. Upgrading the cluster to 14.2.4 fixed the
issue.
Bryan
> On Oct 30, 2019, at 10:33 AM, Bryan Stillwell wr
-
> just note that some 'helpful' s3 clients will insert a
> 'x-amz-storage-class: STANDARD' header to requests that don't specify
> one, and the presence of this header will override the user's default
> storage class.
>
> On 10/29/19 12:20 PM, Bryan Stillwell wrote:
>>
1, in start
>self.tick()
> File
> "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
> 2090, in tick
>s, ssl_env = self.ssl_adapter.wrap(s)
> File
> "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py",
>
I'm wondering if it's possible to enable compression on existing RGW buckets?
The cluster is running Luminous 12.2.12 with FileStore as the backend (no
BlueStore compression then).
We have a cluster that recently started to rapidly fill up with compressible
content (qcow2 images) and I would
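If it helps anyone else looking at the same thing: RGW compression is configured per placement target rather than per bucket, so something like the following should cover existing buckets in that placement (a sketch; the zone and placement names are the defaults and may differ, and only newly written objects get compressed, existing ones are not rewritten):
# radosgw-admin zone placement modify --rgw-zone=default \
      --placement-id=default-placement --compression=zlib
# then restart the radosgw daemons to pick up the change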
ile" OSDs
* check if everything is OK by looking at their logs
* take off the NOUP flag
* take a coffee and wait until all the data has drained
[]'s
Arthur (aKa Guilherme Geronimo)
On 04/09/2019 15:32, Bryan Stillwell wrote:
We are not using jumbo frames anywhere on this cluster (all mtu 1500)
Our test cluster is seeing a problem where peering is going incredibly slow
shortly after upgrading it to Nautilus (14.2.2) from Luminous (12.2.12).
From what I can tell it seems to be caused by "wait for new map" taking a long
time. When looking at dump_historic_slow_ops on pretty much any