[ceph-users] Re: per rbd performance counters

2020-05-07 Thread Marc Roos
Something like: rbd perf image iostat -Original Message- From: Void Star Nill [mailto:void.star.n...@gmail.com] Sent: 07 May 2020 04:18 To: ceph-users Subject: [ceph-users] per rbd performance counters Hello, Is there a way to get read/write I/O statistics for each rbd device for eac
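
For reference, a minimal sketch of the per-image statistics commands (assuming Nautilus or later with the rbd_support mgr module enabled; the pool name "rbd" is a placeholder):

    rbd perf image iostat rbd    # rolling per-image IOPS/throughput/latency
    rbd perf image iotop rbd     # top-style view sorted by the busiest images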

[ceph-users] Re: cephfs change/migrate default data pool

2020-05-07 Thread Kenneth Waegeman
Does anyone have an idea/experience whether this is possible? :) On 29/04/2020 14:56, Kenneth Waegeman wrote: Hi all, I read in some release notes that it is recommended to have your default data pool replicated and use erasure-coded pools as additional pools through layouts. We still have a cephfs with +-1PB

[ceph-users] Rados clone_range

2020-05-07 Thread Ali Turan
Hello, I saw there was a clone_range function in librados earlier, but it was removed in version 12, I believe. I need exactly that function to avoid unnecessary network traffic: I need to combine many small objects into one, so clone_range would be really useful for me. I can read from an object
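
Not a replacement for clone_range, but a client-side sketch of combining small objects with the rados CLI (pool and object names are placeholders; this pulls the data through the client, so it does not avoid the network traffic the way clone_range would):

    for obj in part.0 part.1 part.2; do
        rados -p mypool get "$obj" /tmp/part.bin       # read each small object
        rados -p mypool append combined /tmp/part.bin  # append it onto one combined object
    done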

[ceph-users] Re: Data loss by adding 2OSD causing Long heartbeat ping times

2020-05-07 Thread Martin Verges
Hello XuYun, In my experience, I would always disable swap; it won't do any good. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. r
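
For completeness, a minimal sketch of disabling swap on an OSD node (assuming a systemd-based distribution; the fstab edit should match whatever swap entry is actually present):

    swapoff -a                              # turn swap off immediately
    sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out swap entries so it stays off after reboot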

[ceph-users] Re: cephfs change/migrate default data pool

2020-05-07 Thread Frank Schilder
Hi Kenneth, I did a migration from 2 to 3 pool layout recently. The only way to do this within ceph at the moment seems to be to create a second ceph fs with the new layout, rsync everything over and delete the old ceph fs. I had enough spare capacity and room for the extra PGs to do that. Thi
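
A rough sketch of that approach (pool names, filesystem name, PG counts and mount points are all placeholders; on Nautilus a second filesystem also needs the enable_multiple flag):

    ceph fs flag set enable_multiple true --yes-i-really-mean-it
    ceph osd pool create cephfs2_metadata 64
    ceph osd pool create cephfs2_data 512          # replicated default data pool
    ceph fs new cephfs2 cephfs2_metadata cephfs2_data
    ceph fs add_data_pool cephfs2 ec_data_pool     # existing EC pool as an additional data pool
    setfattr -n ceph.dir.layout.pool -v ec_data_pool /mnt/cephfs2   # route data there via a layout
    rsync -aHAX /mnt/cephfs_old/ /mnt/cephfs2/     # then copy everything over and cut over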

[ceph-users] Re: How many MDS servers

2020-05-07 Thread Yan, Zheng
On Thu, May 7, 2020 at 1:27 AM Patrick Donnelly wrote: > > Hello Robert, > > On Mon, Mar 9, 2020 at 7:55 PM Robert Ruge wrote: > > For a 1.1PB raw cephfs system currently storing 191TB of data and 390 > > million objects (mostly small Python, ML training files etc.) how many MDS > > servers sho
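
If the answer to such sizing ends up being more than one active MDS, the relevant knob is max_mds (a sketch; the filesystem name is a placeholder):

    ceph fs set cephfs max_mds 2    # allow two active MDS ranks
    ceph fs status cephfs           # check ranks, standbys and per-rank load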

[ceph-users] Re: Question about bucket versions

2020-05-07 Thread Casey Bodley
On Wed, May 6, 2020 at 3:29 AM Katarzyna Myrek wrote: > > Hi > > I have a few questions about bucket versioning. > > In the output of the command "radosgw-admin bucket stats --bucket=XXX" there > is info about versions: > > "ver": "0#521391,1#516042,2#518098,3#517681,4#518423", > "master_ver
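
As an aside, the S3-level versioning state of a bucket (a separate thing from the index counters above) can be checked with any S3 client, e.g. the AWS CLI; the endpoint and bucket name are placeholders:

    aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-versioning --bucket XXX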

[ceph-users] Re: Data loss by adding 2OSD causing Long heartbeat ping times

2020-05-07 Thread XuYun
We had some ping back/front problems after upgrading from filestore to bluestore. It turned out to be related to insufficient memory/swap usage. > On 6 May 2020, at 10:08 PM, Frank Schilder wrote: > > To answer some of my own questions: > > 1) Setting > > ceph osd set noout > ceph osd set nodown > c
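
For context, the flags typically set to keep a cluster quiet during such an intervention look roughly like this (cleared again afterwards with ceph osd unset):

    ceph osd set noout         # don't mark down OSDs out
    ceph osd set nodown        # ignore missed heartbeats while working on nodes
    ceph osd set norebalance   # hold back data movement
    ceph osd set nobackfill
    ceph osd set norecover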

[ceph-users] Re: Data loss by adding 2OSD causing Long heartbeat ping times

2020-05-07 Thread Frank Schilder
Hi XuYun and Martin, I checked that already. The OSDs in question have 8GB memory limit and the RAM of the servers is about 50% used. It could be memory fragmentation, which used to be a problem before bitmap allocator. However, my OSDs are configured to use bitmap, at least that is what they c
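
One way to check what an OSD is actually running with, via its admin socket (the OSD id is a placeholder):

    ceph daemon osd.0 config get bluestore_allocator
    ceph daemon osd.0 config get bluefs_allocator
    ceph daemon osd.0 dump_mempools    # memory pools, useful when fragmentation is suspected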

[ceph-users] Re: rados buckets copy

2020-05-07 Thread Andrei Mikhailovsky
Thanks for the suggestion! - Original Message - > From: "Szabo, Istvan (Agoda)" > To: "Andrei Mikhailovsky" , "ceph-users" > > Sent: Thursday, 7 May, 2020 03:48:04 > Subject: RE: rados buckets copy > Hi, > > You might try the S3 Browser app; it is quite easy to navigate and copy between >

[ceph-users] Re: 4.14 kernel or greater recommendation for multiple active MDS

2020-05-07 Thread Robert LeBlanc
As a follow up, our MDS was locking up under load so I went ahead and tried it. It seemed that some directories were getting bounced around the MDS servers and load would transition from one to the other. Initially my guess was that some of these old clients were sending all requests to one MDS ser
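
If subtrees really are bouncing between ranks, one common mitigation is to pin the busy directories to a fixed rank with the ceph.dir.pin xattr (a sketch; the path and rank are placeholders):

    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects   # pin this subtree to MDS rank 1
    getfattr -n ceph.dir.pin /mnt/cephfs/projects        # verify the pin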

[ceph-users] Re: Cluster blacklists MDS, can't start

2020-05-07 Thread Robert LeBlanc
On Wed, May 6, 2020 at 2:45 PM Patrick Donnelly wrote: > On Wed, Mar 11, 2020 at 10:41 PM Robert LeBlanc > wrote: > > > > This is the second time this happened in a couple of weeks. The MDS locks > > up and the stand-by can't take over so the Monitors blacklist them. I > try > > to unblacklist
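
For reference, a sketch of inspecting and clearing blacklist entries (the client address is a placeholder):

    ceph osd blacklist ls
    ceph osd blacklist rm 10.0.0.5:6801/1234567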

[ceph-users] Re: How many MDS servers

2020-05-07 Thread Robert LeBlanc
On Thu, May 7, 2020 at 6:22 AM Yan, Zheng wrote: > On Thu, May 7, 2020 at 1:27 AM Patrick Donnelly > wrote: > > > > Hello Robert, > > > > On Mon, Mar 9, 2020 at 7:55 PM Robert Ruge > wrote: > > > For a 1.1PB raw cephfs system currently storing 191TB of data and 390 > million objects (mostly sma

[ceph-users] Re: How many MDS servers

2020-05-07 Thread Robert LeBlanc
On Thu, May 7, 2020 at 9:41 AM Robert LeBlanc wrote: > On Thu, May 7, 2020 at 6:22 AM Yan, Zheng wrote: > >> On Thu, May 7, 2020 at 1:27 AM Patrick Donnelly >> wrote: >> > >> > Hello Robert, >> > >> > On Mon, Mar 9, 2020 at 7:55 PM Robert Ruge >> wrote: >> > > For a 1.1PB raw cephfs system cur

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Erwin Lubbers
Hi, Did anyone find a way to resolve the problem? I'm seeing the same on a clean Octopus Ceph installation on Ubuntu 18 with an Octopus-compiled KVM server running on CentOS 7.8. The KVM machine shows: [ 7682.233684] fn-radosclient[6060]: segfault at 2b19 ip 7f8165cc0a50 sp 7f81397f64

[ceph-users] Re: [External Email] Re: Bluestore - How to review config?

2020-05-07 Thread Dave Hall
Igor, Thank you for pointing this section out to me. The information there is useful, but I would note, for John Dover's benefit, that the section was hard to find. I'd been on that page several times before, but became overwhelmed by the time I got that far down. Maybe the 'What do I ha

[ceph-users] Re: [External Email] Re: Re: Bluestore - How to review config?

2020-05-07 Thread Dave Hall
Lin, Igor, Herve, With the help of the information all of you have provided, I have now reviewed the massive amount of detail that is available for just one of my OSDs. (All 24 were configured with Ceph-Ansible, so all should be configured the same.) Like Lin's example below, I see that
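
For anyone following along, a sketch of how that per-OSD information can be dumped (the OSD id is a placeholder; the first form runs on the OSD host via the admin socket, the second works from any admin node on Nautilus or later):

    ceph daemon osd.0 config show | grep -E 'bluestore|bluefs'
    ceph config show osd.0 | grep -E 'bluestore|bluefs'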

[ceph-users] Re: ceph-mgr high CPU utilization

2020-05-07 Thread Andras Pataki
Hi everyone, After some investigation, it looks like on our large cluster, ceph-mgr is not able to keep up with the status updates from about 3500 OSDs. By default OSDs send updates to ceph-mgr every 5 seconds, which, in our case, comes to about 700 messages/s to ceph-mgr. It looks from gdb
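
If the 5-second interval itself turns out to be the limiting factor, the knob that appears to control it is mgr_stats_period (a sketch; whether relaxing it is wise on a given cluster is a separate question):

    ceph config set mgr mgr_stats_period 10   # ask daemons to report stats every 10s instead of 5s
    ceph config get mgr mgr_stats_period      # confirm the value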

[ceph-users] Migrating clusters (and versions)

2020-05-07 Thread Kees Meijs
Hi list, I'm in the middle of an OpenStack migration (obviously Ceph backed) and have stumbled upon some huge virtual machines. To keep downtime to a minimum, I'm thinking of using Ceph's snapshot features with rbd export-diff and import-diff. However, is it safe (or even supported) to do t
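
A rough sketch of the incremental export/import pattern (image, snapshot and host names are placeholders; the remote side assumes the destination cluster's conf/keyring are available on that host):

    # full copy of a base snapshot while the VM keeps running
    rbd snap create pool/vm-disk@base
    rbd export pool/vm-disk@base - | ssh dest-host 'rbd import - pool/vm-disk && rbd snap create pool/vm-disk@base'
    # short downtime window: stop the VM, ship only the delta since @base
    rbd snap create pool/vm-disk@final
    rbd export-diff --from-snap base pool/vm-disk@final - | ssh dest-host 'rbd import-diff - pool/vm-disk'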

[ceph-users] Re: How to apply ceph.conf changes using new tool cephadm

2020-05-07 Thread Anthony D'Atri
ceph.conf still *works*, though, for those with existing systems built to manage it? While the centralized config is very handy, I’d love to find a way to tie it into external revision control, so that settings could continue to be managed in e.g. git and applied by automation. > On Apr 30, 202
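
One low-tech way to keep git in the loop today (a sketch): keep the desired settings in a ceph.conf-style file under revision control, push it into the mon config store, and dump the live config back out for diffing:

    ceph config assimilate-conf -i ceph-settings.conf   # apply the version-controlled ini file
    ceph config dump > ceph-config-export.txt           # export for committing/diffing in git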

[ceph-users] Re: cephfs change/migrate default data pool

2020-05-07 Thread Patrick Donnelly
On Wed, Apr 29, 2020 at 5:56 AM Kenneth Waegeman wrote: > I read in some release notes that it is recommended to have your default data > pool replicated and use erasure-coded pools as additional pools through > layouts. We still have a cephfs with +-1PB usage with an EC default pool. > Is there a way t

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Brad Hubbard
On Fri, May 8, 2020 at 3:42 AM Erwin Lubbers wrote: > > Hi, > > Did anyone find a way to resolve the problem? I'm seeing the same on a clean > Octopus Ceph installation on Ubuntu 18 with an Octopus-compiled KVM server > running on CentOS 7.8. The KVM machine shows: > > [ 7682.233684] fn-radoscl

[ceph-users] Re: ceph-mgr high CPU utilization

2020-05-07 Thread Brad Hubbard
Could you create a tracker for this and attach an osdmap as well as some recent balancer output (perhaps at a higher debug level if possible)? There are some improvements awaiting backport to nautilus for the C++/python interface, just FYI [0]. You might also look at gathering output using somethin
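
For the requested data, roughly (file names and debug levels are placeholders):

    ceph osd getmap -o osdmap.bin        # binary osdmap to attach to the tracker
    ceph config set mgr debug_mgr 10     # more verbose mgr/balancer logging while reproducing
    # ...collect /var/log/ceph/ceph-mgr.*.log, then drop the level back
    ceph config set mgr debug_mgr 1/5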

[ceph-users] ceph octopus OSDs won't start with docker

2020-05-07 Thread Sean Johnson
I have a seemingly strange situation. I have three OSDs that I created with Ceph Octopus using the `ceph orch daemon add :device` command. All three were added and everything was great. Then I rebooted the host. Now the daemons won’t start via Docker. When I attempt to run the `docker` command
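
A sketch of the usual first debugging steps on the host in this situation (the fsid and OSD id are placeholders):

    cephadm ls                                       # what cephadm thinks is deployed
    systemctl status ceph-<fsid>@osd.3.service       # the systemd unit that wraps the container
    journalctl -u ceph-<fsid>@osd.3.service -n 100   # container start-up errors
    cephadm logs --name osd.3                        # same logs via cephadm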

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Lomayani S. Laizer
Hello, On my side, at the point of the VM crash, these are the logs below. At the moment my debug level is at 10; I will raise it to 20 for full debug. These crashes are random and so far happen on very busy VMs. After downgrading the clients on the host to Nautilus, these crashes disappear. Qemu is not shutting down in general b
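
For completeness, client-side librbd logging for QEMU is usually turned up via the hypervisor's ceph.conf rather than on the cluster, along the lines of (the log path is a placeholder and must be writable by the qemu user):

    [client]
        debug rbd = 20
        debug rados = 20
        log file = /var/log/ceph/qemu-rbd.$pid.log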

[ceph-users] Re: virtual machines crashes after upgrade to octopus

2020-05-07 Thread Brad Hubbard
On Fri, May 8, 2020 at 12:10 PM Lomayani S. Laizer wrote: > > Hello, > On my side, at the point of the VM crash, these are the logs below. At the moment my debug > level is at 10; I will raise it to 20 for full debug. These crashes are random > and so far happen on very busy VMs. After downgrading the clients on the host to N

[ceph-users] Re: ceph-mgr high CPU utilization

2020-05-07 Thread Andras Pataki
Here it is: https://tracker.ceph.com/issues/45439 The balancer problems we saw might be a consequence of the mgr falling behind, since even after disabling the balancer, the mgr stays overloaded. I ended up removing all upmaps and leaving the balancer disabled since it further increases the os
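
For reference, the pieces involved in that roll-back, roughly (the PG id is a placeholder):

    ceph balancer off
    ceph osd dump | grep pg_upmap_items    # list current upmap exceptions
    ceph osd rm-pg-upmap-items 2.1a        # remove one entry; repeat per PG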