[ceph-users] Broken rgw user

2018-04-25 Thread Simon Ironside
Hi Everyone, I've got a problem with one rgw user on Hammer 0.94.7. * "radosgw-admin user info" no longer works: could not fetch user info: no user info saved * I can still retrieve their stats via "radosgw-admin user stats", although the returned data is wrong: { "stats": { "to
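A minimal sketch of how the failing and the still-working calls compare, and how the raw user metadata could be inspected; the uid "baduser" and the Hammer-era pool ".users.uid" are placeholders for illustration:

  $ radosgw-admin user info --uid=baduser      # fails: could not fetch user info
  $ radosgw-admin user stats --uid=baduser     # still returns (possibly stale) stats
  $ radosgw-admin metadata get user:baduser    # dump the raw user metadata entry
  $ rados -p .users.uid ls | grep baduser      # check the backing objects directly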

Re: [ceph-users] Poor read performance.

2018-04-25 Thread David C
How does your rados bench look? Have you tried playing around with read ahead and striping? On Tue, 24 Apr 2018 17:53 Jonathan Proulx, wrote: > Hi All, > > I seem to be seeing consistently poor read performance on my cluster > relative to both write performance and read performance of a single >
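A rough sketch of the suggested baseline and read-ahead checks; the pool name "rbd", the device name "vdb" and the 30-second runs are placeholders:

  $ rados bench -p rbd 30 write --no-cleanup   # write a baseline data set
  $ rados bench -p rbd 30 seq                  # sequential read of that data
  $ rados -p rbd cleanup                       # remove the benchmark objects
  # inside a VM, a larger read-ahead can help sequential reads:
  $ echo 4096 > /sys/block/vdb/queue/read_ahead_kb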

Re: [ceph-users] Poor read performance.

2018-04-25 Thread Christian Balzer
Hello, On Tue, 24 Apr 2018 12:52:55 -0400 Jonathan Proulx wrote: > Hi All, > > I seem to be seeing consistently poor read performance on my cluster > relative to both write performance and read performance of a single > backend disk, by quite a lot. > > cluster is luminous with 174 7.2k SAS driv

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Ranjan Ghosh
Thanks a lot for your detailed answer. The problem for us, however, was that we use the Ceph packages that come with the Ubuntu distribution. If you do an Ubuntu upgrade, all packages are upgraded in one go and the server is rebooted. You cannot influence anything or start/stop services one-by-o
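A sketch of the usual precaution when a distribution upgrade restarts everything at once; assuming apt-based hosts, and the package list is illustrative:

  $ ceph osd set noout          # don't mark OSDs out while the node reboots
  # ... run the Ubuntu release upgrade / reboot ...
  $ ceph osd unset noout
  # or pin the Ceph packages so the distro upgrade cannot pull them along:
  $ apt-mark hold ceph ceph-osd ceph-mon ceph-mds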

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Mark Schouten
On Wed, 2018-04-25 at 11:52 +0200, Ranjan Ghosh wrote: > Thanks a lot for your detailed answer. The problem for us, however, > was  > that we use the Ceph packages that come with the Ubuntu distribution. > If  > you do a Ubuntu upgrade, all packages are upgraded in one go and the  > server is reboo

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Simon Ironside
On 25/04/18 10:52, Ranjan Ghosh wrote: And, yes, we're running a "size:2 min_size:1" because we're on a very tight budget. If I understand correctly, this means: Make changes to files on one server. *Eventually* copy them to the other server. I hope this *eventually* means after a few minutes.

[ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
Hello list, we are trying to integrate a storage repository in xenserver. (I also described the problem as an issue in the ceph bugtracker: https://tracker.ceph.com/issues/23853) Summary: The slowness is a real pain for us, because this prevents the xen storage repository from working efficiently. Gathe
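A simple way to confirm that the per-image/per-snapshot lookups done by "-l" are what is slow, using the pool name from this thread:

  $ time rbd ls --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
  $ time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c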

Re: [ceph-users] Ceph 12.2.4 MGR spams syslog with "mon failed to return metadata for mds"

2018-04-25 Thread Charles Alva
Hi John, The "ceph mds metadata mds1" produced "Error ENOENT:". Querying mds metadata for mds2 and mds3 worked as expected. It seemed only the active MDS could not be queried by Ceph MGR. I also stated wrongly that Ceph MGR was spamming the syslog; it is actually the ceph-mgr log itself, sorry for the co
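A sketch of the checks described above; the daemon names mds1/mds2/mds3 are taken from the message:

  $ ceph mds metadata mds1     # returns "Error ENOENT:" for the active MDS
  $ ceph mds metadata mds2     # works as expected
  $ ceph fs status             # shows which daemon is currently active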

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Piotr Dałek
On 18-04-25 02:29 PM, Marc Schöchlin wrote: Hello list, we are trying to integrate a storage repository in xenserver. (I also described the problem as an issue in the ceph bugtracker: https://tracker.ceph.com/issues/23853) Summary: The slowness is a real pain for us, because this prevents the xe

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
Hello Piotr, I updated the issue. (https://tracker.ceph.com/issues/23853?next_issue_id=23852&prev_issue_id=23854) # time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 NAME    SIZE PARENT    RBD-feb32

[ceph-users] trimming the MON level db

2018-04-25 Thread Luis Periquito
Hi all, we have a (really) big cluster that's undergoing a very bad move and the monitor database is growing at an alarming rate. The cluster is running jewel (10.2.7); is there any way to trim the monitor database before it gets to HEALTH_OK? I've searched and so far only found people saying not r
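A hedged sketch of the usual options for shrinking the monitor store on Jewel; note that compaction only reclaims leveldb space, it does not trim the maps themselves, and "mon.a" is a placeholder:

  $ ceph tell mon.a compact
  # or compact on every monitor start (ceph.conf, [mon] section):
  #   mon_compact_on_start = true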

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Jason Dillaman
I'd check the latency between your client and your cluster. On my development machine w/ only a single OSD running and 200 clones, each with 1 snapshot, "rbd ls -l" only takes a couple of seconds for me: $ time rbd ls -l --rbd_concurrent_management_ops=1 | wc -l 403 real 0m1.746s user 0m1.136s sys 0m0

Re: [ceph-users] Blocked Requests

2018-04-25 Thread Shantur Rathore
Hi all, So using ceph-ansible, I built the below-mentioned cluster with 2 OSD nodes and 3 mons. Just after creating the OSDs I started benchmarking the performance using "rbd bench" and "rados bench" and started seeing the performance drop. Checking the status shows slow requests. [root@storage-28-1
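A sketch of the first things usually checked when slow requests appear; osd.0 is a placeholder and "ceph daemon" must be run on the node hosting that OSD:

  $ ceph health detail | grep -i slow
  $ ceph osd perf                        # per-OSD commit/apply latency
  $ ceph daemon osd.0 dump_historic_ops  # recent slowest ops on that OSD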

[ceph-users] ceph osd reweight (doing -1 or actually -0.0001)

2018-04-25 Thread Marc Roos
Is there some logic behind why ceph is doing this -1, or is this some coding error? 0.8 gives 0.7, and 0.80001 gives 0.8 (ceph 12.2.4) [@~]# ceph osd reweight 11 0.8 reweighted osd.11 to 0.8 () [@~]# ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PG

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
Hello Jason, according to this, latency between client and osd should not be the problem (given the high amount of user time in the measurement above, network communication should not be the issue). Finding the involved osd: # ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_director

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Jason Dillaman
Since I cannot reproduce your issue, can you generate a perf CPU flame graph on this to figure out where the user time is being spent? On Wed, Apr 25, 2018 at 11:25 AM, Marc Schöchlin wrote: > Hello Jason, > > according to this, latency between client and osd should not be the problem: > (the hig
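A rough recipe for the requested flame graph, assuming perf and Brendan Gregg's FlameGraph scripts are available on the client:

  $ git clone https://github.com/brendangregg/FlameGraph
  $ perf record -F 99 -g -- rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
  $ perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > rbd-ls.svg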

[ceph-users] cluster can't remapped objects after change crush tree

2018-04-25 Thread Igor Gajsin
Hi, I've got stuck on a problem with a crush rule. I have a small cluster with 3 nodes and 4 OSDs. I've decided to split it into 2 failure domains, so I made 2 buckets and put the hosts into those buckets as in this instruction: http://www.sebastien-han.fr/blog/2014/01/13/ceph-managing-crush-with-the-cli/ Fina
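A sketch of the bucket moves and a rule that distributes replicas across the two pods, using the names that appear later in the thread; verify against your own tree before applying:

  $ ceph osd crush add-bucket group1 pod
  $ ceph osd crush add-bucket group2 pod
  $ ceph osd crush move group1 root=default
  $ ceph osd crush move group2 root=default
  $ ceph osd crush move feather1 pod=group1
  $ ceph osd crush move ds1 pod=group2
  # Luminous: a replicated rule that picks one pod per replica
  $ ceph osd crush rule create-replicated by-pod default pod
  # note: with only 2 pods, a size-3 pool using this rule cannot place
  # all replicas, and its PGs will stay undersized/remapped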

Re: [ceph-users] ceph osd reweight (doing -1 or actually -0.0001)

2018-04-25 Thread Marc Roos
Makes me also wonder what is actually being used by ceph, and thus which one is wrong: the 'ceph osd reweight' output or the 'ceph osd df' output. -Original Message- From: Marc Roos Sent: Wednesday 25 April 2018 11:58 To: ceph-users Subject: [ceph-users] ceph osd reweight (doing -1 or actually

[ceph-users] Is RDMA Worth Exploring? Howto ?

2018-04-25 Thread Paul Kunicki
I have a working Luminous 12.2.4 cluster on CentOS 7.4, connected via 10G and Mellanox ConnectX-3 QDR IB, and would like to know if there are any worthwhile gains to be had from enabling RDMA and if there are any good up-to-date docs on how to do so? Thanks. *Paul Kunicki*
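For reference, a minimal sketch of how the RDMA messenger is enabled on Luminous; it was still experimental at the time, so test outside production, and the device name mlx4_0 (typical for ConnectX-3) is an assumption:

  # ceph.conf, [global] section
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx4_0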

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Ronny Aasen
The difference in cost between 2 and 3 servers is not HUGE, but the reliability difference between a size 2/1 pool and a 3/2 pool is massive. A 2/1 pool is just a single fault during maintenance away from data loss, but with 3/2 you need multiple simultaneous faults, and very bad luck, to break a
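The change being recommended, as a sketch; "mypool" is a placeholder, and raising the size triggers backfill of the third copies:

  $ ceph osd pool set mypool size 3
  $ ceph osd pool set mypool min_size 2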

Re: [ceph-users] ceph osd reweight (doing -1 or actually -0.0001)

2018-04-25 Thread Paul Emmerich
Hi, the reweight is internally a number between 0 and 0x10000 for the range 0 to 1. 0.8 is not representable in this number system. Having an actual floating point number in there would be annoying because CRUSH needs to be 100% deterministic on all clients (also, no floating point in the kernel)
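A quick way to see the rounding (a sketch of the arithmetic, not the exact Ceph code): the reweight is stored as a fraction of 0x10000 (65536), so 0.8 becomes 52428/65536 and comes back as roughly 0.79999:

  $ python -c 'v = int(0.8 * 0x10000); print(float(v) / 0x10000)'
  0.79998779296875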

[ceph-users] Backup LUKS/Dmcrypt keys

2018-04-25 Thread Kevin Olbrich
Hi, how can I backup the dmcrypt keys on luminous? The folder under /etc/ceph does not exist anymore. Kind regards Kevin
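A hedged sketch: for OSDs created by ceph-disk with dmcrypt, the keys moved from /etc/ceph into the monitors' config-key store, so they can be exported from there; <osd-fsid> is a placeholder and the exact key names should be confirmed with the list command first:

  $ ceph config-key ls | grep dm-crypt
  $ ceph config-key get dm-crypt/osd/<osd-fsid>/luks > <osd-fsid>.luks.key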

Re: [ceph-users] Poor read performance.

2018-04-25 Thread Jonathan Proulx
On Wed Apr 25 02:24:19 PDT 2018 Christian Balzer wrote: > Hello, > On Tue, 24 Apr 2018 12:52:55 -0400 Jonathan Proulx wrote: > > The performance I really care about is over rbd for VMs in my > > OpenStack but 'rbd bench' seems to line up pretty well with 'fio' tests > > inside VMs so a more or les
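For reference, a sketch of the two benchmarks being compared; pool/image names and sizes are placeholders:

  $ rbd bench --io-type read --io-size 4096 --io-threads 16 rbd/testimg
  $ fio --name=rbdread --ioengine=rbd --pool=rbd --rbdname=testimg --rw=randread --bs=4k --iodepth=16 --runtime=60 --time_based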

Re: [ceph-users] Poor read performance.

2018-04-25 Thread Blair Bethwaite
Hi Jon, On 25 April 2018 at 21:20, Jonathan Proulx wrote: > > here's a snap of a 24hr graph from one server (others are similar in > general shape): > > https://snapshot.raintank.io/dashboard/snapshot/gB3FDPl7uRGWmL17NHNBCuWKGsXdiqlt That's what, a median IOPS of about 80? Pretty high for spinning

Re: [ceph-users] Poor read performance.

2018-04-25 Thread Christian Balzer
Hello, On Wed, 25 Apr 2018 17:20:55 -0400 Jonathan Proulx wrote: > On Wed Apr 25 02:24:19 PDT 2018 Christian Balzer wrote: > > > Hello, > > > On Tue, 24 Apr 2018 12:52:55 -0400 Jonathan Proulx wrote: > > > > The performance I really care about is over rbd for VMs in my > > > OpenStack but

Re: [ceph-users] cluster can't remapped objects after change crush tree

2018-04-25 Thread Konstantin Shalygin
# ceph osd crush tree
ID  CLASS WEIGHT  TYPE NAME
-1        3.63835 root default
-9        0.90959     pod group1
-5        0.90959         host feather1
 1  hdd   0.90959             osd.1
-10       2.72876     pod group2
-7        1.81918         host ds1
 2  hdd   0.90959             osd.

[ceph-users] Ceph Developer Monthly - May 2018

2018-04-25 Thread Leonardo Vaz
Hey Cephers, This is just a friendly reminder that the next Ceph Developer Monthly meeting is coming up: http://wiki.ceph.com/Planning If you have work that you're doing that is feature work, significant backports, or anything you would like to discuss with the core team, please add it to the