Re: [ceph-users] Help, monitor stuck constantly electing

2016-05-16 Thread kefu chai
On Tue, May 17, 2016 at 1:36 AM, Василий Ангапов wrote: > Hello, > > I have a Ceph cluster (10.2.1) with 10 nodes, 3 mons and 290 OSDs. I > have an instance of RGW with buckets data in EC pool 6+3. > I've recently started testing cluster redundancy level by powering > nodes off one by one. > Sudde

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Andrus, Brian Contractor
Yes, I use the fuse client because the kernel client isn't happy with SELinux settings. I have experienced the same symptoms with both clients, however. Yes, the clients that had nothing were merely mounted, and nothing, not even an 'ls', was done on the filesystem. I did do 'df' on some of the cl

Re: [ceph-users] Ceph Recovery

2016-05-16 Thread Gaurav Bafna
Hi Lazuardi, No, there are no unfound or incomplete PGs. Replacing the OSDs does return the cluster to health, but the problem should not have occurred in the first place. The cluster should have automatically healed after the OSDs were marked out of the cluster; otherwise this becomes a manual process

Re: [ceph-users] Ceph Recovery

2016-05-16 Thread Lazuardi Nasution
Gaurav, Are there any unfound or incomplete PGs? If not, you can remove the OSDs (while monitoring the ceph -w and ceph -s output) and replace them with good ones, one OSD at a time. I have done that successfully. Best regards, On Tue, May 17, 2016 at 12:30 PM, Gaurav Bafna wrote: > Even I faced the
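
For reference, a minimal sketch of that replacement procedure, assuming a failed OSD with the hypothetical id 12 (let ceph -s / ceph -w settle before moving on to the next OSD):

    ceph osd out 12               # let data rebalance away from the OSD
    ceph osd crush remove osd.12  # drop it from the CRUSH map
    ceph auth del osd.12          # remove its authentication key
    ceph osd rm 12                # remove the OSD entry itself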

Re: [ceph-users] Ceph Recovery

2016-05-16 Thread Lazuardi Nasution
Hi Wido, The 75% happened on 4 nodes of 24 OSDs each, with a pool size of two and a minimum size of one. Is there any relation between this configuration and the 75%? Best regards, On Tue, May 17, 2016 at 3:38 AM, Wido den Hollander wrote: > > > On 14 May 2016 at 12:36, Lazuardi Nasution < > mrxlazuar...@g

Re: [ceph-users] Ceph Recovery

2016-05-16 Thread Gaurav Bafna
Even I faced the same issue with our production cluster. cluster fac04d85-db48-4564-b821-deebda046261, health HEALTH_WARN: 658 pgs degraded, 658 pgs stuck degraded, 688 pgs stuck unclean, 658 pgs stuck undersized, 658 pgs undersized
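
A quick way to see why such PGs stay degraded or undersized is to query the cluster directly, e.g. (the PG id is a placeholder):

    ceph health detail            # lists the affected PGs and their states
    ceph pg dump_stuck unclean    # stuck PGs with their up/acting OSD sets
    ceph pg <pgid> query          # detailed state of a single PG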

Re: [ceph-users] v10.2.1 Jewel released

2016-05-16 Thread Karsten Heymann
Hi Sage, the updated debian packages are *still* missing ceph-{mon,osd}.target. Was it intentional to release the point release without the fix? root@ceph-cap1-01:~# apt-show-versions | grep ^ceph | sort ceph:amd64/jessie 10.2.1-1~bpo80+1 uptodate ceph-base:amd64/jessie 10.2.1-1~bpo80+1 uptodate

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Christian Balzer
Hello, On Tue, 17 May 2016 12:12:02 +1000 Chris Dunlop wrote: > Hi Christian, > > On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote: > > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote: > > Most of your questions would be easily answered if you spent a few > > minutes with

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
Hi Christian, On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote: > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote: > Most of your questions would be easily answered if you spent a few > minutes with even the crappiest test cluster, observing things (with > atop and the li

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Christian Balzer
Hello, On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote: > On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote: > > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote: > > > > > > pg_num is the actual amount of PGs. This you can increase without any > > > actua

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Mon, May 16, 2016 at 10:40:47PM +0200, Wido den Hollander wrote: > > On 16 May 2016 at 7:56, Chris Dunlop wrote: > > Why do we have both pg_num and pgp_num? Given the docs say "The pgp_num > > should be equal to the pg_num": under what circumstances might you want > > these different, apart fr

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote: > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote: > > > > pg_num is the actual amount of PGs. This you can increase without any > > actual data moving. > > Yes and no. > > Increasing the pg_num will split PGs, w

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Christian Balzer
Hello, On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote: > > > On 16 May 2016 at 7:56, Chris Dunlop wrote: > > > > > > Hi, > > > > I'm trying to understand the potential impact on an active cluster of > > increasing pg_num/pgp_num. > > > > The conventional wisdom, as gle

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Wido den Hollander
> On 16 May 2016 at 7:56, Chris Dunlop wrote: > > > Hi, > > I'm trying to understand the potential impact on an active cluster of > increasing pg_num/pgp_num. > > The conventional wisdom, as gleaned from the mailing lists and general > google fu, seems to be to increase pg_num followed by pg
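
For context, the usual sequence is to raise pg_num first and then pgp_num to the same value; a sketch with a hypothetical pool name and target count:

    ceph osd pool set mypool pg_num 256    # creates the new (split) PGs
    ceph osd pool set mypool pgp_num 256   # lets CRUSH start placing data onto them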

Re: [ceph-users] Ceph Recovery

2016-05-16 Thread Wido den Hollander
> On 14 May 2016 at 12:36, Lazuardi Nasution wrote: > > > Hi Wido, > > Yes, you are right. After removing the down OSDs, reformatting and bringing > them up again, at least until 75% of the total OSDs, my Ceph cluster is healthy > again. It seems there is a high probability of data safety if the total a

Re: [ceph-users] Mounting format 2 rbd images (created in Jewel) on CentOS 7 clients

2016-05-16 Thread Steven Hsiao-Ting Lee
Hi, Thanks for getting back to me. It turns out that setting crush tunables to the “optimal” profile caused the problem I encountered. Setting it back to “default” and specifying “layering” as the only image-feature, as you suggested, fixed the problem. Thanks again. Steven > On May 13, 2016, at 4:
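
For anyone hitting the same combination, a sketch of the two changes described above (the pool and image names are hypothetical):

    ceph osd crush tunables default
    rbd create mypool/myimage --size 10240 --image-feature layering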

[ceph-users] Help, monitor stuck constantly electing

2016-05-16 Thread Василий Ангапов
Hello, I have a Ceph cluster (10.2.1) with 10 nodes, 3 mons and 290 OSDs. I have an instance of RGW with buckets data in EC pool 6+3. I've recently started testing cluster redundancy level by powering nodes off one by one. Suddenly I noticed that all monitors became crazy eating 100% CPU, in "perf
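
When monitors spin like this, the admin socket on each mon host is a useful first stop; a sketch, assuming the mon id matches the short hostname:

    ceph daemon mon.$(hostname -s) mon_status   # election epoch and quorum membership
    ceph daemon mon.$(hostname -s) perf dump    # per-monitor performance counters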

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Dan van der Ster
On 16 May 2016 16:36, "John Spray" wrote: > > On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor > wrote: > > Both client and server are Jewel 10.2.0 > > So the fuse client, correct? If you are up for investigating further, > with potential client bugs (or performance issues) it is often

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Mark Nelson
FWIW, when we tested CephFS at ORNL a couple of years ago we were doing about 4-6GB/s on relatively non-optimal hardware (pretty much maxing the hardware out on writes, though only about 50-60% on reads). What you are experiencing isn't necessarily reflective of how a healthy cluster will perf

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread John Spray
On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor wrote: > Both client and server are Jewel 10.2.0 So the fuse client, correct? If you are up for investigating further, with potential client bugs (or performance issues) it is often useful to compare the fuse vs. kernel clients (using the
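
A sketch of mounting the same filesystem both ways for comparison (the monitor address, mount points and secret file are assumptions):

    ceph-fuse /mnt/cephfs-fuse
    mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs-kernel -o name=admin,secretfile=/etc/ceph/admin.secret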

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Andrus, Brian Contractor
Both client and server are Jewel 10.2.0. "All kinds of issues" include that EVERY node ended up with the cache pressure message, even if they had done no access at all. I ended up with some 200 degraded pgs, and quite a few of the other 'standard' errors of stuck waiting and such. I ended up di

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread Brett Niver
The terminology we're using to describe CephFS in Jewel is "stable" as opposed to production ready. Thanks, Brett On Monday, May 16, 2016, John Spray wrote: > On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor > > wrote: > > So this ‘production ready’ CephFS for jewel seems a little not

[ceph-users] v10.2.1 Jewel released

2016-05-16 Thread Sage Weil
This is the first bugfix release for Jewel. It contains several annoying packaging and init system fixes and a range of important bugfixes across RBD, RGW, and CephFS. We recommend that all v10.2.x users upgrade. For more detailed information, see the release notes at http://docs.ceph.

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Christian Balzer
Hello, On Mon, 16 May 2016 13:14:29 +0200 Peter Kerdisle wrote: see all the way down. > On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote: > > > > > > > > -Original Message- > > > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > > > Sent: 16 May 2016 11:04 > > > To: Nick Fisk

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Nick Fisk
> -Original Message- > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > Sent: 16 May 2016 12:14 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Erasure pool performance expectations > > > > On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote: > > > >

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Peter Kerdisle
On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote: > > > > -Original Message- > > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > > Sent: 16 May 2016 11:04 > > To: Nick Fisk > > Cc: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] Erasure pool performance expectations > >

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Christian Balzer
Hello, On Mon, 16 May 2016 12:49:48 +0200 Peter Kerdisle wrote: > Thanks yet again Nick for the help and explanations. I will experiment > some more and see if I can get the slow requests further down and > increase the overall performance. > And as I probably mentioned before, if your cache i

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Peter Kerdisle
Thanks yet again Nick for the help and explanations. I will experiment some more and see if I can get the slow requests further down and increase the overall performance. On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote: > > > > -Original Message- > > From: Peter Kerdisle [mailto:peter.

Re: [ceph-users] ceph-mon.target not enabled

2016-05-16 Thread Tim Serong
On 04/20/2016 04:43 AM, Ruben Kerkhof wrote: > Hi all, > > I just installed 3 monitors, using ceph-deploy, on CentOS 7.2. Ceph is 10.1.2. > > My ceph-mon processes do not come up after reboot. This is what ceph-deploy > create-initial did: > > [ams1-ceph01-mon01][INFO ] Running command: sudo s
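
A workaround sketch until the packaging enables the units itself, assuming the mon id is the short hostname:

    systemctl enable ceph-mon.target
    systemctl enable ceph-mon@$(hostname -s)
    systemctl start ceph-mon.target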

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Nick Fisk
> -Original Message- > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > Sent: 16 May 2016 11:04 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Erasure pool performance expectations > > > On Mon, May 16, 2016 at 11:58 AM, Nick Fisk wrote: > > -O

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Peter Kerdisle
On Mon, May 16, 2016 at 11:58 AM, Nick Fisk wrote: > > -Original Message- > > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > > Sent: 16 May 2016 10:39 > > To: n...@fisk.me.uk > > Cc: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] Erasure pool performance expectations >

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Nick Fisk
> -Original Message- > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > Sent: 16 May 2016 10:39 > To: n...@fisk.me.uk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Erasure pool performance expectations > > I'm forcing a flush by lower the cache_target_dirty_ratio to a

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Peter Kerdisle
I'm forcing a flush by lowering cache_target_dirty_ratio. This forces writes to the EC pool; these are the operations I'm trying to throttle a bit. Am I understanding you correctly that this throttling only works the other way around, i.e. promoting cold objects into the hot cache? T
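
For reference, a sketch of the knob being adjusted here (the cache pool name and ratios are hypothetical):

    ceph osd pool set cache-pool cache_target_dirty_ratio 0.2   # flush dirty objects sooner
    ceph osd pool set cache-pool cache_target_dirty_ratio 0.4   # restore the previous target later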

Re: [ceph-users] How to remove a placement group?

2016-05-16 Thread Romero Junior
Firstly, thanks for the tips! Well, after trying the mark_unfound_lost delete on the pg I got the following output: ceph pg 15.3b3 mark_unfound_lost delete Error EINTR: problem getting command descriptions from pg.15.3b3 Any more ideas? From: Kostis Fardelas [mailto:dante1...@gmail.com] Sent:
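
Before retrying, it may help to confirm what the PG itself reports; a sketch using the standard PG commands against the PG mentioned above:

    ceph pg 15.3b3 query          # full PG state, including recovery and unfound info
    ceph pg 15.3b3 list_missing   # objects the PG knows about but cannot find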

Re: [ceph-users] failing to respond to cache pressure

2016-05-16 Thread John Spray
On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor wrote: > So this ‘production ready’ CephFS for jewel seems a little not quite…. > > > > Currently I have a single system mounting CephFS and merely scp-ing data to > it. > > The CephFS mount has 168 TB used, 345 TB / 514 TB avail. > > > > E
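
When chasing these warnings, the MDS admin socket shows which client sessions are holding capabilities; a sketch, assuming the MDS id is the daemon's short hostname:

    ceph daemon mds.$(hostname -s) session ls   # per-client sessions and capability counts
    ceph daemon mds.$(hostname -s) perf dump    # includes MDS inode and cap counters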

Re: [ceph-users] Help with seemingly damaged MDS rank(?)

2016-05-16 Thread John Spray
On Sun, May 15, 2016 at 3:08 AM, Skaag Argonius wrote: > One of the issues was a different version of ceph on the nodes. They are now > all back to version 9.0.2 and things are looking a bit better. What was the other version? We recently encountered someone who had a mix of MDS versions and th

Re: [ceph-users] Erasure pool performance expectations

2016-05-16 Thread Nick Fisk
> -Original Message- > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com] > Sent: 15 May 2016 08:04 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Erasure pool performance expectations > > Hey Nick, > > I've been playing around with the osd_tier_promote_m

Re: [ceph-users] CephFS + CTDB/Samba - MDS session timeout on lockfile

2016-05-16 Thread Nick Fisk
> -Original Message- > From: Eric Eastman [mailto:eric.east...@keepertech.com] > Sent: 11 May 2016 16:02 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] CephFS + CTDB/Samba - MDS session timeout on > lockfile > > On Wed, May 11, 2016 at 2:04 AM, Nick Fisk wrote: > >> -O

Re: [ceph-users] v0.94.7 Hammer released

2016-05-16 Thread Dan van der Ster
On Mon, May 16, 2016 at 8:20 AM, Chris Dunlop wrote: > On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote: >> This Hammer point release fixes several minor bugs. It also includes a >> backport of an improved ‘ceph osd reweight-by-utilization’ command for >> handling OSDs with higher-than-av
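
For anyone trying the backported command, a sketch (the 120% threshold is only an example value):

    ceph osd df                            # current per-OSD utilisation and variance
    ceph osd reweight-by-utilization 120   # reweight OSDs above 120% of the average utilisation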

Re: [ceph-users] v0.94.7 Hammer released

2016-05-16 Thread Emmanuel Lacour
Le 16/05/2016 08:20, Chris Dunlop a écrit : > On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote: >> This Hammer point release fixes several minor bugs. It also includes a >> backport of an improved ‘ceph osd reweight-by-utilization’ command for >> handling OSDs with higher-than-average ut