On Tue, May 17, 2016 at 1:36 AM, Василий Ангапов wrote:
> Hello,
>
> I have a Ceph cluster (10.2.1) with 10 nodes, 3 mons and 290 OSDs. I
> have an instance of RGW with buckets data in EC pool 6+3.
> I've recently started testing cluster redundancy level by powering
> nodes off one by one.
Suddenly I noticed that all monitors became crazy eating 100% CPU, in
Yes, I use the fuse client because the kernel client isn't happy with SELinux
settings.
I have experienced the same symptoms with both clients, however.
Yes, the clients that reported this had merely been mounted; nothing, not even an
'ls', was done on the filesystem. I did do 'df' on some of the cl
Hi Lazuardi
No, there are no unfound or incomplete PGs.
Replacing the OSDs does restore cluster health, but the problem
should not have occurred in the first place. The cluster should have
healed automatically after the OSDs were marked out of the cluster;
otherwise this becomes a manual process.
Gaurav,
Are there any unfound or incomplete PGs? If not, you can remove the OSD (while
monitoring ceph -w and ceph -s output) and then replace it with a good one,
one OSD at a time. I have done that successfully.
Best regards,
On Tue, May 17, 2016 at 12:30 PM, Gaurav Bafna wrote:
> Even I faced the
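That one-OSD-at-a-time replacement can be sketched roughly as follows (the OSD id 12 is a placeholder; the stop command must run on the OSD's host, and you should wait for recovery between steps):

```shell
# Mark the failing OSD out and let data rebalance off it first.
ceph osd out 12

# Watch recovery; proceed only once the cluster is back to HEALTH_OK.
ceph -s

# Then remove it completely before swapping in the replacement disk.
systemctl stop ceph-osd@12        # run on the OSD's host
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12
```

This is only a sketch of the usual removal sequence, not a substitute for checking `ceph health detail` at each step.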
Hi Wido,
The 75% happened on 4 nodes of 24 OSDs each, with a pool size of two and a
minimum size of one. Is there any relation between this configuration and the 75%?
Best regards,
On Tue, May 17, 2016 at 3:38 AM, Wido den Hollander wrote:
>
> > On 14 May 2016 at 12:36 Lazuardi Nasution wrote: <
> mrxlazuar...@g
I faced the same issue with our production cluster.
    cluster fac04d85-db48-4564-b821-deebda046261
     health HEALTH_WARN
            658 pgs degraded
            658 pgs stuck degraded
            688 pgs stuck unclean
            658 pgs stuck undersized
            658 pgs undersized
Hi Sage,
the updated Debian packages are *still* missing ceph-{mon,osd}.target.
Was it intentional to release the point release without the fix?
root@ceph-cap1-01:~# apt-show-versions | grep ^ceph | sort
ceph:amd64/jessie 10.2.1-1~bpo80+1 uptodate
ceph-base:amd64/jessie 10.2.1-1~bpo80+1 uptodate
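One quick way to confirm whether the systemd target units made it into the installed packages (package names as installed above; `dpkg -L` lists a package's shipped files):

```shell
# List any systemd .target units shipped by the ceph packages.
# If this prints nothing, the units are indeed missing from the packages.
dpkg -L ceph-base ceph-mon ceph-osd 2>/dev/null | grep '\.target$'
```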
Hello,
On Tue, 17 May 2016 12:12:02 +1000 Chris Dunlop wrote:
> Hi Christian,
>
> On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote:
> > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:
> > Most of your questions would be easily answered if you spent a few
> > minutes with
Hi Christian,
On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote:
> On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:
> Most of your questions would be easily answered if you spent a few
> minutes with even the crappiest test cluster, observing things (with
> atop and the li
Hello,
On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:
> On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote:
> > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote:
> > >
> > > pg_num is the actual amount of PGs. This you can increase without any
> > > actua
On Mon, May 16, 2016 at 10:40:47PM +0200, Wido den Hollander wrote:
> > On 16 May 2016 at 7:56 Chris Dunlop wrote:
> > Why do we have both pg_num and pgp_num? Given the docs say "The pgp_num
> > should be equal to the pg_num": under what circumstances might you want
> > these different, apart fr
On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote:
> On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote:
> >
> > pg_num is the actual amount of PGs. This you can increase without any
> > actual data moving.
>
> Yes and no.
>
> Increasing the pg_num will split PGs, w
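The two-step increase being discussed in this thread can be sketched as follows (a pool named `rbd` going from 128 to 256 PGs is an assumed example; on a busy cluster you would raise the values in small increments and wait for recovery in between):

```shell
# Step 1: split the PGs. Objects stay on their current OSDs,
# so this step by itself causes no data movement.
ceph osd pool set rbd pg_num 256

# Step 2: let CRUSH place the new PGs. This is the step that
# actually moves data around the cluster.
ceph osd pool set rbd pgp_num 256
```

Keeping `pgp_num` temporarily below `pg_num` is what lets you separate the splitting cost from the rebalancing cost.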
Hello,
On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote:
>
> > On 16 May 2016 at 7:56 Chris Dunlop wrote:
> >
> >
> > Hi,
> >
> > I'm trying to understand the potential impact on an active cluster of
> > increasing pg_num/pgp_num.
> >
> > The conventional wisdom, as gle
> On 16 May 2016 at 7:56 Chris Dunlop wrote:
>
>
> Hi,
>
> I'm trying to understand the potential impact on an active cluster of
> increasing pg_num/pgp_num.
>
> The conventional wisdom, as gleaned from the mailing lists and general
> google fu, seems to be to increase pg_num followed by pg
> On 14 May 2016 at 12:36 Lazuardi Nasution wrote:
>
>
> Hi Wido,
>
> Yes, you are right. After removing the down OSDs, reformatting them, and
> bringing them up again, at least until 75% of the total OSDs, my Ceph cluster
> is healthy again. It seems there is a high probability of data safety if the total a
Hi,
Thanks for getting back to me. It turns out that setting the crush tunables to
the “optimal” profile caused the problem I encountered. Setting it back to “default”
and specifying “layering” as the only image-feature, as you suggested, fixed the
problem. Thanks again.
Steven
> On May 13, 2016, at 4:
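For reference, the two changes described above look roughly like this (the pool and image names are hypothetical, and the size is arbitrary):

```shell
# Revert the CRUSH tunables profile from "optimal" back to "default".
ceph osd crush tunables default

# Create images with only the layering feature enabled; older kernel
# clients cannot map images carrying the newer Jewel feature bits.
rbd create rbd/test-image --size 1024 --image-feature layering
```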
Hello,
I have a Ceph cluster (10.2.1) with 10 nodes, 3 mons and 290 OSDs. I
have an instance of RGW with buckets data in EC pool 6+3.
I've recently started testing cluster redundancy level by powering
nodes off one by one.
Suddenly I noticed that all monitors became crazy eating 100% CPU, in
"perf
On 16 May 2016 16:36, "John Spray" wrote:
>
> On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor
> wrote:
> > Both client and server are Jewel 10.2.0
>
> So the fuse client, correct? If you are up for investigating further,
> with potential client bugs (or performance issues) it is often
FWIW, when we tested CephFS at ORNL a couple of years ago we were doing
about 4-6GB/s on relatively non-optimal hardware (pretty much maxing the
hardware out on writes, though only about 50-60% on reads). What you
are experiencing isn't necessarily reflective of how a healthy cluster
will perf
On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor
wrote:
> Both client and server are Jewel 10.2.0
So the fuse client, correct? If you are up for investigating further,
with potential client bugs (or performance issues) it is often useful
to compare the fuse vs. kernel clients (using the
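A side-by-side comparison of the two clients might look like this (the monitor address, mount points, and secret file path are placeholders):

```shell
# FUSE client (reads ceph.conf and keyring from /etc/ceph by default).
ceph-fuse /mnt/cephfs-fuse

# Kernel client, mounted against the same filesystem for comparison.
mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs-kernel \
    -o name=admin,secretfile=/etc/ceph/admin.secret
```

Running the same workload against both mount points helps separate client-side bugs from cluster-side performance problems.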
Both client and server are Jewel 10.2.0
"All kinds of issues" include that EVERY node ended up with the cache pressure
message, even if they had done no access at all.
I ended up with some 200 degraded PGs, and quite a few of the other
'standard' errors, stuck waiting and such. I ended up di
The terminology we're using to describe CephFS in Jewel is "stable" as
opposed to production-ready.
Thanks,
Brett
On Monday, May 16, 2016, John Spray wrote:
> On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor
> > wrote:
> > So this ‘production ready’ CephFS for jewel seems a little not
This is the first bugfix release for Jewel. It contains fixes for several
annoying packaging and init-system issues, and a range of important bugfixes
across RBD, RGW, and CephFS.
We recommend that all v10.2.x users upgrade.
For more detailed information, see the release notes at
http://docs.ceph.
Hello,
On Mon, 16 May 2016 13:14:29 +0200 Peter Kerdisle wrote:
see all the way down.
> On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote:
>
> >
> >
> > > -Original Message-
> > > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> > > Sent: 16 May 2016 11:04
> > > To: Nick Fisk
> -Original Message-
> From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> Sent: 16 May 2016 12:14
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Erasure pool performance expectations
>
>
>
> On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote:
>
>
> >
On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote:
>
>
> > -Original Message-
> > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> > Sent: 16 May 2016 11:04
> > To: Nick Fisk
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> >
Hello,
On Mon, 16 May 2016 12:49:48 +0200 Peter Kerdisle wrote:
> Thanks yet again Nick for the help and explanations. I will experiment
> some more and see if I can get the slow requests further down and
> increase the overall performance.
>
And as I probably mentioned before, if your cache i
Thanks yet again Nick for the help and explanations. I will experiment some
more and see if I can get the slow requests further down and increase the
overall performance.
On Mon, May 16, 2016 at 12:20 PM, Nick Fisk wrote:
>
>
> > -Original Message-
> > From: Peter Kerdisle [mailto:peter.
On 04/20/2016 04:43 AM, Ruben Kerkhof wrote:
> Hi all,
>
> I just installed 3 monitors, using ceph-deploy, on CentOS 7.2. Ceph is 10.1.2.
>
> My ceph-mon processes do not come up after reboot. This is what ceph-deploy
> create-initial did:
>
> [ams1-ceph01-mon01][INFO ] Running command: sudo s
> -Original Message-
> From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> Sent: 16 May 2016 11:04
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Erasure pool performance expectations
>
>
> On Mon, May 16, 2016 at 11:58 AM, Nick Fisk wrote:
> > -O
On Mon, May 16, 2016 at 11:58 AM, Nick Fisk wrote:
> > -Original Message-
> > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> > Sent: 16 May 2016 10:39
> > To: n...@fisk.me.uk
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
>
> -Original Message-
> From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> Sent: 16 May 2016 10:39
> To: n...@fisk.me.uk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Erasure pool performance expectations
>
> I'm forcing a flush by lower the cache_target_dirty_ratio to a
I'm forcing a flush by setting the cache_target_dirty_ratio to a lower value.
This forces writes to the EC pool; these are the operations I'm trying to
throttle a bit. Am I understanding you correctly that the throttling only
works the other way around, i.e. promoting cold objects into the hot cache?
T
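The flush-forcing knob mentioned here can be driven like this (the cache pool name `hot-pool` is hypothetical; the ratio is a fraction of the cache pool's target size):

```shell
# Lower the dirty ratio so the cache tier starts flushing dirty
# objects down to the backing EC pool sooner.
ceph osd pool set hot-pool cache_target_dirty_ratio 0.2

# Alternatively, flush and evict everything on demand instead of
# changing the steady-state ratio.
rados -p hot-pool cache-flush-evict-all
```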
Firstly, thanks for the tips!
Well, after trying the mark_unfound_lost delete on the pg I got the following
output:
ceph pg 15.3b3 mark_unfound_lost delete
Error EINTR: problem getting command descriptions from pg.15.3b3
Any more ideas?
From: Kostis Fardelas [mailto:dante1...@gmail.com]
Sent:
On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor
wrote:
> So this ‘production ready’ CephFS for jewel seems a little not quite….
>
>
>
> Currently I have a single system mounting CephFS and merely scp-ing data to
> it.
>
> The CephFS mount has 168 TB used, 345 TB / 514 TB avail.
>
>
>
> E
On Sun, May 15, 2016 at 3:08 AM, Skaag Argonius wrote:
> One of the issues was a different version of ceph on the nodes. They are now
> all back to version 9.0.2 and things are looking a bit better.
What was the other version? We recently encountered someone who had a
mix of MDS versions and th
> -Original Message-
> From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> Sent: 15 May 2016 08:04
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Erasure pool performance expectations
>
> Hey Nick,
>
> I've been playing around with the osd_tier_promote_m
> -Original Message-
> From: Eric Eastman [mailto:eric.east...@keepertech.com]
> Sent: 11 May 2016 16:02
> To: Nick Fisk
> Cc: Ceph Users
> Subject: Re: [ceph-users] CephFS + CTDB/Samba - MDS session timeout on
> lockfile
>
> On Wed, May 11, 2016 at 2:04 AM, Nick Fisk wrote:
> >> -O
On Mon, May 16, 2016 at 8:20 AM, Chris Dunlop wrote:
> On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote:
>> This Hammer point release fixes several minor bugs. It also includes a
>> backport of an improved ‘ceph osd reweight-by-utilization’ command for
>> handling OSDs with higher-than-av
On 16/05/2016 08:20, Chris Dunlop wrote:
> On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote:
>> This Hammer point release fixes several minor bugs. It also includes a
>> backport of an improved ‘ceph osd reweight-by-utilization’ command for
>> handling OSDs with higher-than-average ut