Re: [ceph-users] Power outages!!! help!

2017-09-15 Thread hjcho616
Looking better... working on scrubbing... HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs incomplete; 12 pgs inconsistent; 2 pgs repair; 1 pgs stuck inactive; 1 pgs stuck unclean; 109 scrub errors; too few PGs per OSD (29 < min 30); mds rank 0 has failed; mds cluster is degrad
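
For scrub errors and inconsistent PGs like the ones above, a minimal repair workflow might look like the sketch below (the pgid 1.28 is only a placeholder; take real IDs from ceph health detail):

  ceph health detail | grep inconsistent                   # which PGs are inconsistent
  rados list-inconsistent-obj 1.28 --format=json-pretty    # inspect the damaged objects in that PG
  ceph pg repair 1.28                                      # ask the primary OSD to repair it

Repair has historically favored the primary copy, so it is worth checking which replica actually holds the good data before running it.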

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-15 Thread Josh Durgin
(Sorry for top posting, this email client isn't great at editing) The mitigation strategy I mentioned before of forcing backfill could be backported to jewel, but I don't think it's a very good option for RBD users without SSDs. In luminous there is a command (something like 'ceph pg force-re
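
The Luminous commands being referred to are presumably the force-recovery/force-backfill family; a rough sketch, with pgid 2.1f as a placeholder:

  ceph pg force-recovery 2.1f           # move this PG to the front of the recovery queue
  ceph pg force-backfill 2.1f           # same, for backfill
  ceph pg cancel-force-recovery 2.1f    # undo the priority bump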

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread Vasu Kulkarni
On Fri, Sep 15, 2017 at 3:49 PM, Gregory Farnum wrote: > On Fri, Sep 15, 2017 at 3:34 PM David Turner wrote: >> >> I don't understand a single use case where I want updating my packages >> using yum, apt, etc to restart a ceph daemon. ESPECIALLY when there are so >> many clusters out there with

Re: [ceph-users] RBD: How many snapshots is too many?

2017-09-15 Thread Gregory Farnum
On Mon, Sep 11, 2017 at 1:10 PM Florian Haas wrote: > On Mon, Sep 11, 2017 at 8:27 PM, Mclean, Patrick > wrote: > > > > On 2017-09-08 06:06 PM, Gregory Farnum wrote: > > > On Fri, Sep 8, 2017 at 5:47 PM, Mclean, Patrick < > patrick.mcl...@sony.com> wrote: > > > > > >> On a related note, we are v

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
I'm sorry for getting a little hot there. You're definitely right that you can't please everyone with a forced choice. It's unfortunate that it can so drastically impact an upgrade like it did here. Is there a way to configure yum or apt to make sure that it won't restart these (or guarantee tha
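
On Debian/Ubuntu, one way to keep maintainer scripts from restarting services during a package upgrade is a temporary policy-rc.d; a sketch of that approach (I'm not aware of an equivalent generic knob for yum), combined with noout on the cluster side:

  ceph osd set noout                                       # avoid rebalancing while daemons bounce
  printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d   # exit 101 = "action forbidden" to invoke-rc.d
  chmod +x /usr/sbin/policy-rc.d
  apt-get install ceph ceph-osd ceph-mon                   # upgrade; service restarts are now suppressed
  rm /usr/sbin/policy-rc.d                                 # remove it again afterwards
  systemctl restart ceph-osd.target                        # restart daemons on your own schedule
  ceph osd unset noout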

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread Gregory Farnum
On Fri, Sep 15, 2017 at 3:34 PM David Turner wrote: > I don't understand a single use case where I want updating my packages > using yum, apt, etc to restart a ceph daemon. ESPECIALLY when there are so > many clusters out there with multiple types of daemons running on the same > server. > > My

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-15 Thread Florian Haas
On Fri, Sep 15, 2017 at 10:37 PM, Josh Durgin wrote: >> So this affects just writes. Then I'm really not following the >> reasoning behind the current behavior. Why would you want to wait for >> the recovery of an object that you're about to clobber anyway? Naïvely >> thinking an object like that

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
I don't understand a single use case where I want updating my packages using yum, apt, etc to restart a ceph daemon. ESPECIALLY when there are so many clusters out there with multiple types of daemons running on the same server. My home setup is 3 nodes each running 3 OSDs, a MON, and an MDS serv

Re: [ceph-users] OSD memory usage

2017-09-15 Thread Christian Wuerdig
Assuming you're using Bluestore, you could experiment with the cache settings (http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/). In your case, setting bluestore_cache_size_hdd lower than the default 1GB might help with the RAM usage; various people have reported solving O
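
A minimal ceph.conf sketch of that suggestion, using 512 MB purely as an example value:

  [osd]
  bluestore_cache_size_hdd = 536870912   # 512 MB instead of the 1 GB default for HDD OSDs
  # bluestore_cache_size = 536870912     # or override the cache size for all device types

The OSDs will need a restart for the new value to take effect.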

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread Vasu Kulkarni
On Fri, Sep 15, 2017 at 2:10 PM, David Turner wrote: > I'm glad that worked for you to finish the upgrade. > > He has multiple MONs, but all of them are on nodes with OSDs as well. When > he updated the packages on the first node, it restarted the MON and all of > the OSDs. This is strictly not

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David Turner
I'm glad that worked for you to finish the upgrade. He has multiple MONs, but all of them are on nodes with OSDs as well. When he updated the packages on the first node, it restarted the MON and all of the OSDs. This is strictly not supported in the Luminous upgrade as the OSDs can't be running

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread Vasu Kulkarni
On Fri, Sep 15, 2017 at 1:48 PM, David wrote: > Happy to report I got everything up to Luminous, used your tip to keep the > OSDs running, David, thanks again for that. > > I'd say this is a potential gotcha for people collocating MONs. It appears > that if you're running selinux, even in permissi

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David
Happy to report I got everything up to Luminous, used your tip to keep the OSDs running, David, thanks again for that. I'd say this is a potential gotcha for people collocating MONs. It appears that if you're running selinux, even in permissive mode, upgrading the ceph-selinux packages forces a re

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-15 Thread Josh Durgin
On 09/15/2017 01:57 AM, Florian Haas wrote: On Fri, Sep 15, 2017 at 8:58 AM, Josh Durgin wrote: This is more of an issue with write-intensive RGW buckets, since the bucket index object is a single bottleneck if it needs recovery, and all further writes to a shard of a bucket index will be block
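
Since the single bucket index object is the bottleneck being described, it can help to check how close busy buckets are to their shard limits; a rough sketch with BUCKETNAME as a placeholder (writes to the bucket should be quiesced while a manual reshard runs):

  radosgw-admin bucket limit check                                    # objects per shard vs. the warning thresholds
  radosgw-admin bucket reshard --bucket=BUCKETNAME --num-shards=32    # split the index across more shards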

Re: [ceph-users] Some OSDs are down after Server reboot

2017-09-15 Thread Joe Comeau
We're running journals on NVMe as well (SLES). Before rebooting, try deleting the links here: /etc/systemd/system/ceph-osd.target.wants/. If we delete them first, it boots OK; if we don't delete them, the disks sometimes don't come up and we have to ceph-disk activate all. HTH Thanks Joe >>> David Turne
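
Spelled out, that workaround is roughly the following (paths per a stock systemd layout; double-check what the links point at before deleting):

  rm /etc/systemd/system/ceph-osd.target.wants/ceph-osd@*.service   # drop the stale enable links
  reboot
  ceph-disk activate-all    # if any OSDs still stay down after boot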

[ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-15 Thread Lazuardi Nasution
Hi, 1. Is it possible to configure osd_data not as a small partition on the OSD but as a folder (e.g. on the root disk)? If yes, how do I do that with ceph-disk, and what are the pros/cons of doing that? 2. Is the WAL & DB size calculated based on OSD size or on expected throughput, like the journal device of filestore? If not, wha
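
For question 2: as far as I know, ceph-disk sizes the DB/WAL partitions from ceph.conf options rather than from the OSD size; a sketch of carving them out, where device names and sizes are only placeholders:

  # ceph.conf
  [global]
  bluestore_block_db_size  = 16106127360   # ~15 GB DB partition, example only
  bluestore_block_wal_size = 1073741824    # 1 GB WAL partition, example only

  ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1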

[ceph-users] mon health status gone from display

2017-09-15 Thread Alex Gorbachev
In Jewel and prior there was a health status for MONs in the ceph -s JSON output; this seems to be gone now. Is there a place where the status of a given monitor is shown in Luminous? Thank you -- Alex Gorbachev Storcium
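
A few places where per-monitor state still shows up in Luminous, for what it's worth:

  ceph -s --format json-pretty              # quorum membership in the JSON output
  ceph quorum_status --format json-pretty   # which mons are in or out of quorum
  ceph mon stat                             # one-line summary of the mons
  ceph health detail                        # MON_DOWN and similar checks when a mon is unhealthy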

Re: [ceph-users] Some OSDs are down after Server reboot

2017-09-15 Thread David Turner
I have this issue with my NVMe OSDs, but not my HDD OSDs. I have 15 HDD's and 2 NVMe's in each host. We put most of the journals on one of the NVMe's and a few on the second, but added a small OSD partition to the second NVMe for RGW metadata pools. When restarting a server manually for testing,

Re: [ceph-users] Mixed versions of cluster and clients

2017-09-15 Thread Mike A
> On 15 Sep 2017 at 18:42, Sage Weil wrote: > > On Fri, 15 Sep 2017, Mike A wrote: >> Hello! >> >> We have a ceph cluster based on the Jewel release and one virtualization >> infrastructure that is using the cluster. Now we are going to add another >> ceph cluster, but based on luminous wit

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-15 Thread David
Hi David I like your thinking! Thanks for the suggestion. I've got a maintenance window later to finish the update so will give it a try. On Thu, Sep 14, 2017 at 6:24 PM, David Turner wrote: > This isn't a great solution, but something you could try. If you stop all > of the daemons via syste
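
A generic shape of that stop-first approach (not necessarily exactly what was suggested; unit names per a standard systemd install):

  ceph osd set noout                # stop the cluster from rebalancing around the host
  systemctl stop ceph-osd.target    # stop the local OSDs before the package upgrade
  apt-get dist-upgrade              # or yum update, whichever applies
  systemctl start ceph-mon.target   # bring the mon back first
  systemctl start ceph-osd.target   # then the OSDs
  ceph osd unset noout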

Re: [ceph-users] Some OSDs are down after Server reboot

2017-09-15 Thread Matthew Vernon
Hi, On 14/09/17 16:26, Götz Reinicke wrote: > maybe someone has a hint: I have a Ceph cluster (6 nodes, 144 > OSDs), CentOS 7.3, ceph 10.2.7. > > I did a kernel update to the recent CentOS 7.3 one on a node and did a > reboot. > > After that, 10 OSDs did not come up like the others. The di
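
For OSDs that stay down after a reboot, a rough first pass at debugging (the OSD id 12 and /dev/sdc1 are placeholders):

  systemctl status ceph-osd@12            # did systemd try to start it at all?
  journalctl -u ceph-osd@12 --no-pager    # startup errors, if it did
  ceph-disk list                          # data/journal partition mapping as udev sees it
  ceph-disk activate /dev/sdc1            # mount and start an OSD whose udev trigger was missed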

Re: [ceph-users] Mixed versions of cluster and clients

2017-09-15 Thread Sage Weil
On Fri, 15 Sep 2017, Mike A wrote: > Hello! > > We have a ceph cluster based on Jewel release and one virtualization > infrastructure that is using the cluster. Now we are going to add another > ceph cluster but based on luminous with bluestore. > The virtualization infrastructure must use thes

[ceph-users] Mixed versions of cluster and clients

2017-09-15 Thread Mike A
Hello! We have a ceph cluster based on Jewel release and one virtualization infrastructure that is using the cluster. Now we are going to add another ceph cluster but based on luminous with bluestore. The virtualization infrastructure must use these ceph clusters. Do I need to update software

Re: [ceph-users] Power outages!!! help!

2017-09-15 Thread hjcho616
After running ceph osd lost osd.0, it started backfilling... I figured that was supposed to happen earlier when I added those missing PGs. Running into "too few PGs per OSD": I removed OSDs after the cluster stopped working when I added OSDs, but I guess I still needed them. Currently I see sever
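
Once the cluster has settled, the "too few PGs per OSD" warning itself is addressed by raising pg_num/pgp_num on the main pool; a sketch where the pool name rbd and the target of 128 are only placeholders (PG counts can only ever be increased):

  ceph osd pool get rbd pg_num
  ceph osd pool set rbd pg_num 128
  ceph osd pool set rbd pgp_num 128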

Re: [ceph-users] Power outages!!! help!

2017-09-15 Thread Ronny Aasen
You write you had all PGs exported except one, so I assume you have injected those PGs into the cluster again using the method linked a few times in this thread. How did that go? Were you successful in recovering those PGs? Kind regards, Ronny Aasen On 15. sep. 2017 07:52, hjcho616 w
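
For reference, the export/inject method in question is along these lines with ceph-objectstore-tool (OSD data paths and the pgid 0.2a are placeholders; the OSD being read from or written to must be stopped first, and filestore OSDs may also need --journal-path):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --pgid 0.2a --op export --file /tmp/pg.0.2a.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --op import --file /tmp/pg.0.2a.export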

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-15 Thread Florian Haas
On Fri, Sep 15, 2017 at 8:58 AM, Josh Durgin wrote: >> OK, maybe the "also" can be removed to reduce potential confusion? > > > Sure That'd be great. :) >> - We have a bunch of objects that need to be recovered onto the >> just-returned OSD(s). >> - Clients access some of these objects while the

[ceph-users] s3cmd not working with luminous radosgw

2017-09-15 Thread Yoann Moulin
Hello, I have a fresh luminous test cluster and I made a copy of a bucket (4 TB, 1.5M files) with rclone. I'm able to list/copy files with rclone, but s3cmd does not work at all: it can only give the bucket list, and I can't list files or update ACLs. Has anyone already tested this?
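
For what it's worth, the .s3cfg settings that usually matter against radosgw are the endpoint and the bucket-host template; a sketch with placeholder values (rgw.example.com and BUCKETNAME are made up):

  # ~/.s3cfg fragment
  host_base = rgw.example.com:7480
  host_bucket = rgw.example.com:7480
  use_https = False
  signature_v2 = False         # try True if v4-signed requests are rejected

  s3cmd ls s3://BUCKETNAME     # then retry the object listing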

[ceph-users] 'flags' of PG.

2017-09-15 Thread dE .
Hi, I was going through the health check documentation, where I found references to 'PG flags' like degraded, undersized, backfill_toofull or recovery_toofull, etc... I find traces of these flags throughout the documentation, but
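
Those flags are PG states, and they can be inspected per PG rather than only read about in the docs; for example (Luminous command syntax assumed):

  ceph pg stat             # summary counts of PG states
  ceph pg dump pgs_brief   # state of every PG
  ceph pg ls degraded      # list PGs currently in a given state
  ceph health detail       # shows which PGs trip which health check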