Re: [ceph-users] Ceph Down on Cluster

2016-11-19 Thread Bruno Silva
So, I did it now, and removed another one.

ceph health detail
HEALTH_WARN 1 pgs down; 6 pgs incomplete; 6 pgs stuck inactive; 6 pgs stuck unclean; 3 requests are blocked > 32 sec; 2 osds have slow requests
pg 0.3 is stuck inactive for 249715.738300, current state incomplete, last acting [1,4,6]
pg …
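A minimal sketch of how one could dig into a pg like 0.3 from that output (standard Hammer-era ceph CLI; the pg and osd ids are taken from the health detail above):

# peering detail for the stuck pg, including which osds it would probe
ceph pg 0.3 query

# list every stuck pg by category
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# on the host of a slow osd, see what the blocked requests are doing
ceph daemon osd.1 dump_ops_in_flight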

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-19 Thread Lionel Bouton
On 19/11/2016 at 00:52, Brian :: wrote: > This is like your mother telling you not to cross the road when you were 4 > years of age but not telling you it was because you could be flattened > by a car :) > > Can you expand on your answer? If you are in a DC with AB power, > redundant UPS, dual feed f…

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-19 Thread Brian ::
Hi Lionel, Mega ouch - I've recently seen the act of measuring power consumption in a data centre (they clamp a probe onto the cable for an amp reading, seemingly) take out a cabinet which had *redundant* power feeds - so anything is possible, I guess. Regards, Brian On Sat, Nov 19, 2016 at 11:20…

[ceph-users] Remove - down_osds_we_would_probe

2016-11-19 Thread Bruno Silva
Version: Hammer. On my cluster a pg is saying:

  "down_osds_we_would_probe": [ 5 ],

But this osd was removed. How can I solve this? Reading the ceph-users list, they say this could be the reason my cluster is stopped. How can I solve this?
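For reference, that field lives in the recovery_state section of a pg query; a sketch of where to look (the pg id here is hypothetical):

# show which down osds peering is still waiting to probe
ceph pg 0.3 query | grep -A 3 down_osds_we_would_probe

# confirm osd.5 is really gone from the osd map and the crush map
ceph osd tree
ceph osd crush dump | grep '"id": 5'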

Re: [ceph-users] Remove - down_osds_we_would_probe

2016-11-19 Thread Paweł Sadowski
Hi, Make a temporary OSD with the same ID and weight 0 to avoid putting data on it. The cluster should contact this OSD and move forward. If not, you can also use 'ceph osd lost ID', but an OSD with that ID must exist in the crushmap (and that is probably not the case here). On 19.11.2016 13:46, Bruno Silv…
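A sketch of both options as shell commands (osd id 5 comes from Bruno's report; the provisioning steps for the placeholder osd are elided):

# option 1: recreate a placeholder osd.5 that takes no data
ceph osd create                  # reuses the lowest free id, i.e. 5
# ...prepare and activate the osd as usual, then pin its weight to 0:
ceph osd crush reweight osd.5 0

# option 2: declare the old osd permanently lost
# (osd.5 must still exist in the crush map for this to succeed)
ceph osd lost 5 --yes-i-really-mean-it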

Re: [ceph-users] Remove - down_osds_we_would_probe

2016-11-19 Thread Bruno Silva
I did that and it didn't work; in the end I put an osd with id 5 into production. On Sat, 19 Nov 2016 at 17:46, Paweł Sadowski wrote: > Hi, > > Make a temporary OSD with the same ID and weight 0 to avoid putting data > on it. The cluster should contact this OSD and move forward. If not, you can > al…

Re: [ceph-users] Remove - down_osds_we_would_probe

2016-11-19 Thread Bruno Silva
And it finally works. Thanks. Now I need to look at the other errors; my cluster is very problematic. On Sat, 19 Nov 2016 at 19:12, Bruno Silva wrote: > I did that and it didn't work; in the end I put an osd with id 5 into production. > > > On Sat, 19 Nov 2016 at 17:46, Paweł Sadowski > wrote: >…

[ceph-users] Ceph - access rbd lock out

2016-11-19 Thread Bruno Silva
I don't know what I can do to solve this. I tried force-creating the pg. I tried deactivating the osd. I added new disks. And nothing changes this scenario.

# ceph health detail
HEALTH_WARN 1 pgs down; 6 pgs incomplete; 6 pgs stuck inactive; 70 pgs stuck unclean; 7 requests are blocked > 32 sec; 3 osds have slow…
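For what it's worth, the two recovery attempts mentioned above, plus breaking a stale rbd lock (which the subject suggests is in play), look roughly like this; pool, image, lock-id, and locker are placeholders:

# recreate a down pg -- this discards whatever data the pg held, last resort
ceph pg force_create_pg 0.3

# if a dead client still holds a lock on an rbd image, list and break it
rbd lock list <pool>/<image>
rbd lock remove <pool>/<image> <lock-id> <locker>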

[ceph-users] PG Down+Incomplete but without block

2016-11-19 Thread Bruno Silva
I have a lot of stuck, down+incomplete, and incomplete pgs, but pg query doesn't show where the failure is.

ceph health detail
HEALTH_WARN clock skew detected on mon.3; 3 pgs down; 6 pgs incomplete; 6 pgs stuck inactive; 6 pgs stuck unclean; 17 requests are blocked > 32 sec; 3 osds have slow requests…
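When pg query seems unhelpful, the failure reason is usually buried in its recovery_state section; a sketch of where to look (the pg id is a placeholder), plus the separate clock-skew fix:

# inspect "recovery_state", in particular "blocked_by",
# "down_osds_we_would_probe" and the per-peer "peer_info"
ceph pg <pgid> query | less

# the clock skew on mon.3 is an independent problem: check ntp on each
# monitor host and restart that monitor once the clocks agree
ntpq -p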

[ceph-users] RBD lost parents after rados cppool

2016-11-19 Thread Craig Chi
Hi Cephers, I am tuning the pg numbers of my OpenStack pools. As everyone knows, the pg number of a pool cannot be decreased, so I came up with the idea of copying my pools to new pools with a lower pg_num and then deleting the original pools. I executed the following commands: rados cppool volumes new-vo…
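For context, a sketch of the full swap such a migration usually involves (the pool name volumes-new and pg count 256 are assumptions). Note that rados cppool copies objects but is not snapshot-aware, which would explain rbd clone parent links breaking, since clones hang off parent snapshots:

# create the destination pool with the smaller pg_num, then copy
ceph osd pool create volumes-new 256
rados cppool volumes volumes-new

# swap the pools over (clients must be stopped during the switch)
ceph osd pool rename volumes volumes-old
ceph osd pool rename volumes-new volumes

# delete the old pool only after verifying the images
ceph osd pool delete volumes-old volumes-old --yes-i-really-really-mean-it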