Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
Hi Bryan, Try running both of the commands that are timing out again, but with the --verbose flag, and see if we can get anything from that. Tom
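As a sketch (the exact commands that were timing out aren't shown in the archive; "status" is a placeholder, and in recent Ceph releases the flag is spelled --verbose):

    # Re-run the timing-out command with verbose client output
    ceph --verbose status
    # If needed, raise client-side debug levels for more detail
    ceph --debug-ms 1 --debug-monc 10 status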

Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
…going on. It might be best to limit the impact of these from going from 0 to 100 with these parameters (one backfill at a time, and a 0.1 second wait between recovery ops per OSD):

    ceph tell osd.* injectargs '--osd-max-backfills 1'
    ceph tell osd.* injectargs '--osd-recovery-sleep 0.1'

Tom
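For reference, a sketch of checking and later reverting these throttles (osd.0 is a placeholder OSD id, and the revert values are assumptions; check your own defaults first):

    # Confirm the value currently in effect on an OSD (run on its host)
    ceph daemon osd.0 config get osd_recovery_sleep
    # Once the cluster settles, revert (0 disables the recovery sleep)
    ceph tell osd.* injectargs '--osd-recovery-sleep 0'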

Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
…traverse the public and cluster networks successfully? Tom
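A quick way to check what networks each OSD believes it is on, and whether peers are reachable over them (a sketch; osd.0, the interface names, and the peer addresses are placeholders):

    # Show the networks an OSD was configured with (run on the OSD host)
    ceph daemon osd.0 config get public_network
    ceph daemon osd.0 config get cluster_network
    # Confirm the host can reach a peer over each network
    ping -c 3 -I eth0 <peer-public-ip>
    ping -c 3 -I eth1 <peer-cluster-ip>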

Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
…already done the usual tests to ensure it is traversing the right interface and the correct VLANs, and that hosts are reachable via ICMP; perhaps even run iperf and tcpdump to be certain the flow is as expected. Tom
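By way of example, a hedged sketch of those checks (iperf3 rather than classic iperf, and the interface name is an assumption; OSDs listen on 6800-7300/tcp):

    # Throughput between two OSD hosts: server on one side...
    iperf3 -s
    # ...client on the other
    iperf3 -c <server-ip> -t 30
    # Watch OSD traffic on the interface it should be using
    tcpdump -i eth1 -nn 'tcp portrange 6800-7300'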

Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
…link? Just spitballing some ideas here until somebody more qualified has an idea.

Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

2018-07-17 Thread Tom W
Hi Bryan, What version of Ceph are you currently running, and do you run any erasure-coded pools or bluestore OSDs? It might be worth having a quick glance over the recent changelogs: http://docs.ceph.com/docs/master/releases/luminous/ Tom
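For completeness, a sketch of answering those three questions from the CLI (the first and last subcommands assume Luminous or later):

    # Versions of all running daemons
    ceph versions
    # Any erasure-coded pools? Look for "erasure" in the output
    ceph osd pool ls detail
    # How many OSDs are bluestore vs filestore
    ceph osd count-metadata osd_objectstore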

[ceph-users] Centralised Logging Strategy

2018-06-27 Thread Tom W
Morning all, Does anybody have any advice on moving their Ceph clusters to centralised logging? We are presently investigating routes to undertake this long-awaited (and much-needed) change, and any pointers on gotchas that may lie ahead would be greatly appreciated. …
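One common route is to let the daemons log to syslog and forward from there; a minimal sketch (the option names are standard Ceph settings, the log host is a placeholder):

    # ceph.conf -- send daemon and cluster logs to syslog
    [global]
        log_to_syslog = true
        err_to_syslog = true
        clog_to_syslog = true

    # /etc/rsyslog.d/50-ceph.conf -- forward everything to the log host
    *.* @@loghost.example.com:514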

[ceph-users] RGW Index rapidly expanding post tunables update (12.2.5)

2018-06-20 Thread Tom W
Hi all, We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5), and after this we decided to update our tunables configuration to the optimal profile (it was previously at Firefly). During this process we noticed the (bluestore) OSDs backing the RGW index and GC pools rapidly filling. …
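For context, a sketch of the tunables change being described (switching CRUSH profiles triggers large-scale data movement, consistent with the fill-up described):

    # Inspect the CRUSH tunables profile currently in effect
    ceph osd crush show-tunables
    # The change that was made: move from firefly-era tunables to optimal
    ceph osd crush tunables optimal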

Re: [ceph-users] Bucket reporting content inconsistently

2018-05-12 Thread Tom W
Thanks for posting this for me, Sean. Just to update: it seems that despite the bucket checks completing and reporting no issues, the objects continued to show up in every tool used to list the contents of the bucket. I put together a simple loop to upload a new file to overwrite each existing object and then …
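The loop itself isn't included in the archive; a hypothetical reconstruction using s3cmd (the bucket name, key list, and choice of client are all assumptions):

    # For each stuck key: overwrite it with a fresh object, then delete it
    while read -r key; do
        s3cmd put /tmp/placeholder "s3://mybucket/${key}"
        s3cmd del "s3://mybucket/${key}"
    done < stuck_keys.txt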

[ceph-users] Test for Leo

2018-05-11 Thread Tom W
Test for Leo, please ignore.