Re: [ceph-users] MDS damaged

2018-07-12 Thread Adam Tygart
I've hit this today with an upgrade to 12.2.6 on my backup cluster. Unfortunately there were issues with the logs (in that the files weren't writable) until after the issue struck. 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log [ERR] : 5.255 full-object read crc 0x4e97b4e !=
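For context, a read CRC mismatch like the one logged for PG 5.255 is usually investigated by querying and re-scrubbing that PG; a rough sketch (the PG id is taken from the log line above):

  ceph health detail        # which PGs/objects are flagged inconsistent or damaged
  ceph pg 5.255 query       # acting set and recovery state of the affected PG
  ceph pg deep-scrub 5.255  # re-read and compare all replicas
  ceph pg repair 5.255      # ask the primary to repair from a good replica, if one exists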

Re: [ceph-users] Increase queue_depth in KVM

2018-07-12 Thread Konstantin Shalygin
I've seen some people using 'num_queues' but I don't have this parameter in my schemas (libvirt version = 1.3.1, qemu version = 2.5.0). num-queues is available from qemu 2.7 [1]. [1] https://wiki.qemu.org/ChangeLog/2.7 k
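As the linked changelog suggests, num-queues is a property of the virtio-blk device rather than of the drive; a minimal qemu-level sketch (qemu >= 2.7, the pool/image and drive id are made up):

  qemu-system-x86_64 ... \
    -drive file=rbd:rbd/vm-disk,format=raw,if=none,id=drive0 \
    -device virtio-blk-pci,drive=drive0,num-queues=4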

Re: [ceph-users] OSD tuning no longer required?

2018-07-12 Thread Konstantin Shalygin
I saw this in the Luminous release notes: "Each OSD now adjusts its default configuration based on whether the backing device is an HDD or SSD. Manual tuning generally not required" Which tuning in particular? The ones in my configuration are osd_op_threads, osd_disk_threads,
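For reference, the hdd/ssd-specific defaults the release notes refer to can be inspected on a live OSD; a quick sketch (osd.0 and the grepped option names are just examples of the split settings):

  ceph daemon osd.0 config show | egrep '_(hdd|ssd)' | \
      egrep 'osd_op_num_shards|osd_op_num_threads_per_shard|osd_recovery_sleep|bluestore_cache_size'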

Re: [ceph-users] mds daemon damaged

2018-07-12 Thread Oliver Freyermuth
Hi all, this sounds an awful lot like: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/027992.html In that case, things started with an update to 12.2.6. Which version are you running? Cheers, Oliver On 12.07.2018 at 23:30, Kevin wrote: > Sorry for the long posting but trying
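To answer the version question quickly: since Luminous the cluster can report it per daemon type, and each daemon's admin socket knows its own version; for example (the daemon name is taken from this thread):

  ceph versions                  # per-daemon-type version summary, Luminous and later
  ceph daemon mds.ds27 version   # run on the MDS host, via the admin socket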

Re: [ceph-users] mds daemon damaged

2018-07-12 Thread Patrick Donnelly
On Thu, Jul 12, 2018 at 3:55 PM, Patrick Donnelly wrote: >> Recommends fixing error by hand. Tried running deep scrub on pg 2.4, it >> completes but still have the same issue above >> >> Final option is to attempt removing mds.ds27. If mds.ds29 was a standby and >> has data it should become live.
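For completeness, once the underlying object is readable again the damaged rank still has to be cleared explicitly before a standby (such as mds.ds29) can take over; roughly, assuming rank 0:

  ceph pg deep-scrub 2.4   # re-verify the PG named in the error
  ceph mds repaired 0      # clear the 'damaged' flag on rank 0
  ceph -s                  # watch for the MDS going back to up:active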

Re: [ceph-users] mds daemon damaged

2018-07-12 Thread Patrick Donnelly
On Thu, Jul 12, 2018 at 2:30 PM, Kevin wrote: > Sorry for the long posting but trying to cover everything > > I woke up to find my cephfs filesystem down. This was in the logs > > 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc 0x6fc2f65a > != expected 0x1c08241c on

[ceph-users] mds daemon damaged

2018-07-12 Thread Kevin
Sorry for the long posting but trying to cover everything I woke up to find my cephfs filesystem down. This was in the logs 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.:head I had one standby MDS, but as far as
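For context, the object in that error lives in the CephFS metadata pool and belongs to the MDS journal; locating it and pulling a copy looks roughly like this (the pool name and the full object name, probably 200.00000000 for rank 0's journal header, are assumptions based on a default setup):

  ceph osd map cephfs_metadata 200.00000000               # which PG/OSDs hold the object
  rados -p cephfs_metadata get 200.00000000 /tmp/200.bin  # an I/O error here reproduces the problem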

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-12 Thread ceph . novice
There was no change in the ZABBIX environment... I got this warning some minutes after the Linux and Luminous->Mimic update via YUM and a reboot of all the Ceph servers... Is there anyone who also had the ZABBIX module enabled under Luminous AND then migrated to Mimic? If yes, does it work
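For anyone comparing notes, the zabbix mgr module is configured entirely through the ceph CLI and needs the zabbix_sender binary on the active mgr host; a minimal sketch (hostnames and identifiers are placeholders):

  ceph mgr module enable zabbix
  ceph zabbix config-set zabbix_host zabbix.example.com   # Zabbix server or proxy
  ceph zabbix config-set identifier ceph-cluster          # host name as defined in Zabbix
  ceph zabbix config-show
  ceph zabbix send                                        # trigger an immediate send to test the path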

[ceph-users] How are you using tuned

2018-07-12 Thread Mohamad Gebai
Hi all, I was wondering how people were using tuned with Ceph, if at all. I think it makes sense to enable the throughput-performance profile on OSD nodes, and maybe the network-latency profile on mon and mgr nodes. Is anyone using a similar configuration, and do you have any thoughts on this
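A minimal sketch of that setup with tuned-adm (the profile names are the stock profiles shipped with tuned on RHEL/CentOS):

  tuned-adm profile throughput-performance   # on OSD nodes
  tuned-adm profile network-latency          # on mon/mgr nodes
  tuned-adm active                           # verify which profile is applied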

[ceph-users] Rook Deployments

2018-07-12 Thread Travis Nielsen
Any Rook users out there running Ceph in Kubernetes? We would love to hear about your experiences. Rook is currently hosted by the CNCF in the sandbox stage and we are proposing that Rook graduate to the incubating stage. Part of graduating is

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
Some progress, and more pain... I was able to recover the 200. using the ceph-objectstore-tool for one of the OSDs (all identical copies), but re-injecting it with a plain rados put gave no error while the get still returned the same I/O error. So the solution was to rm the
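A rough outline of that recovery path, with the OSD stopped while the object is exported, and all ids, paths and object names as placeholders that must match your own cluster:

  systemctl stop ceph-osd@12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 10.14 200.00000000 get-bytes > /tmp/200.bin   # export one replica's copy
  systemctl start ceph-osd@12
  rados -p cephfs_metadata rm 200.00000000                 # remove the damaged object first
  rados -p cephfs_metadata put 200.00000000 /tmp/200.bin   # then re-inject the recovered copy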

Re: [ceph-users] Increase queue_depth in KVM

2018-07-12 Thread Damian Dabrowski
Hello Steffen, thanks for your reply. Sorry, but I was on holiday; now I'm back and still digging into my problem... :( I've read thousands of google links but can't find anything which could help me. - tried all qemu drive IO (io=) and cache (cache=) modes, nothing could come even close to the
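For reference, the io=/cache= attributes in libvirt map onto qemu's aio=/cache= drive options; at the qemu level the variants usually tried with RBD look roughly like this (pool/image and ids are made up):

  -drive file=rbd:rbd/vm-disk,format=raw,if=none,id=drive0,cache=writeback   # librbd client-side cache
  -drive file=rbd:rbd/vm-disk,format=raw,if=none,id=drive0,cache=none        # caching disabled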

Re: [ceph-users] RADOSGW err=Input/output error

2018-07-12 Thread Drew Weaver
I never actually was able to fix this, I just moved on to something else. I guess I will try 13 and see if maybe the bug has been fixed when it’s released. -Drew From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Will Zhao Sent: Thursday, July 12, 2018 4:32 AM To:

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
Unfortunately yes, all the OSDs were restarted a few times, but no change. Thanks, Alessandro On 12/07/18 15:55, Paul Emmerich wrote: This might seem like a stupid suggestion, but: have you tried to restart the OSDs? I've also encountered some random CRC errors that only showed

[ceph-users] OSD tuning no longer required?

2018-07-12 Thread Robert Stanford
I saw this in the Luminous release notes: "Each OSD now adjusts its default configuration based on whether the backing device is an HDD or SSD. Manual tuning generally not required" Which tuning in particular? The ones in my configuration are osd_op_threads, osd_disk_threads,

Re: [ceph-users] MDS damaged

2018-07-12 Thread Paul Emmerich
This might seem like a stupid suggestion, but: have you tried to restart the OSDs? I've also encountered some random CRC errors that only showed up when trying to read an object, but not on scrubbing, that magically disappeared after restarting the OSD. However, in my case it was clearly related
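For completeness, bouncing a single OSD on a systemd deployment, or just forcing it to re-peer, looks like this (the OSD id is a placeholder):

  systemctl restart ceph-osd@12   # full daemon restart
  ceph osd down 12                # alternative: mark it down, it rejoins and re-peers on its own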

Re: [ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-12 Thread Paul Emmerich
2018-07-12 8:37 GMT+02:00 Marc Schöchlin : > > In a first step i just would like to have two simple KPIs which describe > a average/aggregated write/read latency of these statistics. > > Are there tools/other functionalities which provide this in a simple way? > It's one of the main KPI our
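Until something nicer exists, the aggregated client read/write latencies can be pulled straight from the OSD perf counters; a minimal sketch (counter names as found in Luminous; the average latency in seconds is sum divided by avgcount):

  ceph daemon osd.24 perf dump | jq '.osd.op_r_latency, .osd.op_w_latency'
  ceph osd perf   # quick cluster-wide view of per-OSD commit/apply latency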

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread Magnus Grönlund
Hej David and thanks! That was indeed the magic trick, no more peering, stale or down PGs. Upgraded the ceph-packages on the hosts, restarted the OSDs and then "ceph osd require-osd-release luminous" /Magnus 2018-07-12 12:05 GMT+02:00 David Majchrzak : > Hi/Hej Magnus, > > We had a similar
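For anyone following the same path, the checks around that flag are roughly:

  ceph versions                           # confirm no pre-luminous OSDs remain
  ceph osd require-osd-release luminous   # only safe once the above is clean
  ceph features                           # verify daemon/client feature levels afterwards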

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
On 12/07/18 11:20, Alessandro De Salvo wrote: On 12/07/18 10:58, Dan van der Ster wrote: On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote: OK, I found where the object is: ceph osd map cephfs_metadata

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread David Majchrzak
Hi/Hej Magnus, We had a similar issue going from latest hammer to jewel (so might not be applicable for you), with PGs stuck peering / data misplaced, right after updating all mons to latest jewel at that time 10.2.10. Finally setting the require_jewel_osds put everything back in place ( we

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread Magnus Grönlund
Hi list, Things went from bad to worse; I tried to upgrade some OSDs to Luminous to see if that could help, but that didn’t appear to make any difference. For each restarted OSD there were a few PGs that the OSD seemed to “forget”, and the number of undersized PGs grew until some PGs had been

Re: [ceph-users] RADOSGW err=Input/output error

2018-07-12 Thread Will Zhao
Hi: I use libs3 to run tests. The network is IB. The error in libcurl is the following: == Info: Operation too slow. Less than 1 bytes/sec transferred the last 15 seconds == Info: Closing connection 766 and a full request error in rgw is as follows: 2018-07-12 15:42:30.501074
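That libcurl message comes from the client's low-speed abort settings; the same 1 byte/s over 15 s threshold can be reproduced from the shell against the same endpoint, which helps separate an rgw-side stall from a client-side timeout (the URL is a placeholder):

  curl --speed-limit 1 --speed-time 15 -o /dev/null http://rgw.example.com/bucket/object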

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
On 12/07/18 10:58, Dan van der Ster wrote: On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote: OK, I found where the object is: ceph osd map cephfs_metadata 200. osdmap e632418 pool 'cephfs_metadata' (10) object

Re: [ceph-users] MDS damaged

2018-07-12 Thread Dan van der Ster
On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: > > On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo > wrote: >> >> OK, I found where the object is: >> >> >> ceph osd map cephfs_metadata 200. >> osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg >>

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-07-12 Thread Dan van der Ster
On Wed, Jul 11, 2018 at 11:40 PM Gregory Farnum wrote: > > On Mon, Jun 25, 2018 at 12:34 AM Dan van der Ster wrote: >> >> On Fri, Jun 22, 2018 at 10:44 PM Gregory Farnum wrote: >> > >> > On Fri, Jun 22, 2018 at 6:22 AM Sergey Malinin wrote: >> >> >> >> From >> >>
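For reference, the commands that usually come up when deciding what to do with unfound objects (the pg id is a placeholder; revert/delete gives up on the missing data):

  ceph health detail | grep unfound      # which PGs report unfound objects
  ceph pg 2.4 query                      # recovery state and which OSDs might still hold them
  ceph pg 2.4 mark_unfound_lost revert   # or 'delete', once the data is written off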

Re: [ceph-users] SSDs for data drives

2018-07-12 Thread Adrian Saul
We started our cluster with consumer (Samsung EVO) disks and the write performance was pitiful, they had periodic spikes in latency (average of 8ms, but much higher spikes) and just did not perform anywhere near where we were expecting. When replaced with SM863 based devices the difference
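The usual pre-purchase check for this is a single-job sync write test, which is close to what an OSD journal/WAL does; a minimal fio sketch (point it at a scratch device or test file, it overwrites data):

  fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based --group_reporting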

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
> On 11 Jul 2018, at 23:25, Gregory Farnum wrote: > >> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo >> wrote: >> OK, I found where the object is: >> >> >> ceph osd map cephfs_metadata 200. >> osdmap e632418 pool 'cephfs_metadata' (10) object

Re: [ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-12 Thread Marc Schöchlin
Hello Paul, thanks for your response/hints. I discovered the following tool in the ceph source repository: https://github.com/ceph/ceph/blob/master/src/tools/histogram_dump.py The tool provides output based on the statistics you mentioned: # ceph daemon osd.24 perf histogram dump|grep -P