Re: [ceph-users] xfs corruption

2016-03-07 Thread Jan Schermer
This functionality is common on RAID controllers in combination with HCL-certified drives. This usually means you can't rely on it working unless you stick to the exact certified combination, which is impossible in practice. For example, LSI controllers do this if you get the right SS

Re: [ceph-users] xfs corruption

2016-03-07 Thread Ric Wheeler
Unfortunately, you will have to follow up with the hardware RAID card vendors to see what commands their firmware handles. Good luck! Ric On 03/07/2016 01:37 PM, Ferhat Ozkasgarli wrote: I always forget this reply-all thing. *RAID5 and RAID10 (or other RAID levels) are a proper

Re: [ceph-users] xfs corruption

2016-03-07 Thread Ferhat Ozkasgarli
I always forget this reply-all thing. *RAID5 and RAID10 (or other RAID levels) are a property of the block devices. XFS, ext4, etc. can pass down those commands to the firmware on the card, and it is up to the firmware to propagate the command on to the backend drives.* You mean I can get a

Re: [ceph-users] xfs corruption

2016-03-07 Thread Ric Wheeler
You are right that some cards might not send those commands on to the backend storage, but spinning disks don't usually implement either trim or discard (SSDs do, though). XFS, ext4, etc. can pass down those commands to the firmware on the card, and it is up to the firmware to propagate the comm
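
For reference, a quick way to see whether trim/discard actually survives the controller is to ask the kernel what the exposed block device advertises; the device name and OSD mount point below are only examples:

    # Non-zero values mean the device as exposed by the controller accepts discard
    lsblk --discard /dev/sdb
    cat /sys/block/sdb/queue/discard_max_bytes

    # fstrim errors out when discard is not passed through to the backend drives
    fstrim -v /var/lib/ceph/osd/ceph-0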

Re: [ceph-users] xfs corruption

2016-03-06 Thread Ferhat Ozkasgarli
Ric, you mean a RAID 0 environment, right? If you use RAID 5, RAID 10, or some other more complex RAID configuration, most of the physical disks' abilities (trim, discard, etc.) vanish. Only a handful of hardware RAID cards are able to pass trim and discard commands to the physical disks if the RAID config

Re: [ceph-users] xfs corruption

2016-03-06 Thread Ric Wheeler
It is perfectly reasonable and common to use hardware RAID cards in writeback mode under XFS (and under Ceph) if you configure them properly. The key thing is that with the writeback cache enabled, you need to make sure that the SATA drives' own write cache is disabled. Also make sure that yo
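
A minimal sketch of that cache check, assuming plain SATA drives the OS can reach directly (device names are examples; drives hidden behind a RAID logical volume usually need the controller's own CLI or a smartctl passthrough option instead):

    # Show the drive's volatile write cache setting (write-caching on/off)
    hdparm -W /dev/sdb
    # Disable the drive's own write cache; the controller's battery/flash
    # backed writeback cache keeps doing the buffering
    hdparm -W0 /dev/sdb
    # Example of reaching a drive hidden behind an LSI/MegaRAID controller
    smartctl -i -d megaraid,0 /dev/sda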

Re: [ceph-users] xfs corruption

2016-02-26 Thread fangchen sun
Thank you for your response! All my hosts have RAID cards. Some RAID cards are in pass-through mode, and the others are in write-back mode. I will set all RAID cards to pass-through mode and observe for a period of time. Best Regards sunspot 2016-02-25 20:07 GMT+08:00 Ferhat Ozkasgarli : >
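
One way to verify that a card really is in pass-through (JBOD) mode is to check whether the OS sees the raw drive rather than a RAID logical volume; device names below are examples:

    # In pass-through mode smartctl shows the disk's own model and serial number
    smartctl -i /dev/sdc
    # A RAID logical volume typically reports the controller's product name
    # instead and needs a passthrough option to reach the physical drive, e.g.:
    smartctl -i -d megaraid,2 /dev/sdc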

Re: [ceph-users] xfs corruption

2016-02-25 Thread Ferhat Ozkasgarli
This has happened to me before, but in a virtual machine environment. The VM was KVM and the storage was RBD. My problem was a bad network cable. You should check the following details: 1) Do you use any kind of hardware RAID configuration (RAID 0, 5 or 10)? Ceph does not work well on hardware RAID syste
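
For the bad-cable case, link-level error counters are usually the quickest tell; the interface name is an example and the exact ethtool counter names vary by driver:

    # Errors/drops that keep growing point at cabling, SFPs or switch ports
    ip -s link show dev eth0
    ethtool -S eth0 | grep -iE 'err|drop|crc'
    # Watch the cluster for flapping OSDs or slow requests while testing
    ceph -w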

[ceph-users] xfs corruption

2016-02-23 Thread fangchen sun
Dear all: I have a ceph object storage cluster with 143 OSDs and 7 radosgw, and chose XFS as the underlying file system. I recently ran into a problem where sometimes an OSD is marked down when the return value of the function "chain_setxattr()" is -117. I only umount the disk and repair it with "
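
Errno 117 on Linux is EUCLEAN ("Structure needs cleaning"), which is what XFS returns once it has detected on-disk corruption. A typical check-and-repair pass on the affected OSD disk might look like the sketch below; the OSD id, device and paths are examples, and the OSD must be stopped first:

    # Confirm the kernel also logged XFS corruption for that device
    dmesg | grep -i xfs

    systemctl stop ceph-osd@12        # or the init system's equivalent
    umount /var/lib/ceph/osd/ceph-12
    xfs_repair -n /dev/sdd1           # read-only check, no modifications
    xfs_repair /dev/sdd1              # real repair; may ask you to mount once
                                      # (or use -L) if the log is dirty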

Re: [ceph-users] xfs corruption, data disaster!

2015-06-11 Thread Eric Sandeen
On 5/11/15 9:47 AM, Ric Wheeler wrote: > On 05/05/2015 04:13 AM, Yujian Peng wrote: >> Emmanuel Florac writes: >> >>> On Mon, 4 May 2015 07:00:32 +0000 (UTC), >>> Yujian Peng 126.com> wrote: >>> I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center had

Re: [ceph-users] xfs corruption, data disaster!

2015-05-11 Thread Ric Wheeler
On 05/05/2015 04:13 AM, Yujian Peng wrote: Emmanuel Florac writes: On Mon, 4 May 2015 07:00:32 +0000 (UTC), Yujian Peng 126.com> wrote: I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center had a power problem yesterday, and all of the ceph nodes were down.

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
OK, I see the problem. Thanks for the explanation. However, he talks about 4 hosts, so with the default CRUSH map losing 1 or more OSDs on the same host is irrelevant. The real problem is that he lost 4 OSDs on different hosts with pools of size 3, so he lost the PGs that were mapped to 3 failing drives. So h

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Christian Balzer
Hello, On Thu, 7 May 2015 00:34:58 +0200 Saverio Proto wrote: > Hello, > > I don't get it. You lost just 6 OSDs out of 145 and your cluster is not > able to recover? > He lost 6 OSDs at the same time. With 145 OSDs and standard replication of 3, losing 3 OSDs makes data loss already extremely
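
To see exactly how exposed a cluster is and which placement groups are actually affected, something like the following works; the pool name and PG id are examples:

    # Replication level and the minimum copies required to keep serving I/O
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    # Which PGs are down/incomplete and which OSDs they map to
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg map 2.3f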

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
Hello, I dont get it. You lost just 6 osds out of 145 and your cluster is not able to recover ? what is the status of ceph -s ? Saverio 2015-05-04 9:00 GMT+02:00 Yujian Peng : > Hi, > I'm encountering a data disaster. I have a ceph cluster with 145 osd. The > data center had a power problem ye

Re: [ceph-users] xfs corruption, data disaster!

2015-05-05 Thread Nick Fisk
> Nick Fisk > Sent: 05 May 2015 07:46 > To: 'Yujian Peng'; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] xfs corruption, data disaster! > > This is probably similar to what you want to try and do, but also mark those > failed OSDs as lost, as I don't thi
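
The usual sequence for giving up on a dead OSD is sketched below with a hypothetical OSD id; marking an OSD lost is irreversible and accepts data loss for any PG whose only surviving copy was on it:

    ceph osd lost 12 --yes-i-really-mean-it   # tell the cluster osd.12 is gone for good
    ceph osd crush remove osd.12              # drop it from the CRUSH map
    ceph auth del osd.12                      # remove its auth key
    ceph osd rm 12                            # remove it from the OSD map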

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Nick Fisk
Yujian Peng > Sent: 05 May 2015 02:14 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] xfs corruption, data disaster! > > Emmanuel Florac writes: > > > > > On Mon, 4 May 2015 07:00:32 +0000 (UTC) Yujian Peng > 126.com> wrote: > > > >

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Yujian Peng
Emmanuel Florac writes: > > On Mon, 4 May 2015 07:00:32 +0000 (UTC) > Yujian Peng 126.com> wrote: > > > I'm encountering a data disaster. I have a ceph cluster with 145 osd. > > The data center had a power problem yesterday, and all of the ceph > > nodes were down. But now I find that 6 dis

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Emmanuel Florac
On Mon, 4 May 2015 07:00:32 +0000 (UTC), Yujian Peng wrote: > I'm encountering a data disaster. I have a ceph cluster with 145 osd. > The data center had a power problem yesterday, and all of the ceph > nodes were down. But now I find that 6 disks (xfs) in 4 nodes have > data corruption. Some di

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Christopher Kunz
On 04.05.15 at 09:00, Yujian Peng wrote: > Hi, > I'm encountering a data disaster. I have a ceph cluster with 145 osd. The > data center had a power problem yesterday, and all of the ceph nodes were > down. > But now I find that 6 disks (xfs) in 4 nodes have data corruption. Some disks > are unabl

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Steffen W Sørensen
> On 04/05/2015, at 15.01, Yujian Peng wrote: > > Alexandre DERUMIER writes: >> >> >> maybe this could help to repair pgs? >> >> http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/ >> >> (6 disks at the same time seems pretty strange. Do you have some kind of >> writ

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Yujian Peng
Alexandre DERUMIER writes: > > > maybe this could help to repair pgs? > > http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/ > > (6 disks at the same time seems pretty strange. Do you have some kind of writeback cache enabled on these disks?) The only writeback cache i
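
The blog post linked above walks through repairing a single object by hand; for the simpler scrub-driven case, Ceph can be asked to repair an inconsistent PG directly (the PG id below is an example):

    # Find PGs flagged inconsistent by scrubbing
    ceph health detail | grep inconsistent
    # Ask the primary OSD to repair that PG from the surviving replicas
    ceph pg repair 2.3f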

Re: [ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Alexandre DERUMIER
quot;ceph-users" Envoyé: Lundi 4 Mai 2015 09:00:32 Objet: [ceph-users] xfs corruption, data disaster! Hi, I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center had a power problem yesterday, and all of the ceph nodes were down. But now I find that 6 disks(

[ceph-users] xfs corruption, data disaster!

2015-05-04 Thread Yujian Peng
Hi, I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center had a power problem yesterday, and all of the ceph nodes were down. But now I find that 6 disks (xfs) in 4 nodes have data corruption. Some disks are unable to mount, and some disks have IO errors in syslog.
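
A first triage pass that separates dying hardware from repairable filesystem damage might look like this (device names are examples); only drives that pass the hardware checks are worth an xfs_repair:

    # Kernel view: I/O errors point at hardware, XFS messages at the filesystem
    dmesg | grep -iE 'i/o error|xfs'
    # SMART health and reallocated/pending sector counts
    smartctl -H /dev/sdd
    smartctl -A /dev/sdd | grep -iE 'realloc|pending'
    # Read-only XFS check before attempting any actual repair
    xfs_repair -n /dev/sdd1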