Re: [ceph-users] Failure probability with largish deployments

2013-12-26 Thread Kyle Bader
Yes, that also makes perfect sense, so the aforementioned 12500 objects for a 50GB image. At a 60 TB cluster/pool with 72 disks/OSDs and 3-way replication that makes 2400 PGs, following the recommended formula. What number of disks (OSDs) did you punch in for the following run? Disk
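
A minimal sketch of the arithmetic behind those numbers, assuming 4 MB RADOS objects, decimal gigabytes, and the usual "100 PGs per OSD" rule of thumb (in practice the result is often rounded to a power of two):

# Illustrative only -- reproduces the figures quoted above.
# Assumes 4 MB objects and decimal GB.

def rbd_object_count(image_gb, object_mb=4):
    """Number of RADOS objects a fully written RBD image stripes into."""
    return image_gb * 1000 // object_mb            # 50 GB -> 12500 objects

def recommended_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule of thumb: (OSDs * 100) / replica count."""
    return num_osds * pgs_per_osd // replicas      # 72 * 100 / 3 -> 2400

print(rbd_object_count(50))         # 12500
print(recommended_pg_count(72, 3))  # 2400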

Re: [ceph-users] Failure probability with largish deployments

2013-12-23 Thread Loic Dachary
Hi Kyle, It would be great if you could share how you invoked the tool. I'm tempted to play with it and an example would help a great deal :-) Cheers On 20/12/2013 22:37, Kyle Bader wrote: Using your data as inputs to the Ceph reliability calculator [1] results in the following: Disk
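
Not the actual invocation being asked for here, but a heavily simplified Python sketch of the kind of estimate such a reliability model produces, assuming independent disk failures, a constant annualized failure rate and a fixed recovery window; the AFR and recovery time below are made-up inputs, and the real tool models far more than this.

# Back-of-the-envelope replica-loss estimate -- NOT the ceph-tools model,
# just an illustration of the idea.

def p_fail_within(afr, hours):
    """Probability that one disk fails within `hours`, given its annualized failure rate."""
    return 1.0 - (1.0 - afr) ** (hours / (24 * 365))

def p_remaining_replicas_lost(afr, recovery_hours, extra_copies):
    """Probability that the specific disks holding the remaining replicas also
    fail before recovery of the first failure completes."""
    return p_fail_within(afr, recovery_hours) ** extra_copies

afr = 0.04           # assumed 4% annualized failure rate per disk
recovery_hours = 24  # assumed time to re-replicate a failed OSD
# With 3-way replication, losing a PG needs two more specific disks to fail
# while the first one is still being recovered.
print(p_remaining_replicas_lost(afr, recovery_hours, extra_copies=2))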

Re: [ceph-users] Failure probability with largish deployments

2013-12-23 Thread Kyle Bader
Is an object a CephFS file or an RBD image or is it the 4MB blob on the actual OSD FS? Objects are at the RADOS level; CephFS filesystems, RBD images and RGW objects are all composed by striping RADOS objects - the default is 4MB. In my case, I'm only looking at RBD images for KVM volume storage,
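
A small sketch of that striping, assuming the default 4 MiB object size and no custom stripe settings; the object-name prefix below is a hypothetical placeholder, the real name is derived from the image's internal id.

# Map a byte offset in an RBD image to the RADOS object backing it, assuming
# the default 4 MiB object size and no custom striping.

OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size

def locate(image_offset, prefix="rbd_data.<imageid>"):
    obj_index = image_offset // OBJECT_SIZE       # which RADOS object
    in_object = image_offset % OBJECT_SIZE        # offset inside that object
    return "%s.%016x" % (prefix, obj_index), in_object

# Byte 10 GiB into the image lands in object 2560, offset 0.
print(locate(10 * 1024 ** 3))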

Re: [ceph-users] Failure probability with largish deployments

2013-12-23 Thread Christian Balzer
Hello, On Sun, 22 Dec 2013 07:44:31 -0800 Kyle Bader wrote: Is an object a CephFS file or an RBD image or is it the 4MB blob on the actual OSD FS? Objects are at the RADOS level; CephFS filesystems, RBD images and RGW objects are all composed by striping RADOS objects - the default is 4MB.

Re: [ceph-users] Failure probability with largish deployments

2013-12-22 Thread Christian Balzer
Hello Kyle, On Fri, 20 Dec 2013 13:37:18 -0800 Kyle Bader wrote: Using your data as inputs to the Ceph reliability calculator [1] results in the following: I shall have to (literally, as in GIT) check that out next week... However, before that, some questions to help me understand what

[ceph-users] Failure probability with largish deployments

2013-12-19 Thread Christian Balzer
Hello, In my Sanity check thread I postulated yesterday that to get the same redundancy and resilience for disk failures (excluding other factors) as my proposed setup (2 nodes, 2x 11 3TB HDs RAID6 per node, 2 global hotspares, thus 4 OSDs) the Ceph way one would need something like 6 nodes
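
For reference, a rough sketch of the capacity side of that comparison only, not a reconstruction of the full redundancy argument; assumptions are 3 TB drives, RAID6 giving up two disks per set to parity, and Ceph replication simply dividing raw capacity by the replica count.

# Illustrative capacity arithmetic only.

DRIVE_TB = 3

def raid6_set_usable_tb(disks=11):
    return (disks - 2) * DRIVE_TB               # an 11-disk RAID6 set -> 27 TB

def ceph_usable_tb(osd_disks, replicas=3):
    return osd_disks * DRIVE_TB / replicas      # 3-way replication keeps 1/3 of raw

print(raid6_set_usable_tb())   # 27 TB usable per RAID6 set
print(ceph_usable_tb(27))      # 27 single-disk OSDs of 3 TB -> 27 TB usable at 3x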

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Robert van Leeuwen
Yeah, I saw erasure coding mentioned a little while ago, but that's not likely to be around by the time I'm going to deploy things. Never mind that super bleeding edge isn't my style when it comes to production systems. ^o^ And at something like 600 disks, that would still have to be a

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Mariusz Gronczewski
On 2013-12-19 at 17:39:54, Christian Balzer ch...@gol.com wrote: Hello, In my Sanity check thread I postulated yesterday that to get the same redundancy and resilience for disk failures (excluding other factors) as my proposed setup (2 nodes, 2x 11 3TB HDs RAID6 per node, 2

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Wido den Hollander
On 12/19/2013 09:39 AM, Christian Balzer wrote: Hello, In my Sanity check thread I postulated yesterday that to get the same redundancy and resilience for disk failures (excluding other factors) as my proposed setup (2 nodes, 2x 11 3TB HDs RAID6 per node, 2 global hotspares, thus 4 OSDs) the

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Gregory Farnum
On Thu, Dec 19, 2013 at 12:39 AM, Christian Balzer ch...@gol.com wrote: Hello, In my Sanity check thread I postulated yesterday that to get the same redundancy and resilience for disk failures (excluding other factors) as my proposed setup (2 nodes, 2x 11 3TB HDs RAID6 per node, 2 global

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Wolfgang Hennerbichler
On 19 Dec 2013, at 16:43, Gruher, Joseph R joseph.r.gru...@intel.com wrote: It seems like this calculation ignores that in a large Ceph cluster with triple replication having three drive failures doesn't automatically guarantee data loss (unlike a RAID6 array)? Not true with RBD images,
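
A hedged sketch of why that caveat matters: a large RBD image is striped over many PGs, so the question becomes whether any PG maps exactly onto the three failed disks. Assuming uniform, independent PG placement (which CRUSH does not strictly give you) and the 72-OSD / 2400-PG figures used earlier in the thread:

# Rough chance that three simultaneously failed OSDs hold all replicas of
# some PG. Illustration only -- CRUSH placement and failure domains change
# this considerably.

from math import comb

def p_some_pg_lost(num_osds, num_pgs, replicas=3):
    p_one_pg = 1.0 / comb(num_osds, replicas)     # this PG sits exactly on the failed trio
    return 1.0 - (1.0 - p_one_pg) ** num_pgs      # at least one PG does

print(p_some_pg_lost(72, 2400))   # roughly 0.04, i.e. ~4% per triple failure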

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Wido den Hollander
On 12/19/2013 08:39 PM, Wolfgang Hennerbichler wrote: On 19 Dec 2013, at 16:43, Gruher, Joseph R joseph.r.gru...@intel.com wrote: It seems like this calculation ignores that in a large Ceph cluster with triple replication having three drive failures doesn't automatically guarantee data loss

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Johannes Formann
On 19.12.2013 at 20:39, Wolfgang Hennerbichler wo...@wogri.com wrote: On 19 Dec 2013, at 16:43, Gruher, Joseph R joseph.r.gru...@intel.com wrote: It seems like this calculation ignores that in a large Ceph cluster with triple replication having three drive failures doesn't automatically

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Christian Balzer
Hello, On Thu, 19 Dec 2013 12:12:13 +0100 Mariusz Gronczewski wrote: On 2013-12-19 at 17:39:54, Christian Balzer ch...@gol.com wrote: [snip] So am I completely off my wagon here? How do people deal with this when potentially deploying hundreds of disks in a single

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Christian Balzer
Hello, On Thu, 19 Dec 2013 12:42:15 +0100 Wido den Hollander wrote: On 12/19/2013 09:39 AM, Christian Balzer wrote: [snip] I'd suggest to use different vendors for the disks, so that means you'll probably be mixing Seagate and Western Digital in such a setup. That's funny, because I

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Christian Balzer
Hello, On Thu, 19 Dec 2013 15:43:16 + Gruher, Joseph R wrote: [snip] It seems like this calculation ignores that in a large Ceph cluster with triple replication having three drive failures doesn't automatically guarantee data loss (unlike a RAID6 array)? If your data is triple

Re: [ceph-users] Failure probability with largish deployments

2013-12-19 Thread Christian Balzer
On Thu, 19 Dec 2013 21:01:47 +0100 Wido den Hollander wrote: On 12/19/2013 08:39 PM, Wolfgang Hennerbichler wrote: On 19 Dec 2013, at 16:43, Gruher, Joseph R joseph.r.gru...@intel.com wrote: It seems like this calculation ignores that in a large Ceph cluster with triple replication