[ceph-users] Re: Building a petabyte cluster from scratch

2020-05-29 Thread Jack
On 12/4/19 9:19 AM, Konstantin Shalygin wrote: > CephFS does indeed support snapshots. Since Samba 4.11, this feature is also supported via vfs_ceph_snapshots. You can snapshot, but you cannot export a diff of snapshots.

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
> After years of using Ceph, we plan to soon build a new cluster, bigger than what we've done in the past. As the project is still under consideration, I'd like to have your thoughts on our planned design: any feedback is welcome :) > ## Requirements > * ~1 PB usable space for file storage,

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Nathan Fish
Rather than a cache tier, I would put an NVMe device in each OSD box for BlueStore's DB and WAL. This will significantly improve small I/Os. 14*16 HDDs / 11 chunks = ~20 HDDs' worth of write IOPS. If you expect these files to be written sequentially, this is probably OK. Mons and mgr on OSD nodes sh
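As a quick sanity check of that figure, a minimal back-of-the-envelope sketch in Python, assuming the 14 nodes, 16 HDDs per node and 8+3 EC profile proposed in the thread:

    nodes = 14          # OSD nodes in the proposed design
    hdds_per_node = 16  # nearline SAS HDDs per node
    k, m = 8, 3         # EC profile: every full-stripe write touches k + m disks

    total_hdds = nodes * hdds_per_node     # 224 spindles
    effective_hdds = total_hdds / (k + m)  # ~20.4 disks' worth of write IOPS
    print(f"{total_hdds} HDDs / {k + m} chunks = ~{effective_hdds:.1f} HDDs of write IOPS")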

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Jack
Hi, You will get slow performance: EC is slow, and HDDs are slow too. With 400 IOPS per device, you get 89,600 IOPS for the whole cluster, raw. With 8+3 EC, each logical write is mapped to 11 physical writes, so you get only ~8,145 write IOPS (is my math correct?), which I find very low for a PB of storage. So, u
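Jack's arithmetic does check out; a minimal sketch of the same calculation, assuming the thread's 400 write IOPS per HDD and 224 spindles:

    iops_per_hdd = 400      # assumed write IOPS per nearline HDD (thread figure)
    devices = 14 * 16       # 224 HDDs in the proposed cluster
    k, m = 8, 3             # each logical write becomes k + m = 11 physical writes

    raw_iops = iops_per_hdd * devices    # 89,600 raw write IOPS
    ec_write_iops = raw_iops / (k + m)   # ~8,145 logical write IOPS
    print(f"raw: {raw_iops} IOPS, usable with 8+3 EC: {ec_write_iops:.0f} IOPS")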

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Anthony D'Atri
> ## Requirements > * ~1 PB usable space for file storage, extensible in the future > * The files are mostly "hot" data, no cold storage > * Purpose: storage for big files, mostly used on Windows workstations (10G access) > * Performance is better :) > ## Global design >

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
>> * Hardware RAID with battery-backed write cache - will allow OSDs to ack writes before hitting spinning rust. > Disagree. See my litany from a few months ago. Use a plain, IT-mode HBA. Take the $$ you save and put it toward building your cluster out of SSDs instead of HDDs. That way

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Nathan Fish
If k=8, m=3 is too slow on HDDs, so that you need replica 3 with SSD DB/WAL, versus EC 8+3 on SSD, then that's a (1/3) / (8/11) = ~0.46 multiplier on the SSD space required vs HDDs. That brings it from 6x to about 2.75x. Then you have the benefit of not needing separate SSDs for DB/WAL, both in hardware cost and complex
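A minimal sketch of that multiplier, using the usable-to-raw efficiency of replica 3 (1/3) versus EC 8+3 (8/11), and the 6:1 $/TB ratio quoted elsewhere in the thread:

    replica_efficiency = 1 / 3    # replica 3: 1 TB usable per 3 TB raw
    ec_efficiency = 8 / (8 + 3)   # EC 8+3: 8 TB usable per 11 TB raw

    ssd_vs_hdd_raw = replica_efficiency / ec_efficiency  # ~0.46: the all-SSD EC pool
                                                         # needs under half the raw TB
    hdd_cost_advantage = 6                               # $/TB, HDD 6x cheaper (thread figure)
    effective_multiplier = hdd_cost_advantage * ssd_vs_hdd_raw  # ~2.75x
    print(f"raw-space multiplier: {ssd_vs_hdd_raw:.2f}, cost multiplier: {effective_multiplier:.2f}x")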

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
> If k=8, m=3 is too slow on HDDs, so that you need replica 3 with SSD DB/WAL, versus EC 8+3 on SSD, then that's a (1/3) / (8/11) = ~0.46 multiplier on the SSD space required vs HDDs. That brings it from 6x to about 2.75x. Then you have the benefit of not needing separate SSDs for DB/WAL, both in hardware cost a

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Paul Emmerich
It's pretty pointless to discuss erasure coding vs. replicated without knowing how it'll be used. There are setups where erasure coding is faster than replicated. You do write less data overall, so if that's your bottleneck then erasure coding will be faster. Paul
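To make the "write less data overall" point concrete, a minimal sketch comparing the bytes hitting disk per logical write for replica 3 versus 8+3 erasure coding (client data only, ignoring metadata and journaling overhead):

    def bytes_on_disk(logical_bytes: float, scheme: str) -> float:
        """Bytes physically written for `logical_bytes` of client data."""
        if scheme == "replica3":
            return logical_bytes * 3             # three full copies
        if scheme == "ec8+3":
            return logical_bytes * (8 + 3) / 8   # 11 chunks written per 8 data chunks
        raise ValueError(scheme)

    one_gib = 1024 ** 3
    print(bytes_on_disk(one_gib, "replica3") / one_gib)  # 3.0x amplification
    print(bytes_on_disk(one_gib, "ec8+3") / one_gib)     # 1.375x amplification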

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Jack
> Cost of SSD vs. HDD is still in the 6:1 favor of HDDs. It is not that simple: you can buy the same capacity for less money with HDDs, that is true, $/TB is better for spinning rust than for flash, but this is not the most important indicator, and by far: $/IOPS is another story indeed. On 12/3/19 9:46 PM, jes...@kr
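For illustration only, a minimal sketch of the $/IOPS argument. Only the 6:1 $/TB ratio and the ~400 write IOPS per HDD come from the thread; the drive sizes, the SSD IOPS figure, and the normalized prices below are placeholder assumptions:

    hdd_price_per_tb = 1.0        # normalized unit price (placeholder)
    ssd_price_per_tb = 6.0        # 6:1 in HDD's favor (thread figure)

    hdd_iops_per_tb = 400 / 10    # ~400 write IOPS spread over a 10 TB HDD
    ssd_iops_per_tb = 20_000 / 4  # assumed ~20k sustained write IOPS on a 4 TB SSD

    print("HDD $/IOPS:", hdd_price_per_tb / hdd_iops_per_tb)  # 0.025
    print("SSD $/IOPS:", ssd_price_per_tb / ssd_iops_per_tb)  # 0.0012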

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Alex Gorbachev
FYI for ZFS on RBD: https://github.com/zfsonlinux/zfs/issues/3324 We go for a more modest setting, with async set to 64, not 2. -- Alex Gorbachev Intelligent Systems Services Inc. On Tue, Dec 3, 2019 at 3:07 PM Fabien Sirjean wrote: > Hi Ceph users! > After years of using Ceph, we plan to

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Martin Verges
Hello, > * 2 x Xeon Silver 4212 (12C/24T) I would choose single-CPU AMD EPYC systems for a lower price with better performance. Supermicro does have some good systems for AMD as well. > * 16 x 10 TB nearline SAS HDD (8 bays for future needs) Don't waste money here either. No real gain. Inv

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-04 Thread Konstantin Shalygin
On 12/4/19 3:06 AM, Fabien Sirjean wrote: > * ZFS on RBD, exposed via Samba shares (cluster with failover) Why not use Samba vfs_ceph instead? It's scalable direct access. > * What about CephFS? We'd like to use RBD diff for backups but it looks impossible to use snapshot diff with Cephf

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-04 Thread Jack
You can snapshot, but you cannot export a diff of snapshots. On 12/4/19 9:19 AM, Konstantin Shalygin wrote: > On 12/4/19 3:06 AM, Fabien Sirjean wrote: >> * ZFS on RBD, exposed via Samba shares (cluster with failover) > Why not use Samba vfs_ceph instead? It's scalable direct access. >> *

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-04 Thread Darren Soothill
Hi Fabien, ZFS on top of RBD really makes me shudder. ZFS expects to have individual disk devices that it can manage. It thinks it has them with this config, but Ceph is masking what is really behind it. As has been said before, why not just use Samba directly from CephFS and remove that layer of

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-04 Thread Phil Regnauld
Darren Soothill (darren.soothill) writes: > Hi Fabien, > ZFS on top of RBD really makes me shudder. ZFS expects to have individual disk devices that it can manage. It thinks it has them with this config, but Ceph is masking what is really behind it. > As has been said before, why not just u