Hi Strahil,

I've tried to measure the cost of erasure coding and, more importantly, of VDO 
with de-duplication and compression a bit.

Erasure coding should be negligible in terms of CPU power, while the vastly more 
complex LZ4 compression (used inside VDO) really is rather impressive at 
1 GByte/s single-threaded for compression (6 GByte/s for decompression, on a 
25 GByte/s memory bus) on the 15 Watt NUCs I am using for one cluster.
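
In case anyone wants to sanity-check that kind of number without setting up 
VDO: here is a rough sketch using the python-lz4 bindings (lz4.frame). It 
exercises the library through Python rather than the in-kernel LZ4 that VDO 
uses, and the test buffer is artificial, so treat the output as a ballpark 
indication only.

# rough LZ4 throughput check via the python-lz4 bindings (pip install lz4)
# note: this measures the userspace library, not the in-kernel LZ4 VDO uses
import os
import time
import lz4.frame

# build a 256 MiB test buffer: half repeating pattern, half random noise,
# so it is neither trivially compressible nor pure entropy
pattern = bytes(range(256)) * 2048        # 512 KiB repeating pattern
noise = os.urandom(512 * 1024)            # 512 KiB incompressible noise
data = (pattern + noise) * 256            # 256 MiB total

t0 = time.perf_counter()
packed = lz4.frame.compress(data)
t1 = time.perf_counter()
unpacked = lz4.frame.decompress(packed)
t2 = time.perf_counter()

assert unpacked == data
mib = len(data) / (1024 * 1024)
print(f"compress:   {mib / (t1 - t0):7.0f} MiB/s  (ratio {len(data) / len(packed):.2f})")
print(f"decompress: {mib / (t2 - t1):7.0f} MiB/s")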

The storage I/O overhead of erasure coding shouldn't really matter with NVMe 
becoming cheaper than SATA SSDs. Perhaps the write amplification needs to be 
watched with SSDs, but a lot of that is writeback tuning, and with Gluster in 
the back you can commit to RAM as long as you have a quorum (and a UPS).

With Gluster I guess most of the erasure coding would actually be done by the 
client, and the network amplification would be there as well, but the fan-out is 
not really different between erasure coding and replicas: if you write to nine 
nodes, you write to nine nodes from the client regardless of the encoding; only 
the amount of data sent to each node differs (fragments versus full copies).
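
To put rough numbers on that amplification (purely illustrative arithmetic, 
ignoring metadata, healing and protocol overhead): with replicas every brick 
gets a full copy, with dispersion every brick gets a fragment of 1/data the 
size, so the bytes leaving the client differ even though the fan-out doesn't.

# back-of-envelope: bytes leaving the client per 1 GiB of user data written
# (illustrative only; ignores metadata, healing and protocol overhead)
GIB = 1024 ** 3

def replica_wire_bytes(user_bytes, copies):
    # every replica brick receives a full copy of the data
    return user_bytes * copies

def disperse_wire_bytes(user_bytes, data, redundancy):
    # each of (data + redundancy) bricks receives a fragment of user_bytes / data
    return user_bytes / data * (data + redundancy)

print("replica 3    :", replica_wire_bytes(GIB, 3) / GIB, "GiB on the wire")
print("disperse 4+2 :", disperse_wire_bytes(GIB, 4, 2) / GIB, "GiB on the wire")
print("disperse 6+3 :", disperse_wire_bytes(GIB, 6, 3) / GIB, "GiB on the wire")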

Here, the ability to say "please keep using the 4:2 dispersion as I expand from 
6 to 9 nodes, and roll that across on a shard-by-shard basis without me having 
to set up the bricks like that myself" would certainly help.

With all of VDO enabled I get 200 MByte/s for a random-data FIO workload via 
Gluster, which becomes 600 MByte/s for reads with 3 replicas on the 10 Gbit 
network I use, roughly 60% of the theoretical maximum with random I/O.
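
For reference, the raw line rates involved (simple arithmetic, before any 
TCP/IP or Gluster protocol overhead, so the effective ceiling is somewhat 
lower than these figures):

# raw line rate vs. 600 MByte/s of read traffic (decimal units,
# before TCP/IP and Gluster protocol overhead)
def line_rate_mbyte(gbit):
    return gbit * 1000 / 8            # Gbit/s -> MByte/s

for gbit in (10, 40, 100):
    raw = line_rate_mbyte(gbit)
    print(f"{gbit:3d} Gbit/s = {raw:6.0f} MByte/s raw; "
          f"600 MByte/s is {600 / raw:4.0%} of that")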

That's completely adequate, because we're not running HPC or SAP batches here, 
and I'd be rather sure that using erasure coding with 6 and 9 nodes won't 
introduce a performance bottleneck unless I go to 40 or 100 Gbit on the network.

I'd just really want to be able to choose between, say, 1, 2 or 3 out of 9 
bricks being used for redundancy, depending on whether it's an HCI block next 
door, going into a ship with months at sea, or into a space station.
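
Just to illustrate what that choice costs in raw capacity (simple arithmetic 
on 9 equal bricks, not a statement about what oVirt lets you deploy today):

# capacity cost of different redundancy levels across 9 equal bricks
# (pure arithmetic; not a statement about what oVirt HCI supports today)
BRICKS = 9

print("layout         usable   survives")
for redundancy in (1, 2, 3):
    data = BRICKS - redundancy
    usable = data / BRICKS
    print(f"disperse {data}+{redundancy}   {usable:6.1%}   {redundancy} brick(s) lost")

# classic replica 3 across the same 9 bricks, for comparison
print(f"replica 3      {1 / 3:6.1%}   2 bricks per replica set lost")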

I'd also probably add an extra node or two to act as warm (or even cold) standby 
in critical or hard-to-reach locations: they would act as compute-only nodes 
initially (to avoid quorum splits), but could be promoted, without hands-on 
intervention, to replace a storage node that failed.

oVirt HCI is as close as it gets to LEGO computers, but right now it's doing 
LEGO with your hands tied behind your back.

Kind regards, Thomas