Thanks for the detailed answer, Erekle.
I conclude that in any scenario it is worth having an arbiter node, in
order to avoid wasting extra disk space on RAID X plus Gluster replication
on top of it. The cost seems much lower if you compare the running
costs of the whole storage with the cost of building the arbiter node.
Even a fully redundant arbiter service with 2 nodes would be worth it
in a larger deployment.
Regards
Fernando
On 07/08/2017 17:07, Erekle Magradze wrote:
Hi Fernando (sorry for misspelling your name, I used a different
keyboard),
So let's go with the following scenarios:
1. Let's say you have two servers (replication factor 2), i.e. two
bricks per volume. In this case it is strongly recommended to have an
arbiter node, i.e. metadata storage that guarantees you avoid a
split-brain situation. The arbiter doesn't even need a disk with lots
of space; a tiny SSD is enough, but it should be hosted on a separate
server. The advantage of such a setup is that you don't need RAID 1 for
each brick: the metadata information is stored on the arbiter node and
brick replacement is easy.
2. If you have an odd number of bricks (let's say 3, i.e. replication
factor 3) in your volume, and you created neither an arbiter node nor a
quorum configuration, the entire load of keeping the volume consistent
falls on all 3 servers. Each of them is important, each brick contains
key information, and they need to cross-check each other (that's what
people usually do on their first try with Gluster :) ). In this case
replacing a brick is a big pain, so RAID 1 is a good option to have.
The disadvantage is losing space and not having the JBOD option; the
advantage is that you don't need an additional arbiter node.
3. You have an odd number of bricks and a configured arbiter node. In
this case you can easily go with JBOD; however, a good practice would be
to have RAID 1 for the arbiter disks (tiny 128 GB SSDs are perfectly
sufficient for volumes tens of TB in size).
That's basically it.
You can find the rest about reliability and setup scenarios in the
Gluster documentation; look especially for the quorum and arbiter node
configuration options.
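The split-brain argument in scenario 1 comes down to majority voting. The sketch below is a simplified model of majority quorum over one replica set, written for illustration only; it is not Gluster's implementation, whose real behavior is controlled by options such as `cluster.quorum-type` and has additional cases.

```python
# Simplified majority-quorum model for one replica set (illustration only,
# not Gluster's code). Each entry says whether that brick, data or arbiter,
# is reachable from the client.


def writes_allowed(reachable: list[bool]) -> bool:
    """Allow writes only if a strict majority of the bricks is reachable."""
    return sum(reachable) * 2 > len(reachable)


# Replica 2 during a network partition: each side sees one of two bricks.
# Neither side has a majority, so either writes stop or, without quorum,
# both sides keep writing and the copies diverge (split brain).
print(writes_allowed([True, False]))         # False

# Replica 2 + arbiter: the side that still reaches the arbiter keeps a majority,
# while the isolated side does not.
print(writes_allowed([True, False, True]))   # True
print(writes_allowed([False, True, False]))  # False
```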
Cheers
Erekle
P.S. What I mentioned regarding good practice is mostly related to
operating Gluster, not to installation or deployment, i.e. not to the
conceptual understanding of Gluster (conceptually it's a JBOD system).
On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:
Thanks for the clarification, Erekle.
However, I am surprised by this way of operating GlusterFS, as it adds
another layer of complexity to the system (either hardware or software
RAID) underneath the Gluster configuration and increases the system's
overall cost.
An important point to consider: in a RAID configuration you already
have space 'wasted' in order to build redundancy (whether RAID 1, 5
or 6). Then, when you run GlusterFS on top of several RAIDs, the data is
replicated again, so the same data consumes extra space within a group
of disks and again across several RAIDs, depending on your Gluster
configuration (with RAID 1 under Gluster replica 2, the same data is
stored 4 times).
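The overhead described above multiplies: the usable fraction of raw disk is the RAID efficiency divided by the Gluster replica count. A minimal sketch, ignoring filesystem overhead, hot spares and the small arbiter brick; the function names are hypothetical.

```python
# Fraction of raw disk capacity left usable when Gluster replication is
# layered on top of per-server RAID. Illustrative only: ignores filesystem
# overhead, hot spares and arbiter bricks.


def raid_efficiency(level: str, disks: int) -> float:
    """Usable fraction of a RAID group of `disks` equal-sized drives."""
    if level == "jbod":
        return 1.0
    if level == "raid1":
        return 0.5                   # two-way mirror
    if level == "raid5":
        return (disks - 1) / disks   # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) / disks   # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


def usable_fraction(level: str, disks: int, replica: int) -> float:
    """Usable fraction after Gluster keeps `replica` full copies of each file."""
    return raid_efficiency(level, disks) / replica


# RAID 1 under replica 2: every block exists 2 x 2 = 4 times.
print(usable_fraction("raid1", 2, replica=2))             # 0.25
# JBOD under replica 2 + arbiter: data is stored twice, the arbiter only
# holds metadata.
print(usable_fraction("jbod", 7, replica=2))              # 0.5
# RAID 6 over 12 drives under replica 2, as in the Red Hat quote.
print(round(usable_fraction("raid6", 12, replica=2), 3))  # 0.417
```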
Yet another downside of RAID (especially RAID 5 or 6) is that it
considerably reduces write speed, as each group of disks ends up with
roughly the write speed of a single disk, since all the other disks in
the group have to wait for each other to write as well.
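A common way to quantify this is the RAID write-penalty model for small random writes: each logical write costs about 2 back-end I/Os on RAID 1, 4 on RAID 5 (read data and parity, write both) and 6 on RAID 6. The sketch applies that model; the ~140 IOPS per 10K RPM disk is an assumed ballpark figure, and large sequential full-stripe writes are not penalized this way.

```python
# Classic RAID write-penalty model for small random writes: back-end disk
# I/Os needed per logical write. Full-stripe sequential writes largely
# avoid this penalty.
WRITE_PENALTY = {"jbod": 1, "raid1": 2, "raid5": 4, "raid6": 6}


def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Approximate random-write IOPS a group of `disks` drives can sustain."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# ASSUMPTION: ~140 random IOPS per 10K RPM HDD; 7 drives per server as in
# the original question.
for level in ("jbod", "raid5", "raid6"):
    print(level, round(random_write_iops(level, 7, 140)))
# jbod 980
# raid5 245
# raid6 163
```

Under these assumptions a 7-disk RAID 6 group sustains little more than one disk's worth of random writes.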
Therefore, if Gluster already replicates data, why does a failed disk
cause the big pain you mentioned, when the data is replicated somewhere
else and can still be retrieved both to serve clients and to
reconstruct the equivalent disk when it is replaced?
Fernando
On 07/08/2017 10:26, Erekle Magradze wrote:
Hi Frenando,
Here is my experience: if you use a particular hard drive as a brick
for a Gluster volume and it dies, i.e. it becomes inaccessible, it's a
huge hassle to discard that brick and exchange it for another one,
since Gluster somehow keeps trying to access the broken brick, and that
causes (at least it caused for me) a lot of pain. Therefore it's better
to use a RAID as the brick, i.e. RAID 1 (mirroring) for each brick.
Then, if a disk goes down, you can easily exchange it and rebuild the
RAID without going offline, i.e. without switching off the volume,
doing brick manipulations and switching it back on.
Cheers
Erekle
On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:
For any RAID 5 or 6 configuration I normally follow a simple golden
rule, which has given good results so far:
- up to 4 disks: RAID 5
- 5 or more disks: RAID 6
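The rule of thumb above can be expressed as a small function together with the capacity it leaves (one parity disk for RAID 5, two for RAID 6). A sketch with hypothetical function names; the 3-disk minimum is RAID 5's own requirement, not part of the rule.

```python
def pick_raid_level(disks: int) -> str:
    """Rule of thumb from this thread: RAID 5 up to 4 disks, RAID 6 from 5."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return "raid5" if disks <= 4 else "raid6"


def usable_disks(disks: int) -> int:
    """Disks' worth of capacity left after parity for the chosen level."""
    parity = 1 if pick_raid_level(disks) == "raid5" else 2
    return disks - parity


# 7 HDDs per server, as in the original question.
print(pick_raid_level(7), usable_disks(7))  # raid6 5
```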
However, I didn't really understand the recommendation to use any RAID
with GlusterFS. I always thought that GlusterFS likes to work in JBOD
mode and control the disks (bricks) directly, so you can create
whatever distribution rule you wish, and if a single disk fails you
just replace it and it obviously gets the data replicated from another
brick. The only downside of using it this way is that the replication
data will flow across all servers, but that is not much of an issue.
Can anyone elaborate on using RAID + GlusterFS vs. JBOD + GlusterFS?
Thanks
Regards
Fernando
On 07/08/2017 03:46, Devin Acosta wrote:
Moacir,
I have recently installed multiple Red Hat Virtualization hosts
for several different companies, and have dealt in depth with the Red
Hat Support Team about the optimal configuration for setting up
GlusterFS most efficiently, and I wanted to share with you what I
learned.
In general, the Red Hat Virtualization team frowns upon using each disk
of the system as just a JBOD. Sure, there is some protection from
having the data replicated; however, the recommendation is to use
RAID 6 (preferred) or RAID 5, or RAID 1 at the very least.
Here is the direct quote from Red Hat when I asked about RAID and
Bricks:
"A typical Gluster configuration would use RAID underneath the
bricks. RAID 6 is most typical as it gives you 2 disk failure
protection, but RAID 5 could be used too. Once you have the RAIDed
bricks, you'd then apply the desired replication on top of that.
The most popular way of doing this would be distributed replicated
with 2x replication. In general you'll get better performance with
larger bricks. 12 drives is often a sweet spot. Another option
would be to create a separate tier using all SSD’s."
From my understanding, in order to do SSD tiering you would need 1 x
NVMe drive in each server, or a 4 x SSD hot tier (the hot tier needs to
be distributed-replicated if you are not using NVMe). So, with you
having only 1 SSD drive in each server, I'd suggest looking into the
NVMe option.
Since you're using only 3 servers, what I'd probably suggest is
2 replicas + arbiter node. This setup doesn't require the 3rd server
to have big drives at all, as it only stores metadata about the files
and not a full copy.
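For reference, a 2 replicas + arbiter layout is created as one replica set with Gluster's `replica 3 arbiter 1` syntax, where the third brick of each set becomes the arbiter. The Python sketch below only assembles that command line; the volume name, host names and brick paths are hypothetical placeholders.

```python
import shlex


def arbiter_volume_cmd(volume: str, data_bricks: list[str], arbiter_brick: str) -> list[str]:
    """Build `gluster volume create` arguments for one 2-data + 1-arbiter set.

    The arbiter brick goes last: Gluster uses the third brick of each
    replica set as the arbiter.
    """
    if len(data_bricks) != 2:
        raise ValueError("expected exactly two data bricks")
    return ["gluster", "volume", "create", volume,
            "replica", "3", "arbiter", "1",
            *data_bricks, arbiter_brick]


# Hypothetical names: the big drives live on node1/node2, node3 holds the arbiter.
cmd = arbiter_volume_cmd(
    "vmstore",
    ["node1:/gluster/vmstore/brick", "node2:/gluster/vmstore/brick"],
    "node3:/gluster/arbiter/vmstore",
)
print(shlex.join(cmd))
# To run it on a Gluster node: import subprocess; subprocess.run(cmd, check=True)
```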
Please see the attached document that Red Hat gave me for more
information on this. I hope this information helps you.
--
Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect
On August 6, 2017 at 7:29:29 PM, Moacir Ferreira
(moacirferre...@hotmail.com) wrote:
I am planning to assemble an oVirt "pod" made of 3 servers, each
with 2 CPU sockets of 12 cores, 256 GB RAM, 7 x 10K HDDs and 1 SSD. The
idea is to use GlusterFS to provide HA for the VMs. The 3 servers
have a dual 40 Gb NIC and a dual 10 Gb NIC. So my intention is to
create a loop, like a server triangle, using the 40 Gb NICs for
virtualization file access (VM .qcow2 images) and for moving VMs around
the pod (east/west traffic), while using the 10 Gb interfaces for
providing services to the outside world (north/south traffic).
That said, how should I deploy GlusterFS in such an oVirt scenario?
My questions are:
1 - Should I create 3 RAIDs (e.g. RAID 5), one on each oVirt
node, and then create a GlusterFS volume using them?
2 - Instead, should I create a JBOD array made of all the servers' disks?
3 - What is the best Gluster configuration to provide HA
while not consuming too much disk space?
4 - Does an oVirt hypervisor pod like the one I am planning to build,
and the virtualization environment, benefit from tiering when using
an SSD disk? And if so, will Gluster do it by default or do I have to
configure it to do so?
Bottom line, what is good practice for using GlusterFS in small pods
for enterprises?
Your opinion/feedback will be really appreciated!
Moacir
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
--
Recogizer Group GmbH
Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555
e-mail: erekle.magra...@recogizer.de
Web: www.recogizer.com
Recogizer on LinkedIn: https://www.linkedin.com/company-beta/10039182/
Follow us on Twitter: https://twitter.com/recogizer
-----------------------------------------------------------------
Recogizer Group GmbH
Managing Directors: Oliver Habisch, Carsten Kreutze
Commercial register: Amtsgericht Bonn (Bonn Local Court), HRB 20724
Registered office: Bonn; VAT ID: DE294195993
This e-mail contains confidential and/or legally protected information.
If you are not the intended recipient or have received this e-mail in
error, please notify the sender immediately and delete this e-mail.
Unauthorized copying and unauthorized distribution of this e-mail and
the information it contains are not permitted.