On 28/03/14 19:31, Fabio M. Di Nitto wrote:
Are there any known issues, guidelines, or recommendations for having
a single RHCS cluster with different OS releases on the nodes?
Only one answer: don't do it. It's not supported and it's only asking
for trouble.
Seconded. There are _substanti
On 18/03/14 13:38, Mr.Pine wrote:
I have accidentally reformatted a GFS cluster.
We need to unformat it.. is there any way to recover disk ?
Backups?
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
On 10/03/14 18:15, stephen.ran...@stfc.ac.uk wrote:
Hello,
When using gfs2 with quotas on a SAN that is providing storage to two
clustered systems running CentOS6.5,
As a matter of interest: how are you exporting the storage, or is this
integral to the cluster itself?
On 10/03/14 18:15, stephen.ran...@stfc.ac.uk wrote:
Hello,
When using gfs2 with quotas on a SAN that is providing storage to two
clustered systems running CentOS6.5, one of the systems
can crash. This crash appears to be caused when a user tries
to add something to a SAN disk when they have exc
As anyone who's tried to use kernel NFS in a clustered environment
knows, it's fraught with issues which risk severe data corruption.
has anyone tried using the Userspace nfs-ganesha server?
I'd be interested to hear how you got on.
Qlogic have announced some new adaptors which look promising.
The $64 million question:
Will GFS play nice with these?
http://www.theregister.co.uk/2013/03/21/fabriccache/
On 05/09/12 15:59, Randy Zagar wrote:
What I don't understand is what changed between RHEL-5 and RHEL-6 that
has made HA NFS failover so difficult?
HA NFS failover has always been difficult for a number of reasons mostly
related to how abysmal the Linux NFS implementation is.
I have been r
On 08/08/12 13:50, Bob Peterson wrote:
We currently don't have any plans for defrag tool for GFS2. In theory, you
can always copy the data from an old file system to a new one using this
new kernel code, and it should be less fragmented.
Defragfs works as well as any other userland method:
h
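The copy-based approach Bob describes can be sketched in a few lines of shell. This is a generic userland sketch, not a GFS2-specific tool: the filename is illustrative (and created here so the sketch is runnable), and on a cluster filesystem you'd want the file idle while it's rewritten.

```shell
# Userland "defrag by copy" sketch: rewriting the file lets the
# allocator lay it out fresh; the mv then replaces the original.
# bigfile.dat is a stand-in created here so the sketch is runnable.
f=./bigfile.dat
dd if=/dev/zero of="$f" bs=1k count=4 2>/dev/null
cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"   # -p preserves mode/times
ls -l "$f"
```

Extent counts before and after can be compared with filefrag from e2fsprogs.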
On 12/04/12 14:04, AK wrote:
> Ah, the evils of mass invite
And the evils of Linkedin in particular.
The only way to stop getting invites is to setup a Linkedin account
yourself and from that point you _cannot_ opt out of receiving mail from
them from time to time.
I regard them as spammers
On 03/04/12 14:28, Steven Whitehouse wrote:
Spinning disks are slow to seek, large arrays even more so.
Large arrays should be much faster, provided the data is in cache.
Or not, when there's a lot of random IO involved and it's not in cache.
I'm talking about arrays such as nexsan atabeast
Real Dumb Question[tm] time
Has anyone tried putting bcache/flashcache in front of shared storage in
a GFS2 cluster (on each node, of course)
Did it work?
Should it work?
Is it safe?
Are there ways of making it safe?
Am I mad for thinking about it?
Rationale:
Spinning disks are slow
On 08/03/12 22:59, Jeff Sturm wrote:
The downside of partitions is they aren't easy to change. You can add them
safely while the storage array is in use, but each host needs to reload the
partition table when you're done with changes before the new storage can be
used, and that may not happe
On 09/02/12 15:14, Ray Van Dolson wrote:
I'm exploring some options for speeding that up -- the main one being
dropping my cluster to only one node. Is this doable for a file system
that was created with the dlm lock manager instead of lock_nolock?
Yes. You can force the use of lock_nolock in
On 26/01/12 16:05, Digimer wrote:
Is anyone actually using DRBD for serious cluster implementations (ie,
production systems) or is it just being used for hobby/test rigs?
I use it rather extensively in production. I use it to back clustered
LVM-backed virtual machines and GFS2 partitions. I st
On 26/01/12 14:41, Digimer wrote:
As for qdisk, you can't use it on DRBD, only on a SAN (as it is possible
to have a split-brain condition where both nodes go StandAlone and
Primary, thus allowing both nodes to think they have the qdisk vote).
Is anyone actually using DRBD for serious cluster
On 09/01/12 13:34, Fabio M. Di Nitto wrote:
Something i forgot to mention in the other email, is that for example,
you can just move the LUNs from your SAN from one cluster to another
assuming you are running GFS2 and that will work.
And assuming that you have 2 clusters. This might be a possi
On 09/01/12 13:33, Rajagopal Swaminathan wrote:
Switches used for this purpose are best completely isolated from the rest of
the network and multicast traffic control should be DISABLED.
I distinctly remember asking the network guys for Multicast mode to be on
for the Heartbeat network (for the c
On 09/01/12 13:23, SATHYA - IT wrote:
Alan,
Corosync (heartbeat) network is not connected to switch. The network is
connected between server to server directly.
See my comment about direct hookups. My experience is that they are
prone to playing up for no apparent reason (NICs simply aren't d
On 09/01/12 09:36, Fabio M. Di Nitto wrote:
RH's advice is to "Big Bang" it.
It's not really advice, as RH does not officially support this
upgrade method.
Indeed, but scheduling downtime in a 24*7*365.25 operation like space
science ftp servers is tricky. (1: You can't please e
On 09/01/12 05:24, Digimer wrote:
With both of the bond's NICs down, the bond itself is going to drop.
Odds are, both NICs are plugged into the same switch.
(assuming the OP isn't running things plugged nic-nic - which I have
found in the past tends to be flakey when N-way negotiation becom
On 09/01/12 04:51, Digimer wrote:
> Alternatively, use some spare machines to mock-up the current cluster
> and then test-upgrade. It might work flawlessly, I genuinely don't know.
Test setups aren't always a good metric. Everything worked fine on our
last changeover until we put real-world load
On 09/01/12 02:38, Digimer wrote:
> Technically yes, practically no. Or rather, not without a lot of
> testing first.
This is "rather a shame"
I have a similar requirement (EL5 -> EL6 with GFS)
> There may be some other things you need to do as well. Please be sure
> to do proper testing an
Steven Whitehouse wrote:
Well, can't we (the Redhat/Centos fanboys) expect a critical clustered
filesystem like GFS2 (which supports over 16TB on 64-bit systems
at least) to take a leaf or two from ZFS on this issue?
I'm not quite sure which feature you are suggesting that we take, but
I
On Wed, 16 Nov 2011, Steven Whitehouse wrote:
> The problem is the blocks following that, such as the master directory
> which contains all the system files. If enough of that has been
> destroyed, it would make it very tricky to reconstruct. Even so it might
> be possible depending on exactly whi
Bob Peterson wrote:
I've taken a close look at the image file you created.
This appears to be a normal, everyday GFS2 file system
except there is a section of 16 blocks (or 0x10 in hex)
that are completely destroyed near the beginning of the
file system, right after the root directory. Unfortuna
Steven Whitehouse wrote:
We see appreciable knee points in GFS directory performance at 512, 4096
and 16384 files/directory, with progressively worse performance
deterioration between each knee pair. (It's a 2^n type problem)
That is a bit strange. The GFS2 directory entries are sized accord
Nicolas Ross wrote:
Get me right, there are millions of files, but no more than a few
hundreds per directory. They are spread out, split on the database id,
two characters at a time. So a file name 1234567.jpg would end up in a
directory 12/34/5/, or something similar.
OK, the way you wrote it
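The layout Nicolas describes maps an id onto a shallow directory tree. A minimal sketch of that mapping, assuming a 2+2+1 character split (a guess at his exact scheme):

```shell
# Split a numeric id into 12/34/5/ style path components so no single
# directory accumulates millions of entries. The 2+2+1 split is assumed.
id=1234567
d1=$(echo "$id" | cut -c1-2)
d2=$(echo "$id" | cut -c3-4)
d3=$(echo "$id" | cut -c5)
echo "$d1/$d2/$d3/$id.jpg"   # -> 12/34/5/1234567.jpg
```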
Nicolas Ross wrote:
On some services, there are document directories that are huge, not that
much in size (about 35 gigs), but in number of files, around one
million. One service even has 3 data directories with that many files each.
You are utterly mad.
Apart from the human readability aspe
Laszlo Beres wrote:
Hi,
just a theoretical question: let's assume we have a cluster with GFS2
filesystem (not as a managed resource). What happens exactly if all
paths to backend device get lost?
GFS2 withdraws that filesystem and you'll have to reboot all the
withdrawn machines to get it ba
Colin Simpson wrote:
Probably not a cluster issue, just a pure kernel question. Sounds like
the driver or device is locked up or confused, so the processes
attached to it will be hung.
A common problem in a fabric environment is that there are 2+ paths to
the tapes (ie,
Colin Simpson wrote:
,when the service is stopped I get a "Stale NFS file handle" from
mounted filesystems accessing the NFS mount point at those times. i.e.
if I have a copy going I get on the service being disabled:
That's normal if a NFS server mount is unexported or nfsd shuts down.
It _s
On 12/08/2011 17:24, Paras pradhan wrote:
Does it mean that I don't need mpath0p1? If that's the case I don't need
to run kpartx on mpath0?
You still need kpartx, but that's a bit clunky anyway. Let dm-multipath
take care of all that for you.
(The last time I used kpartx and friends was 2003.
On 12/08/2011 16:14, Paras pradhan wrote:
If the entire LUN is a PV then you don't need to partition it.
You mean don't use parted or anything and directly proceed to pvcreate?
Correct.
Paras pradhan wrote:
Hi,
I have a 2199GB LUN assigned to my 3 node cluster. Since it's >2TB, I used
parted to create the EFI GPT partition. After that pvcreate and vgcreate
were successful but I get the following error when doing lvcreate.
If the entire LUN is a PV then you don't need to part
> Maybe I should try that again but the
only way I know to get a kdump is to set a large fence delay.
This is what I'd expect. We also found the fence delay has to be long
enough to allow the crashdump to be written out.
The only alternatives to speed this up are to use _very_ fast disk for
On 08/07/11 22:09, J. Bruce Fields wrote:
With default mount options, the linux NFS client (like most NFS clients)
assumes that a file has at most one writer at a time. (Applications that
need to do write-sharing over NFS need to use file locking.)
The problem is that file locking on V3 isn't
Colin Simpson wrote:
But I guess you are also telling me that file locking between the two
wouldn't be helping here either?
Correct.
NFSd (v2/3) doesn't pass client locks to the filesystem, nor does it
respect locks set by other processes.
It has a number of other foibles - try setting up
On Fri, 8 Jul 2011, Colin Simpson wrote:
> That's not ideal either when Samba isn't too happy working over NFS, and
> that is not recommended by the Samba people as being a sensible config.
I know but there's a real (and demonstrable) risk of data corruption for
NFS vs _anything_ if NFS clients a
On Fri, 8 Jul 2011, Steven Whitehouse wrote:
> Currently we don't recommend using NFS on a GFS2 filesystem which is
> also being used locally.
After much dealing with NFS internals, I would recommend NOT using it on
any filesystem where the files are accessed locally.
NFSv2/3 doesn't play nice w
On 09/06/11 15:46, Budai Laszlo wrote:
Hi,
What should be done in order to mount a gfs file system at boot?
I've created the following line in /etc/fstab:
/dev/clvg/gfsvol  /mnt/testgfs  gfs  defaults  0 0
but it is not mounting the fs at boot. If I run "mount -a" then
Alan Brown wrote:
This is interesting too. Note the variation in extents (the file is a
piece of marketing fluff, name is unimportant)
I'm getting the same thing in sarch01 and that's mounted read-only by
the clients - there's zero write activity going on.
This is interesting too. Note the variation in extents (the file is a
piece of marketing fluff, name is unimportant)
$ df -h .
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupBeast03-LogVolUser1  250G  113G  138G   45% /stage/user1
$ ls -l SUMO-SA
Steven Whitehouse wrote:
The thing to check is what size the extents are...
filefrag doesn't show this.
the on-disk layout is
designed so that you should have a metadata block separating each data
extent at exactly the place where we would need to read a new metadata
block in order to contin
GFS2 seems horribly prone to fragmentation.
I have a filesystem which has been written to once (data archive,
migrated from a GFS1 filesystem to a clean GFS2 fs) and a lot of the
files are composed of hundreds of extents - most of these are only 1-2Mb
so this is a bit over the top and it badl
Digimer wrote:
With a two-node, quorum is effectively useless, as a single node is
allowed to continue.
That's what qdiskd is for. It's also useful in larger clusters.
Also, without proper fencing, things will not fail
properly. This means that you are in somewhat of an undefined area.
Un
Steven Whitehouse wrote:
Hi,
On Wed, 2011-05-18 at 16:14 +0100, Alan Brown wrote:
Bob, Steve, Dave,
Is there any progress on tuning the size of the tables (RHEL5) to allow
larger values and see if they help things as far as caching goes?
There is a bz open,
I thought so, but I can
Bob, Steve, Dave,
Is there any progress on tuning the size of the tables (RHEL5) to allow
larger values and see if they help things as far as caching goes?
It would be advantageous to tweak the dentry limits too - the kernel
limits this to 10% and attempts to increase are throttled back.
Th
On 13/05/11 23:21, Bob Peterson wrote:
| Steve/Bob, how about opening this one up for public view?
Sounds okay to me. Not sure how that's done, and not sure if I have
the right authority in bugzilla to do it.
I'm not entirely sure either but as the creator I think all you have to
do is unch
On 12/05/11 00:32, Ramiro Blanco wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=683155
Can't access that one: "You are not authorized to access bug #683155"
There's no reason this bug should be private, however it's addressed in
test kernel kernel-2.6.18-248.el5
Steve/Bob, how about op
Gordan Bobic wrote:
There is no such thing - period. On any OS. If your application is
single-process/single-thread, it will only scale vertically.
_If_ the problem is pleasantly or embarrassingly parallel a shell script
can be used to run many parallel invocations.
How to do that is offtop
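A minimal sketch of the shell-script approach, using xargs -P to fan a single-threaded tool out over independent inputs (the filenames and the echo stand-in are illustrative):

```shell
# Run at most 4 invocations at once, one input item per process.
# "sh -c 'echo ...'" stands in for the real single-threaded binary.
printf '%s\n' a.dat b.dat c.dat d.dat e.dat |
    xargs -n1 -P4 sh -c 'echo "processing $0"'
```

This only helps when the work items are independent; anything with shared state is back to locking.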
Digimer wrote:
As soon as I define MTU=2000 (for example), then cman on one node will
start but not stop (the other node stops fine). Also, 'ccs_tool update
/etc/cluster/cluster.conf' fails with:
Have you configured the interfaces themselves to use jumbo frames?
Does the switch support jumb
David Hill wrote:
These directories are all on the same mount ... with a total size of 1.2TB!
I _strongly_ suggest you setup one filesystem per directory.
All files accessed by the application are within its own folder/subdirectory.
No files is ever accessed by more than one node.
That wil
Bob, Steve et al,
Which EL test kernel post 2.6.18.247 is stable enough for use in a
production system for a few days?
I'm seeing massive slowdowns on lots of 2-100Mb writes (someone's
mirroring a ftp archive) and want to see if the .247 write speedups Bob
mentioned 3 weeks back will help.
David Hill wrote:
Hi Steve,
We seem to be experiencing some new issues now... With 4 nodes, only
one is slow, but with 3 nodes, 2 of them are now slow.
2 nodes are doing 20k/s and one is doing 2mb/s... Seems like all nodes will
end up with poor performance.
All nodes are locking fil
carlopmart wrote:
Hi all,
I have two rhel6.0 cluster nodes with five nic interfaces in each one.
Actually, I have one free interface without an IP in each one. Can I
assign a cluster service to this interface (the service consists of one
IP and one script)?
Yes but
You will need to c
Nicolas Ross wrote:
It was a large, very large directory, with somewhere near one million
small files, so the rsync took something like 3 to 4 hours. At some
point, all nodes' consoles displayed this:
gfs2_quotad:2498 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hang_task_ti
to reduce data corruption risk.
Martijn
On Fri, Mar 18, 2011 at 2:18 PM, Alan Brown wrote:
Martijn Storck wrote:
Is this expected behaviour?
Yes.
Is there anything we can do to reduce these delays?
Unmount all clustered filesystems on the host before rebooting.
AB
Martijn Storck wrote:
Is this expected behaviour?
Yes.
Is there anything we can do to reduce these delays?
Unmount all clustered filesystems on the host before rebooting.
AB
Jack Duston wrote:
> Thanks Yue, but your information would seem dated if this site is correct:
>
> http://www.redhat.com/rhel/compare
>
> Even if 100TB is what's officially supported in RHEL6, it doesn't mean
> that larger file systems won't work.
Anyone considering such large filesystems shoul
Bob:
You say this in your best practice document:
"Our performance testing lab has experimented with various resource
group sizes and found a performance problem with anything bigger than
768MB. Until this is properly diagnosed, we recommend staying below 768MB."
What are the details? Nearly
On 12/03/11 23:13, Bob Peterson wrote:
Agreed. We're abundantly aware of the performance problems,
and we're not ignoring them.
I know Bob, thanks.
(1) We recently found and fixed a problem that caused the
dlm to pass locking traffic much slower than possible.
Is this rolled into 2.6.
I missed something:
On 12/03/11 17:46, Jeff Sturm wrote:
As an example, while running a "du" command on my GFS mount point, I
observed the Ethernet traffic peak:
12:20:33 PM    IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
12:20:38 PM     eth0      3517
On 12/03/11 17:46, Jeff Sturm wrote:
[root@cluster1 76]# ls | wc -l
1970
The key is that only a few locks are needed to list the directory:
You assume NFS clients are simply using "ls"
Running "ls -l" on the same directory takes a bit longer (by a factor of
about 20):
Or
The only reliable way I have found (rhel4 and 5) is this:
1: Migrate all services off the node.
2: Unmount as many GFS disks as possible.
3: Power cycle the node.
The other nodes will recover quickly.
"cman leave (remove) (force)" sometimes works but often doesn't.
On 09/03/11 14:13, yue wrote:
Which is better, GFS2 or OCFS2?
I want to share an FC SAN; do you know which is better?
"that depends" - it is highly dependent on the type of disk activity you
are performing.
There are various reviews of both FSes circulating.
Personal observation: GFS and GFS2
On 08/03/11 17:11, Valeriu Mutu wrote:
Hi,
I think the problem is solved. I was using a 9000bytes MTU on the Xen virtual
machines' iSCSI interface. Switching back to 1500bytes MTU caused the clvmd to
start working.
As long as everything on the network is set to 9000 bytes, you should be ok.
RH'
Nikola Savic wrote:
Rsync is very slow in creating the file
list, a little faster than 100 files/s.
That's about what I see too. Ditto on reading.
On 25/02/11 13:43, Bob Peterson wrote:
All of the fixes going into RHEL5.7 are in that version, and
it is faster and more accurate than the version shipped with RHEL5.6.
Will it be backported to 5.6?
On 24/02/11 17:50, Steven Whitehouse wrote:
Depending on the exact mix of I/O, that is expected behaviour. That is
why it is so important to look at what can be done at the application
layer to mitigate such problems.
This is an academic environment.
Telling users to adjust the way they do t
On 25/02/11 07:21, Martijn Storck wrote:
Thanks for your message.
Somehow the issue has not returned since yesterday when I applied some
tuning to our GFS, specifically:
glock_purge 50
demote_secs 100
scand_secs 5
statfs_fast 1
It's most likely the biggest contributor is the statfs_fast se
On 24/02/11 22:40, Scooter Morris wrote:
Hi all. After two tries, we've modified our cluster so that all nodes
have increased their dlm hash table sizes to 1024. Initially, I put
the echos in /etc/init.d/gfs2, but it turns out that /etc/init.d/gfs2
is sort of a no-op: /etc/init.d/netfs mounts
Steven Whitehouse wrote:
As soon as you mix creation/deletion on one node with accesses (of
whatever kind) from other nodes, you run this risk.
_ALL_ the GFS2 filesystems (bar one 5Gb one for common config files,
etc) are mounted one-node-only.
_ALL_ the GFS2 filesystems (with the same exc
If multiple NFS services are defined, a race condition exists with
parallel invocations of /usr/share/cluster/nfsclient.sh
exportfs in add/remove mode reads the existing exports in from kernel
(or etab/xtab), applies the command and then writes a _complete_
exportlist back to the kernel, not
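One way to work around a read-modify-write race like this is to serialize the exportfs calls under an exclusive flock. This is a hedged sketch, not the actual nfsclient.sh fix: the lock path is illustrative, and the echo stands in for the real exportfs invocation so the sketch runs anywhere.

```shell
# Take an exclusive lock on fd 9 before touching the export list, so two
# parallel nfsclient.sh instances can't interleave their read/modify/write.
LOCK=/tmp/exportfs.lock
(
    flock -x 9                          # blocks until the lock is free
    echo "exportfs -o rw host:/path"    # stand-in for the real call
) 9>"$LOCK"
```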
Steven Whitehouse wrote:
That doesn't sound like it is related to a DLM issue. 150 entries is not
a lot.
It isn't, but when the machine's being hammered by requests in other
filesystems, things can get very slow, very quickly.
What do you mean by "access" in this case? Just looking up a
si
After running several days with the larger table sizes I don't think
it's made any difference to individual thread performance or overall
throughput.
Likewise, the following changes have had no effect on access time for
large directories (but they have improved caching and improved high load
David Teigland wrote:
Don't change the buffer size, but I'd increase all the hash table sizes to
4096 and see if anything changes.
echo "4096" > /sys/kernel/config/dlm/cluster/rsbtbl_size
echo "4096" > /sys/kernel/config/dlm/cluster/lkbtbl_size
echo "4096" > /sys/kernel/config/dlm/cluster/dirtb
> Yes, ls -l will always take longer because it is not just accessing
the directory, but also every inode in the directory. As a result the
I/O pattern will generally be poor.
I know and accept that. It's common to most filesystems but the access
time is particularly pronounced with GFS2 (pres
> For the GFS2 glocks, that doesn't matter - all of the glocks are held
in a single hash table no matter how many filesystems there are.
Given nearly 4 million glocks currently on one of the boxes in a quiet
state (and nearly 6 million if everything was on one node), is the
existing hash table
> Directories of the size (number of entries) which you have indicated
should not be causing a problem as lookup should still be quite fast at
that scale.
Perhaps, but even so, 4000-file directories usually take over a minute to
"ls -l", while 85k file directories take 5 mins (20-40 mins on a ba
> A faster way to just grab lock numbers is to grep for gfs2
in /proc/slabinfo as that will show how many are allocated at any one
time.
True, but it doesn't show how many are used per fs.
FWIW, here are current stats on each cluster node (it's evening and
lightly loaded)
gfs2_quotad
Steve:
To add some interest (and give you numbers to work with as far as dlm
config tuning goes), here is a selection of real world lock figures
from our file cluster (cat $d | wc -l)
/sys/kernel/debug/dlm/WwwHome-gfs2_locks 162299 (webserver exports)
/sys/kernel/debug/dlm/soft2-gfs2_locks
> You can set it via the configfs interface:
Given 24Gb ram, 100 filesystems, several hundred million of files and
the usual user habits of trying to put 100k files in a directory:
Is 24Gb enough or should I add more memory? (96Gb is easy, beyond that
is harder)
What would you consider safe
> There is a config option to increase the resource table size though,
so perhaps you could try that?
..details?
I'm seeing heartbeat/lock lan traffic peak out at about 120kb/s and
4000pps per node at the moment. Clearly the switch isn't the problem -
and using hardware-accelerated igb devices I'm pretty sure the
networking's fine too.
During the actual workload, or just during the ping pong test?
Duri
I'm documenting this in case anyone else gets bitten
(This is supposed to have been fixed since October, but we encountered
it in the last few days on RHEL5.6 - either it's not fully fixed or the
patch has fallen out of the production kernel)
We kept getting GFS and GFS2 filesystems mysteriou
> It would be really interesting how long the described backup takes
when the gfs2 filesystem is mounted exclusively on one node without locking.
The 2 million inode system backs up in about 30 minutes when mounted
lock_nolock (0 file incremental backup using bacula)
> For me it looks like
The setup described is all on RHEL5.6.
Fileserver filesystems are each mounted on one cluster node only
(scattered across nodes) and then NFS exported as individual services
for portability. (That exposed a major race condition with exportfs as
it's not parallel aware in any way, shape or fo
After lots of headbanging, I'm slowly realising that limits on GFS2 lock
rates and totem message passing appears to be the main inhibitor of
cluster performance.
Even on disks which are only mounted on one node (using lock_dlm), the
ping_pong rate is - quite frankly - appalling, at about 500