Re: [Gluster-users] NFS crashes - bug 1010241

2014-11-20 Thread Shawn Heisey
On 11/20/2014 5:51 AM, Niels de Vos wrote: > Do you have a bug for this against the 3.4 version? If not, please file > one and I'll post the NFS change for inclusion. > > Note that 3.4.2 does not get any updates, you would need to use the 3.4 > stable release series, currently at 3.4.6. I've file

Re: [Gluster-users] NFS crashes - bug 1010241

2014-11-19 Thread Shawn Heisey
On 11/19/2014 6:53 PM, Ravishankar N wrote: > Heterogeneous op-version cluster is not supported. You would need to upgrade > all servers. > > http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5 I would be running 3.4.2 bricks with a later 3.4.x release on the NFS peers, not d

[Gluster-users] NFS crashes - bug 1010241

2014-11-19 Thread Shawn Heisey
We are running into this crash stacktrace on 3.4.2. https://bugzilla.redhat.com/show_bug.cgi?id=1010241 The NFS process dies with no predictability. I've written a shell script that detects the crash and runs a process to completely kill all gluster processes and restart glusterd, which has elim
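A rough sketch of that kind of watchdog, assuming the NFS server shows up as a glusterfs process with "nfs" in its volfile-id (this is not the actual script from the thread; names, patterns, and intervals are placeholders):

    #!/bin/bash
    # Hypothetical watchdog: if the gluster NFS server is gone, kill every
    # gluster process and let a fresh glusterd bring everything back up.
    while true; do
        if ! pgrep -f 'glusterfs.*nfs' >/dev/null; then
            logger "gluster NFS server not running, restarting gluster"
            pkill glusterfs; pkill glusterfsd; pkill glusterd
            sleep 5
            service glusterd start    # CentOS 6 style init
        fi
        sleep 30
    done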

Re: [Gluster-users] shrinking a volume

2014-10-28 Thread Shawn Heisey
On 10/28/2014 1:51 PM, John G. Heim wrote: > I want to shrink a gluster 3.2 volume. I've been reading the > documentation at: > http://gluster.org/community/documentation/index.php/Gluster_3.2:_Shrinking_Volumes > > > Something that is unclear on this document is whether I will lose data > if I f

Re: [Gluster-users] Problems with .gluster structure - bad symlinks

2014-03-10 Thread Shawn Heisey
On 3/9/2014 10:39 AM, Shawn Heisey wrote: > On 3/8/2014 7:45 PM, Shawn Heisey wrote: >> cat: >> /bricks/d00v00/mdfs/.glusterfs/65/30/6530ce82-310d-4c7c-8d14-135655328a77: >> Too many levels of symbolic links >> >> What do I need to do to fix this problem? Is the

Re: [Gluster-users] Problems with .gluster structure - bad symlinks

2014-03-09 Thread Shawn Heisey
On 3/8/2014 7:45 PM, Shawn Heisey wrote: > cat: > /bricks/d00v00/mdfs/.glusterfs/65/30/6530ce82-310d-4c7c-8d14-135655328a77: > Too many levels of symbolic links > > What do I need to do to fix this problem? Is there something I can do > for each of the bad symlinks? Woul
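One way to sweep a brick for the bad entries (a sketch only; it finds unresolvable links, including looping cases like the one above, but does not decide how to repair them):

    # Symlinks under .glusterfs that cannot be resolved (dangling or looping):
    find /bricks/d00v00/mdfs/.glusterfs -type l ! -exec test -e {} \; -print

    # Inspect where a suspect gfid link points before touching it:
    readlink /bricks/d00v00/mdfs/.glusterfs/65/30/6530ce82-310d-4c7c-8d14-135655328a77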

[Gluster-users] Problems with .gluster structure - bad symlinks

2014-03-08 Thread Shawn Heisey
Some background: On version 3.3.1, we tried to rebalance after adding storage. It blew up badly due to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=859387 We have now upgraded to 3.4.2. A new rebalance attempt resulted in several dozen entries showing up in the 'gluster volume

Re: [Gluster-users] Is there a way to manually clear the heal-failed/split-brain lists?

2014-03-06 Thread Shawn Heisey
On 3/6/2014 2:14 PM, Michael Peek wrote: > I've noticed that once I've taken care of a problem, the heal-failed and > split-brain lists don't get smaller or go away. Is there a way to > manually reset them? I'd like to know the answer to that question too. There is a bug filed on the problem alr

Re: [Gluster-users] Fixing heal / split-brain when the entry is a directory

2014-03-05 Thread Shawn Heisey
> From my short Gluster experience I noticed that fix-layout, when run after > adding new bricks, re-creates the directories on the new bricks. Could > you maybe try a fix-layout, possibly after you change the trusted > xattrs? Or try some combinations of that.. > > also I assume you do > gluster vol
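For reference, the commands being suggested look roughly like this (VOLNAME is a placeholder):

    # Re-create directory layouts on the new bricks, then trigger self-heal:
    gluster volume rebalance VOLNAME fix-layout start
    gluster volume rebalance VOLNAME status
    gluster volume heal VOLNAME full
    gluster volume heal VOLNAME info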

Re: [Gluster-users] Fixing heal / split-brain when the entry is a directory

2014-03-04 Thread Shawn Heisey
On 3/4/2014 5:20 PM, Viktor Villafuerte wrote: You may have tried this already.. but what if you leave both trusted.afr entries, change only one to '0' and then self-heal? The lack of a Reply-To header on some lists always trips me up. I end up just replying to the sender. Setting one entry
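The xattr change under discussion looks roughly like this on a brick, following the approach from Joe Julian's split-brain write-ups (volume name, client index, and path are placeholders, not values from this thread):

    # Inspect the changelog xattrs for the directory on each brick:
    getfattr -d -m . -e hex /bricks/d00v00/mdfs/path/to/dir

    # Zero one side's pending counters, then let self-heal pick a winner:
    setfattr -n trusted.afr.VOLNAME-client-0 \
             -v 0x000000000000000000000000 /bricks/d00v00/mdfs/path/to/dir
    gluster volume heal VOLNAME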

[Gluster-users] Fixing heal / split-brain when the entry is a directory

2014-03-04 Thread Shawn Heisey
I have a bunch of heal problems on a volume. For this email, I won't speculate about what caused them - that's a whole other discussion that I may have at some point in the future. This will concentrate on fixing the immediate problems so I can move forward. Thanks to JoeJulian's blog posts

Re: [Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-26 Thread Shawn Heisey
Here's what our own developer had to say about this: On 11/8/2013 8:23 PM, Shawn Heisey wrote: When I looked at the individual cases of lost or corrupted files, one thing kept staring me in the face until I recognized it: [2013-11-02 03:56:36.472170] I [dht-rebalance.c:647:dht_migrate

Re: [Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-12 Thread Shawn Heisey
On 11/9/2013 2:39 AM, Shawn Heisey wrote: They are from the same log file - the one that I put on my dropbox account and linked in the original message. They are consecutive log entries. Further info from our developer who is looking deeper into these problems: Ouch. I know

Re: [Gluster-users] Failed rebalance resulting in major problems

2013-11-11 Thread Shawn Heisey
On 11/11/2013 12:33 PM, Jeff Darcy wrote: There's nothing about a split-network configuration like yours that would cause something like this *by itself*, but anything that creates greater complexity also creates new possibilities for something to go wrong. Just to be safe, if I were you, I'd do

Re: [Gluster-users] Failed rebalance resulting in major problems

2013-11-11 Thread Shawn Heisey
On 11/6/2013 1:15 PM, Joe Julian wrote: I'm one of the oldest GlusterFS users around here and one of the biggest proponents and even I have been loath to rebalance until 3.4.1. I wish that you'd said this when I was in the IRC channel asking for opinions about whether to upgrade before adding stor

Re: [Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-09 Thread Shawn Heisey
On 11/9/2013 1:47 AM, Anand Avati wrote: > Thanks for the detailed info. I have not yet looked into your logs, but > will do so soon. There have been patches on rebalance which do fix > issues related to ownership. But I am not (yet) sure about bugs which > caused data loss. One question I have is

[Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-08 Thread Shawn Heisey
I'm starting a new thread on this, because I have more concrete information than I did the first time around. The full rebalance log from the machine where I started the rebalance can be found at the following link. It is slightly redacted - one search/replace was made to replace an identifyi
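The redaction itself is a single substitution along these lines (file and string names are made up for illustration):

    # Hypothetical redaction pass over the rebalance log before sharing it:
    sed 's/internal-hostname/REDACTED/g' VOLNAME-rebalance.log > rebalance-redacted.log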

Re: [Gluster-users] Failed rebalance resulting in major problems

2013-11-07 Thread Shawn Heisey
(resending because my reply only went to Lukáš) On 11/7/2013 3:20 AM, Lukáš Bezdička wrote: I strongly suggest not using 3.3.1 or whole 3.3 branch. I would only go for 3.4.1 on something close to production and even there I wouldn't yet use rebalance/shrinking. We give gluster heavy testing be

[Gluster-users] Failed rebalance resulting in major problems

2013-11-05 Thread Shawn Heisey
We recently added storage servers to our gluster install, running 3.3.1 on CentOS 6. It went from 40TB usable (8x2 distribute-replicate) to 80TB usable (16x2). There was a little bit over 20TB used space on the volume. The add-brick went through without incident, but the rebalance failed after m
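The sequence in question, in rough outline for the 3.3 CLI (hostnames and brick paths are placeholders):

    # Expand the 8x2 distribute-replicate volume, bricks added in replica pairs:
    gluster volume add-brick VOLNAME server17:/bricks/d00v00 server18:/bricks/d00v00
    # Then spread existing data across the new bricks:
    gluster volume rebalance VOLNAME start
    gluster volume rebalance VOLNAME status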

Re: [Gluster-users] UFO - new version capable of S3 API?

2013-03-05 Thread Shawn Heisey
On 3/5/2013 1:27 PM, Peter Portante wrote: > If I am not mistaken, as of OpenStack Folsom (1.7.4), the s3 > compatibility middleware was broken out into its own project, > found here: https://github.com/fujita/swift3 > > You should be able to install that and configure as before, be sure > to read
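With the stand-alone swift3 middleware, the proxy configuration ends up looking roughly like this (a sketch based on the swift3 project's documentation, not a verified config for the gluster packages):

    # /etc/swift/proxy-server.conf (excerpt, hypothetical)
    [pipeline:main]
    pipeline = healthcheck cache swift3 tempauth proxy-server

    [filter:swift3]
    use = egg:swift3#swift3

The swift3 filter has to sit in front of the auth middleware so it can translate S3 requests before authentication happens.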

[Gluster-users] UFO - new version capable of S3 API?

2013-03-05 Thread Shawn Heisey
I was running the previous version of UFO, the 3.3 one that was based on Swift 1.4.8. Now there is a 3.3.1 based on Swift 1.7.4. The config that I used last time to enable S3 isn't working with the new one, just updated yesterday using yum. I was using tempauth in the old version and I'm sti

Re: [Gluster-users] New version of UFO - is there a new HOWTO?

2013-02-01 Thread Shawn Heisey
On 1/31/2013 3:23 PM, Shawn Heisey wrote: > On 1/31/2013 12:36 PM, Kaleb Keithley wrote: >> Not sure if you saw this in #gluster on IRC. >> >> The other work-around for F18 is to delete >> /etc/swift/{account,container,object}-server.conf before starting UFO. >>

Re: [Gluster-users] New version of UFO - is there a new HOWTO?

2013-01-31 Thread Shawn Heisey
On 1/31/2013 12:36 PM, Kaleb Keithley wrote: Not sure if you saw this in #gluster on IRC. The other work-around for F18 is to delete /etc/swift/{account,container,object}-server.conf before starting UFO. With that my UFO set-up works as it did in F17. Still no joy. http://fpaste.org/SOYM/

Re: [Gluster-users] New version of UFO - is there a new HOWTO?

2013-01-29 Thread Shawn Heisey
the -ufo package have the config file with tempauth? Thanks, Shawn On 1/29/2013 8:43 PM, Kaleb Keithley wrote: 3.3.1-8 in f18 still uses tempauth. - Original Message - From: "Shawn Heisey" To: gluster-users@gluster.org Sent: Tuesday, January 29, 2013 8:21:11 PM Subject: [Glu

[Gluster-users] New version of UFO - is there a new HOWTO?

2013-01-29 Thread Shawn Heisey
I just installed glusterfs-swift 3.3.1 on a couple of Fedora 18 servers. This is based on swift 1.7.4 and has keystone in the config. I had experimented with the one based on swift 1.4.8 and tempauth and had some problems with it. The HOWTO I can find is still for the old one. Is there an u

Re: [Gluster-users] how well will this work

2013-01-06 Thread Shawn Heisey
On 1/2/2013 4:01 AM, Brian Candler wrote: Aside: what is the reason for creating four separate logical volumes/bricks on the same node, and then combining them using gluster distribution? Also, why are you combining all your disks into a single volume group (clustervg), but then allocat

Re: [Gluster-users] infiniband replicated distributed setup.--- network setup question...

2012-12-18 Thread Shawn Heisey
On 12/18/2012 7:33 PM, Matthew Temple wrote: From any of those nodes, peer status is wrong: For instance, if I ssh over to 2-ib and ask for a peer status, it shows the peers to be 1-ib-r, 2-ib-r and *155.52.48.1* (the last is the ethernet side, not the IB side) I guess my question is this

[Gluster-users] Gluster and public/private LAN

2012-12-18 Thread Shawn Heisey
I have an idea I'd like to run past everyone. Every gluster peer would have two NICs - one "public" and the other "private" with different IP subnets. The idea that I am proposing would be to have every gluster peer have all private peer addresses in /etc/hosts, but the public addresses would
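A sketch of what that would look like on each peer (names and subnets invented for illustration): peers resolve each other via /etc/hosts to the private subnet, while clients resolve the same hostnames through DNS to the public addresses.

    # /etc/hosts on every gluster peer
    10.0.0.1    gluster01
    10.0.0.2    gluster02
    10.0.0.3    gluster03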

Re: [Gluster-users] Inviting comments on my plans

2012-11-19 Thread Shawn Heisey
On 11/19/2012 3:18 AM, Fernando Frediani (Qube) wrote: Hi, I agree with the comment about Fedora and wouldn't choose it as a distribution, but if you are comfortable with it go ahead as I don't think this will be the major pain. RAID: I see where you are coming from in choosing not to have any RAID a

Re: [Gluster-users] Inviting comments on my plans

2012-11-18 Thread Shawn Heisey
On 11/18/2012 5:19 AM, Brian Candler wrote: On Sat, Nov 17, 2012 at 11:04:33AM -0700, Shawn Heisey wrote: Dell R720xd servers with two internal OS drives and 12 hot-swap external 3.5 inch bays. Fedora 18 alpha, to be upgraded to Fedora 18 when it is released. I would strongly recommend

[Gluster-users] Inviting comments on my plans

2012-11-17 Thread Shawn Heisey
I am planning the following new gluster 3.3.1 deployment, please let me know whether I should rethink any of my plans. If you don't think what I'm planning is a good idea, I will need concrete reasons. Dell R720xd servers with two internal OS drives and 12 hot-swap external 3.5 inch bays. Fe

Re: [Gluster-users] how to shrink a volume?

2012-11-12 Thread Shawn Heisey
On 11/9/2012 7:51 AM, Anselm Strauss wrote: Ah, I just noticed I tested this on gluster 3.2, not 3.3. Also, the changelog for 3.3 says that remove-brick now migrates data to remaining bricks. If your 3.3 volume is near or over half full, the migration is likely to completely fill up your vo
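In 3.3 the shrink is driven through remove-brick itself, roughly as follows (volume, hosts, and brick paths are placeholders):

    # gluster 3.3: remove-brick migrates data off the named bricks before commit.
    gluster volume remove-brick VOLNAME server1:/bricks/d07 server2:/bricks/d07 start
    gluster volume remove-brick VOLNAME server1:/bricks/d07 server2:/bricks/d07 status
    # Commit only once status shows the migration has completed:
    gluster volume remove-brick VOLNAME server1:/bricks/d07 server2:/bricks/d07 commit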