Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Frank Sonntag
Hi Greg, Try using the same server on both machines when mounting, instead of mounting off the local gluster server on both. I've used the same approach as you in the past and got into all kinds of split-brain problems. The drawback of course is that mounts will fail if the machine you chose
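For illustration, a minimal sketch of that suggestion on a two-node setup; the volume name gv0 and the mount point are assumptions, while the hostnames fw1/fw2 come from Greg's description later in the thread:

    # on BOTH fw1 and fw2, mount from the same server (fw1 here),
    # rather than each node mounting its own local glusterd
    mount -t glusterfs fw1:/gv0 /mnt/gluster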

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Justin Clift
Hi guys, As an FYI, from discussion on gluster-devel IRC yesterday, the RDMA code still isn't in a good enough state for production usage with 3.4.0. :( There are still outstanding bugs with it, and I'm working to make the Gluster Test Framework able to work with RDMA so we can help shake out

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Brian Candler
On 10/07/2013 06:26, Greg Scott wrote: Bummer. Looks like I'm on my own with this one. I'm afraid this is the problem with gluster: everything works great on the happy path, but as soon as anything goes wrong, you're stuffed. There is neither recovery procedure documentation, nor detailed

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Rejy M Cyriac
On 07/10/2013 11:38 AM, Frank Sonntag wrote: Hi Greg, Try using the same server on both machines when mounting, instead of mounting off the local gluster server on both. I've used the same approach as you in the past and got into all kinds of split-brain problems. The drawback of

Re: [Gluster-users] Gluster Self Heal

2013-07-10 Thread Toby Corkindale
On 09/07/13 18:17, 符永涛 wrote: Hi Toby, What's the bug #? I want to have a look and backport it to our production server if it helps. Thank you. I think it was this one: https://bugzilla.redhat.com/show_bug.cgi?id=947824 The bug being that the daemons were crashing out if you had a lot of

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Frank Sonntag
On 10/07/2013, at 7:59 PM, Rejy M Cyriac wrote: On 07/10/2013 11:38 AM, Frank Sonntag wrote: Hi Greg, Try using the same server on both machines when mounting, instead of mounting off the local gluster server on both. I've used the same approach as you in the past and got into all kinds

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread raghav
On 07/09/2013 06:47 AM, Greg Scott wrote: I don't get this. I have a replicated volume and 2 nodes. My challenge is, when I take one node offline, the other node can no longer access the volume until both nodes are back online again. Details: I have 2 nodes, fw1 and fw2. Each node has an XFS
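For reference, a two-node replicated volume like the one Greg describes is normally created along these lines; the volume name gv0 and the brick path are assumptions, only the hostnames fw1/fw2 come from his description:

    # run once, from fw1, after glusterd is up on both nodes
    gluster peer probe fw2
    gluster volume create gv0 replica 2 fw1:/gluster/brick1 fw2:/gluster/brick1
    gluster volume start gv0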

[Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Allan Latham
Hi all Thanks to all those volunteers who are working to get gluster into a state where it can be used for live work. I understand that you are giving your free time, and I very much appreciate it, on this project and on the many others we use for live production work. There seems to be a problem

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Greg Scott
Brian, I'm not ready to give up just yet. From Rejy: Would not the mount option 'backupvolfile-server=<secondary server>' help at mount time, in the case of the primary server not being available? Hmmm - this seems to be a step in the right direction. On both nodes I did: umount
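For anyone following along, the remount Greg is describing would look roughly like this; the volume name, mount point and choice of fw2 as the backup are assumptions, backupvolfile-server itself being the stock mount.glusterfs option from Rejy's suggestion:

    umount /mnt/gluster
    mount -t glusterfs -o backupvolfile-server=fw2 fw1:/gv0 /mnt/gluster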

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Greg Scott
Oh wow, it sounds like we both have similar issues. Surely there is a key somewhere to making these simple cases work. Otherwise, how would some of the big organizations using this stuff continue with it? - Greg ___ Gluster-users mailing list

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Jeff Darcy
On 07/10/2013 07:01 AM, Allan Latham wrote: I have a simple scenario and it just simply doesn't work. Reading over the network when the file is available locally is plainly wrong. Our application cannot take the performance hit nor the extra network traffic. Another victim of our release

Re: [Gluster-users] Gluster Self Heal

2013-07-10 Thread Vijay Bellur
On 07/10/2013 01:31 PM, Toby Corkindale wrote: On 09/07/13 18:17, 符永涛 wrote: Hi Toby, What's the bug #? I want to have a look and backport it to our production server if it helps. Thank you. I think it was this one: https://bugzilla.redhat.com/show_bug.cgi?id=947824 The bug being that the

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Allan Latham
Hi Jeff Thanks for the reply and all the great work you are doing. I know how hard it is - believe me. Where do I get a version that will solve my 'read local if we have the file here' problem? My use case is exactly two servers at a server farm with 100Mbit between them. This 100Mbit is also
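For what it's worth, the option the thread subject refers to does let AFR pin reads to one replica in the 3.3/3.4 option set, though it is a per-volume setting rather than a true 'read whichever copy is local'; a hedged sketch, assuming a volume named gv0 whose first client subvolume (gv0-client-0) is the brick local to this server:

    # read from a specific replica; the value is a client xlator name from the volfile
    gluster volume set gv0 cluster.read-subvolume gv0-client-0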

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Jeff Darcy
On 07/10/2013 09:20 AM, Allan Latham wrote: Where do I get a version that will solve my 'read local if we have the file here' problem? I would say 3.4 is already far better than 3.3 not only in terms of features but stability/maintainability/etc. as well, even though it's technically not out

Re: [Gluster-users] Gluster Self Heal

2013-07-10 Thread HL
On 10/07/2013 04:05 μμ, Vijay Bellur wrote: A lot of volumes or a lot of delta to self-heal could trigger this crash. 3.3.2 containing this fix should be out real soon now. Appreciate your patience in this regard. Thanks, Vijay I hope this update will reach the Debian Wheezy repo.

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Allan Latham
Hi Jeff OK - I've downloaded the source and I'm setting up to compile it for Debian Wheezy. I'll let you know how I get on. Maybe next week before I can run preliminary tests. Correct me if I'm wrong but geo-replication is master/slave? We could maybe go with this in some scenarios as updates

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Marcus Bointon
On 10 Jul 2013, at 17:11, Allan Latham alat...@flexsys-group.de wrote: In short 'sync' replication is not an absolute must but we do use master/master quite a bit. That's why I'm using gluster too. I'm running web servers that allow uploads, and if you're going to maintain a no-stickiness

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread HL
Been there ... here is my 10-cent advice: a) Prepare for tomorrow b) Rest c) Think d) Plan e) Act. I am sure it will work for you once you have calmed down. Tech hints: ifconfig iface mtu 9000, or whatever your NIC can afford. Having only 100Mbit is not a good idea. I've recently located a dual-port 1Gbit NIC on
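Spelled out, the MTU hint is just the following; the interface name is a placeholder, and jumbo frames only help if every NIC and switch port on the path is set to the same MTU:

    ifconfig eth0 mtu 9000      # or: ip link set eth0 mtu 9000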

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Jeff Darcy
On 07/10/2013 11:11 AM, Allan Latham wrote: Correct me if I'm wrong but geo-replication is master/slave? It is, today. Multi-way is under development, but by its nature won't ever have the same consistency guarantees as synchronous replication. ___

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Brian Candler
On 10/07/2013 13:58, Jeff Darcy wrote: 2d. it needs a fast validation scanner which verifies that data is where it should be and is identical everywhere (md5sum). How fast is fast? What would be an acceptable time for such a scan on a volume containing (let's say) ten million files? What

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Joe Julian
My minimal donation: On 07/10/2013 04:01 AM, Allan Latham wrote: There seems to be a problem with the way gluster is going. For me it would be an ideal solution if it actually worked. Actually working is always the ideal. Actually working for all possible use cases... may be a little more

Re: [Gluster-users] fuse 3.3.1 fails/crashes on flush on Distributed-Striped-Replicate volume

2013-07-10 Thread Kushnir, Michael (NIH/NLM/LHC) [C]
I had the same problem with striped-replicated. https://bugzilla.redhat.com/show_bug.cgi?id=861423 Best, Michael -Original Message- From: Benedikt Fraunhofer [mailto:benedikt.fraunhofer.l.gluster.fxy-3zz-...@traced.net] Sent: Monday, July 08, 2013 3:43 AM To:

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Matthew Nicholson
Well, first of all, thanks for the responses. The volume WAS failing over to tcp just as predicted, though WHY is unclear, as the fabric is known working (it has about 28K compute cores on it, all doing heavy MPI testing), and the OFED/verbs stack is consistent across all client/storage systems
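For context, the fallback is only possible because the volume carries both transports and the client asks for one of them at mount time; a rough sketch of that shape, with the volume name, brick paths and hostnames as placeholders:

    gluster volume create rdmavol transport tcp,rdma server1:/brick server2:/brick
    mount -t glusterfs -o transport=rdma server1:/rdmavol /mnt/rdmavol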

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Joe Landman
On 07/10/2013 02:36 PM, Joe Julian wrote: 1) http://www.solarflare.com makes sub microsecond latency adapters that can utilize a userspace driver pinned to the cpu doing the request eliminating a context switch We've used open-onload in the past on Solarflare hardware. And with GlusterFS.

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Justin Clift
On 10/07/2013, at 7:49 PM, Matthew Nicholson wrote: Well, first of all, thanks for the responses. The volume WAS failing over to tcp just as predicted, though WHY is unclear, as the fabric is known working (it has about 28K compute cores on it, all doing heavy MPI testing), and the OFED/verbs

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Matthew Nicholson
Justin, yeah, this fabric is all brand new Mellanox, and all nodes are running their v2 stack. As for a bug report, sure thing. I was thinking I would tack on a comment here: https://bugzilla.redhat.com/show_bug.cgi?id=982757 since that's about the silent failure. -- Matthew Nicholson Research

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Justin Clift
On 10/07/2013, at 8:05 PM, Matthew Nicholson wrote: Justin, yeah, this fabric is all brand new Mellanox, and all nodes are running their v2 stack. Cool. The only thing that worries me about the v2 stack is that they've dropped SDP support. SDP seemed to have limited scope (speeding up IPoIB),

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Joe Julian
On 07/10/2013 11:51 AM, Joe Landman wrote: On 07/10/2013 02:36 PM, Joe Julian wrote: 1) http://www.solarflare.com makes sub microsecond latency adapters that can utilize a userspace driver pinned to the cpu doing the request eliminating a context switch We've used open-onload in the past on

Re: [Gluster-users] Giving up [ was: Re: read-subvolume]

2013-07-10 Thread Joe Landman
On 07/10/2013 03:18 PM, Joe Julian wrote: The small file complaint is all about latency though. There's very little disk overhead (all inode lookups) to doing a self-heal check. ls -l on a 50k file directory and nearly all the delay is from network RTT for self-heal checks (check that with

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Ryan Aydelott
How many nodes make up that volume that you were using for testing? Over 100 nodes running at QDR/IPoIB using 100 threads, we ran around 60GB/s read and somewhere in the 40GB/s range for writes (IIRC). On Jul 10, 2013, at 1:49 PM, Matthew Nicholson matthew_nichol...@harvard.edu wrote: Well,

Re: [Gluster-users] tips/nest practices for gluster rdma?

2013-07-10 Thread Matthew Nicholson
Ryan, 10 (storage) nodes. I did some tests with 1 brick per node, and another round with 4 per node. Each is FDR connected, but all on the same switch. I'd love to hear about your setup, gluster version, OFED stack etc. -- Matthew Nicholson Research Computing Specialist Harvard FAS Research

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Greg Scott
It looks like the brick processes on the fw2 machine are not running, and hence when fw1 is down the entire replication process is stalled. Can you do a ps and get the status of all the gluster processes and ensure that the brick process is up on fw2. I was away from this most
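For reference, the processes to look for on each node are roughly these (a sketch; exact arguments vary with version and volume layout):

    ps ax | grep gluster
    # expect: glusterd              (management daemon)
    #         glusterfsd ...        (one brick process per brick on that node)
    #         glusterfs  ...        (FUSE client mounts, NFS server, self-heal daemon)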

[Gluster-users] unable to resolve brick error

2013-07-10 Thread Matthew Sacks
Hello, I have a gluster cluster which keeps complaining about ops.c:842:glusterd_op_stage_start_volume] 0-: Unable to resolve brick gluster1:/export/brick1/sdb1 here is the full output : https://gist.github.com/msacks/5970713 Not sure how this happened or how to fix it. All my peers are

Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

2013-07-10 Thread Greg Scott
And here is ps ax | grep gluster from both nodes when fw1 is offline. Note I have it mounted right now with the 'backupvolfile-server=<secondary server>' mount option. The ps ax | grep gluster output looks the same now as it did when both nodes were online. From fw1: [root@chicago-fw1 gregs]#

Re: [Gluster-users] unable to resolve brick error

2013-07-10 Thread Matthew Sacks
Here is the startup sequence: https://gist.github.com/msacks/5971418 On Wed, Jul 10, 2013 at 3:02 PM, Matthew Sacks msacksda...@gmail.comwrote: Hello, I have a gluster cluster which keeps complaining about ops.c:842:glusterd_op_stage_start_volume] 0-: Unable to resolve brick

Re: [Gluster-users] unable to resolve brick error

2013-07-10 Thread Todd Stansell
Check out https://bugzilla.redhat.com/show_bug.cgi?id=911290 It seems similar so hopefully it'll help... Todd On Wed, Jul 10, 2013 at 05:18:46PM -0700, Matthew Sacks wrote: Here is the startup sequence: https://gist.github.com/msacks/5971418 On Wed, Jul 10, 2013 at 3:02 PM, Matthew Sacks

Re: [Gluster-users] unable to resolve brick error

2013-07-10 Thread Joe Julian
That error means (and if it means this, then why doesn't it just say this???) that the hostname provided could not be converted to its uuid. That probably means that the hostname assigned to the brick is not in the peer list. The hostname of the brick has to be a case-insensitive match for
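In practice that means comparing what the brick was defined with against what glusterd knows its peers as; a quick check along these lines (gluster1 is the hostname from Matthew's error message):

    gluster peer status                 # hostnames glusterd knows its peers by
    gluster volume info | grep Brick    # hostnames the bricks were defined with
    # 'gluster1' in the brick line must match one of the peers (or the local host)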

Re: [Gluster-users] unable to resolve brick error

2013-07-10 Thread Joe Julian
By the way, this less than useful error message has been reworked for 3.4. On 07/10/2013 05:54 PM, Joe Julian wrote: That error means (and if it means this, then why doesn't it just say this???) that the hostname provided could not be converted to its uuid. That probably means that the

[Gluster-users] Not giving up [ was: Re: read-subvolume]

2013-07-10 Thread Allan Latham
Hi all - especially Jeff, Marcus and HL I couldn't resist a quick test after compiling the 3.4 beta. Looks good. Same (very quick) times to do md5sums on both servers, so it must be doing local reads. So gluster is still in the running. I repeat - you guys are doing a great job. Software like gluster
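For anyone wanting to repeat that check, it is roughly the following on each server; the file path under the gluster mount is a placeholder:

    time md5sum /mnt/gluster/some-large-file
    # throughput well above what the 100Mbit link could deliver, on both nodes,
    # suggests each server is reading its own local copy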