Hi Greg,
Try using the same server on both machines when mounting, instead of mounting
off the local gluster server on both.
I've used the same approach as you in the past and got into all kinds of
split-brain problems.
The drawback of course is that mounts will fail if the machine you chose
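Concretely, Frank's suggestion looks like this (the hostname `fw1` is taken from later in the thread; the volume name `gv0` is a placeholder):

```shell
# On BOTH nodes, fetch the volfile from the same server,
# instead of each node mounting from itself:
mount -t glusterfs fw1:/gv0 /mnt/gluster

# Equivalent /etc/fstab entry:
# fw1:/gv0  /mnt/gluster  glusterfs  defaults,_netdev  0 0
```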
Hi guys,
As an FYI, from discussion on gluster-devel IRC yesterday, the RDMA code
still isn't in a good enough state for production usage with 3.4.0. :(
There are still outstanding bugs with it, and I'm working to make the
Gluster Test Framework able to work with RDMA so we can help shake out
On 10/07/2013 06:26, Greg Scott wrote:
Bummer. Looks like I'm on my own with this one.
I'm afraid this is the problem with gluster: everything works great on
the happy path, but as soon as anything goes wrong, you're stuffed.
There is neither recovery procedure documentation nor detailed
On 07/10/2013 11:38 AM, Frank Sonntag wrote:
Hi Greg,
Try using the same server on both machines when mounting, instead of mounting
off the local gluster server on both.
I've used the same approach as you in the past and got into all kinds of
split-brain problems.
The drawback of
On 09/07/13 18:17, 符永涛 wrote:
Hi Toby,
What's the bug #? I want to have a look and backport it to our
production server if it helps. Thank you.
I think it was this one:
https://bugzilla.redhat.com/show_bug.cgi?id=947824
The bug being that the daemons were crashing out if you had a lot of
On 10/07/2013, at 7:59 PM, Rejy M Cyriac wrote:
On 07/10/2013 11:38 AM, Frank Sonntag wrote:
Hi Greg,
Try using the same server on both machines when mounting, instead of
mounting off the local gluster server on both.
I've used the same approach as you in the past and got into all kinds
On 07/09/2013 06:47 AM, Greg Scott wrote:
I don't get this. I have a replicated volume and 2 nodes. My
challenge is, when I take one node offline, the other node can no
longer access the volume until both nodes are back online again.
Details:
I have 2 nodes, fw1 and fw2. Each node has an XFS
Hi all
Thanks to all those volunteers who are working to get gluster into a
state where it can be used for live work.
I understand that you are giving your free time and I very much
appreciate it on this project and the many others we use for live
production work.
There seems to be a problem
Brian, I'm not ready to give up just yet.
From Rejy:
Would not the mount option 'backupvolfile-server=secondary server' help
at mount time, in the case of the primary server not being available?
Hmmm - this seems to be a step in the right direction. On both nodes I did:
umount
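The remount with Rejy's suggested option would look something like this (the volume name `gv0` and mount point are placeholders):

```shell
umount /mnt/gluster

# Mount from fw1, falling back to fw2's volfile if fw1 is unreachable.
# Note: backupvolfile-server only matters while fetching the volfile at
# mount time; it does not change failover behaviour after the mount is up.
mount -t glusterfs -o backupvolfile-server=fw2 fw1:/gv0 /mnt/gluster
```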
Oh wow, it sounds like we both have similar issues. Surely there is a key
somewhere to making these simple cases work. Otherwise, how would some of the
big organizations using this stuff continue with it?
- Greg
___
Gluster-users mailing list
On 07/10/2013 07:01 AM, Allan Latham wrote:
I have a simple scenario and it just simply doesn't work. Reading over
the network when the file is available locally is plainly wrong. Our
application cannot take the performance hit nor the extra network traffic.
Another victim of our release
On 07/10/2013 01:31 PM, Toby Corkindale wrote:
On 09/07/13 18:17, 符永涛 wrote:
Hi Toby,
What's the bug #? I want to have a look and backport it to our
production server if it helps. Thank you.
I think it was this one:
https://bugzilla.redhat.com/show_bug.cgi?id=947824
The bug being that the
Hi Jeff
Thanks for the reply and all the great work you are doing. I know how
hard it is - believe me.
Where do I get a version that will solve my 'read local if we have the
file here' problem.
My use case is exactly two servers at a server farm with 100Mbit between
them. This 100Mbit is also
On 07/10/2013 09:20 AM, Allan Latham wrote:
Where do I get a version that will solve my 'read local if we have the
file here' problem.
I would say 3.4 is already far better than 3.3 not only in terms of features
but stability/maintainability/etc. as well, even though it's technically not
out
On 10/07/2013 04:05 PM, Vijay Bellur wrote:
A lot of volumes or a lot of delta to self-heal could trigger this crash.
3.3.2 containing this fix should be out real soon now. Appreciate
your patience in this regard.
Thanks,
Vijay
I hope this update will reach the debian wheezy repo.
Hi Jeff
OK - I've downloaded the source and I'm setting up to compile it for
Debian Wheezy.
I'll let you know how I get on. Maybe next week before I can run
preliminary tests.
Correct me if I'm wrong but geo-replication is master/slave?
We could maybe go with this in some scenarios as updates
On 10 Jul 2013, at 17:11, Allan Latham alat...@flexsys-group.de wrote:
In short 'sync' replication is not an absolute must but we do use
master/master quite a bit.
That's why I'm using gluster too. I'm running web servers that allow uploads,
and if you're going to maintain a no-stickiness
Been there ...
Here is my 10-cent advice:
a) Prepare for tomorrow
b) Rest
c) Think
d) Plan
e) Act
I am sure it will work for you once you've calmed down.
Tech hints:
ifconfig iface mtu 9000, or whatever your NIC can afford.
Having a 100Mbit link is not a good idea.
I've recently located a dual-port 1Gbit NIC on
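HL's MTU hint as concrete commands (the interface name `eth0` and the jumbo-frame value are examples; every host and switch port on the path must agree on the MTU):

```shell
# Classic invocation, as in the post:
ifconfig eth0 mtu 9000

# iproute2 equivalent:
ip link set dev eth0 mtu 9000

# Verify the setting took effect:
ip link show dev eth0
```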
On 07/10/2013 11:11 AM, Allan Latham wrote:
Correct me if I'm wrong but geo-replication is master/slave?
It is, today. Multi-way is under development, but by its nature won't ever
have the same consistency guarantees as synchronous replication.
On 10/07/2013 13:58, Jeff Darcy wrote:
2d. it needs a fast validation scanner which verifies that data is where
it should be and is identical everywhere (md5sum).
How fast is fast? What would be an acceptable time for such a scan on
a volume containing (let's say) ten million files?
What
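Not an answer to Jeff's sizing question, but a minimal sketch of the kind of scan Allan describes: checksum every file under a brick, then diff the manifests produced on each replica (the brick and manifest paths are hypothetical):

```shell
# Build a sorted checksum manifest of every regular file under a brick.
# Run the same function on each replica; any differing or missing line in
# the diff is a file that is not identical everywhere.
scan_brick() {
    dir=$1
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 md5sum)
}

# On each replica (brick path is an example):
#   scan_brick /export/brick1 > /tmp/manifest-$(hostname).txt
# Then, from one node:
#   diff /tmp/manifest-fw1.txt /tmp/manifest-fw2.txt
```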
My minimal donation:
On 07/10/2013 04:01 AM, Allan Latham wrote:
There seems to be a problem with the way gluster is going.
For me it would be an ideal solution if it actually worked.
Actually working is always the ideal. Actually working for all possible
use cases... may be a little more
I had the same problem with striped-replicated.
https://bugzilla.redhat.com/show_bug.cgi?id=861423
Best,
Michael
-Original Message-
From: Benedikt Fraunhofer
[mailto:benedikt.fraunhofer.l.gluster.fxy-3zz-...@traced.net]
Sent: Monday, July 08, 2013 3:43 AM
To:
Well, first of all, thanks for the responses. The volume WAS failing over to
TCP just as predicted, though WHY is unclear, as the fabric is known working
(it has about 28K compute cores on it, all doing heavy MPI testing), and
the OFED/verbs stack is consistent across all client/storage systems
On 07/10/2013 02:36 PM, Joe Julian wrote:
1) http://www.solarflare.com makes sub microsecond latency adapters that
can utilize a userspace driver pinned to the cpu doing the request
eliminating a context switch
We've used open-onload in the past on Solarflare hardware. And with
GlusterFS.
On 10/07/2013, at 7:49 PM, Matthew Nicholson wrote:
Well, first of all, thanks for the responses. The volume WAS failing over to
TCP just as predicted, though WHY is unclear, as the fabric is known working
(it has about 28K compute cores on it, all doing heavy MPI testing), and
the OFED/verbs
Justin,
Yeah, this fabric is all brand-new Mellanox, and all nodes are running their
v2 stack.
As for a bug report, sure thing. I was thinking I would tack on a comment
here:
https://bugzilla.redhat.com/show_bug.cgi?id=982757
since that's about the silent failure.
--
Matthew Nicholson
Research
On 10/07/2013, at 8:05 PM, Matthew Nicholson wrote:
Justin,
Yeah, this fabric is all brand-new Mellanox, and all nodes are running their
v2 stack.
Cool. The only thing that worries me about the v2 stack, is they've dropped SDP
support. SDP seemed to have limited scope (speeding up IPoIB),
On 07/10/2013 11:51 AM, Joe Landman wrote:
On 07/10/2013 02:36 PM, Joe Julian wrote:
1) http://www.solarflare.com makes sub microsecond latency adapters that
can utilize a userspace driver pinned to the cpu doing the request
eliminating a context switch
We've used open-onload in the past on
On 07/10/2013 03:18 PM, Joe Julian wrote:
The small file complaint is all about latency though. There's very
little disk overhead (all inode lookups) to doing a self-heal check. ls
-l on a 50k file directory and nearly all the delay is from network RTT
for self-heal checks (check that with
How many nodes make up that volume that you were using for testing?
Over 100 nodes running at QDR/IPoIB using 100 threads, we ran around 60GB/s
read and somewhere in the 40GB/s range for writes (iirc).
On Jul 10, 2013, at 1:49 PM, Matthew Nicholson matthew_nichol...@harvard.edu
wrote:
Well,
Ryan,
10 (storage) nodes. I did some tests with 1 brick per node, and another round
with 4 per node. Each is FDR connected, but all on the same switch.
I'd love to hear about your setup, gluster version, OFED stack etc
--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research
It looks like the brick processes on the fw2 machine are not running, and
hence when fw1 is down the
entire replication process is stalled. Can you do a ps and get the status
of all the gluster processes, and
ensure that the brick process is up on fw2.
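The check being asked for, roughly (the bracketed grep pattern keeps grep from matching its own process; `gluster volume status` is available on recent releases):

```shell
# Look for glusterd (management), glusterfsd (brick) and glusterfs
# (client/self-heal) processes on fw2:
ps ax | grep '[g]luster'

# On recent releases, brick liveness can also be checked directly:
gluster volume status
```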
I was away from this most
Hello,
I have a gluster cluster which keeps complaining about
ops.c:842:glusterd_op_stage_start_volume] 0-: Unable to resolve brick
gluster1:/export/brick1/sdb1
here is the full output : https://gist.github.com/msacks/5970713
Not sure how this happened or how to fix it.
All my peers are
And here is ps ax | grep gluster from both nodes when fw1 is offline. Note I
have it mounted right now with the 'backupvolfile-server=secondary server'
mount option. The ps ax | grep gluster output looks the same now as it did
when both nodes were online.
From fw1:
[root@chicago-fw1 gregs]#
Here is the startup sequence: https://gist.github.com/msacks/5971418
On Wed, Jul 10, 2013 at 3:02 PM, Matthew Sacks msacksda...@gmail.com wrote:
Hello,
I have a gluster cluster which keeps complaining about
ops.c:842:glusterd_op_stage_start_volume] 0-: Unable to resolve brick
Check out https://bugzilla.redhat.com/show_bug.cgi?id=911290
It seems similar so hopefully it'll help...
Todd
On Wed, Jul 10, 2013 at 05:18:46PM -0700, Matthew Sacks wrote:
Here is the startup sequence: https://gist.github.com/msacks/5971418
On Wed, Jul 10, 2013 at 3:02 PM, Matthew Sacks
That error means (and if it means this, then why doesn't it just say
this???) that the hostname provided could not be converted to its uuid.
That probably means that the hostname assigned to the brick is not in
the peer list.
The hostname of the brick has to be a case insensitive match for
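A quick way to check what Joe describes, i.e. that the hostname in the brick path matches an entry in the peer list (the `gluster1` name and brick path come from Matthew's error message; the volume name is a placeholder):

```shell
# Hostnames known to the cluster:
gluster peer status

# Hostnames the bricks were defined with:
gluster volume info myvol | grep '^Brick'

# Each brick host must match a peer (or the local host); if not,
# probe the peer by the exact name used in the brick path:
gluster peer probe gluster1
```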
By the way, this less than useful error message has been reworked for 3.4.
On 07/10/2013 05:54 PM, Joe Julian wrote:
That error means (and if it means this, then why doesn't it just say
this???) that the hostname provided could not be converted to its
uuid. That probably means that the
Hi all - especially Jeff, Marcus and HL
I couldn't resist a quick test after compiling 3.4 beta. Looks
good. Same (very quick) times to do md5sums on both servers so it must
be doing local reads. So gluster is still in the running.
I repeat - you guys are doing a great job. Software like gluster