[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Andrew Deason
On Fri, 24 Jan 2014 11:41:35 -0500 Jeffrey Hutzelman wrote:
> The problem is the one-off clients that make _one RPC_ and then exit.
> They have no opportunity to remember what didn't work last time. It
> might help some for these sorts of clients to use multi, if they're
> doing read-only reques…
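One way to picture the "multi" idea for read-only lookups is to send the same request to every database server at once and take the first good answer, so a single down server costs a one-off client nothing extra. This is a hypothetical Python sketch, not OpenAFS's Rx multi interface: `multi_first_answer`, `fake_query`, and the server names are made up, and a raised `OSError` stands in for a network failure.

```python
import concurrent.futures
import time

def multi_first_answer(servers, query, timeout):
    """Send the same read-only query to every server in parallel and
    return (server, answer) from the first one that succeeds.

    Hypothetical sketch: query(server) returns an answer or raises
    OSError on a network failure."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(servers))
    futures = {pool.submit(query, s): s for s in servers}
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                return futures[fut], fut.result()
            except OSError:
                continue  # this server failed; keep waiting for the others
    except concurrent.futures.TimeoutError:
        pass  # nobody answered within the overall timeout
    finally:
        # Return right away instead of waiting for slow or dead servers.
        pool.shutdown(wait=False, cancel_futures=True)
    raise TimeoutError("no database server answered")

# Simulated cell (assumption): db1 and db3 are down and only fail after a long wait.
DOWN = {"db1", "db3"}

def fake_query(server):
    if server in DOWN:
        time.sleep(0.5)  # stands in for waiting out retransmissions
        raise ConnectionError(f"{server} not responding")
    return {"answered_by": server}

if __name__ == "__main__":
    server, answer = multi_first_answer(["db1", "db2", "db3"], fake_query, 2.0)
    print(server)  # db2
```

The trade-off is extra load: every lookup hits every server, which is why the idea is limited to read-only requests.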

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Jeffrey Altman
On 1/24/2014 11:45 AM, Brandon Allbery wrote:
> On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote:
>> The problem is the one-off clients that make _one RPC_ and then exit.
>> They have no opportunity to remember what didn't work last time. It
>
> Has it been considered to write a cache f…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Brandon Allbery
On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote:
> The problem is the one-off clients that make _one RPC_ and then exit.
> They have no opportunity to remember what didn't work last time. It

Has it been considered to write a cache file somewhere (even a user dotfile) that could be used…
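Brandon's suggestion can be sketched as a small state file that one-off tools read at startup and update when a server fails to answer, so the next invocation tries recently failed servers last. Everything here is an assumption made up for illustration: the JSON format, the dotfile location (`~/.afsdbdown.json`), the 10-minute `DOWN_TTL`, and the function names.

```python
import json
import os
import time

DOWN_TTL = 600  # seconds to keep avoiding a failed server (assumed value)

def load_down(path, now=None):
    """Return {server: time_marked_down} for entries younger than DOWN_TTL.
    A missing or unreadable file just means "nothing known"."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return {}
    if not isinstance(data, dict):
        return {}
    return {s: t for s, t in data.items() if now - t < DOWN_TTL}

def record_down(path, server, now=None):
    """Remember that `server` just failed; write via rename so readers
    never see a half-written file."""
    now = time.time() if now is None else now
    data = load_down(path, now)
    data[server] = now
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)

def order_servers(servers, path, now=None):
    """Keep the configured order, but move recently failed servers to the end."""
    down = load_down(path, now)
    return [s for s in servers if s not in down] + [s for s in servers if s in down]

if __name__ == "__main__":
    path = os.path.expanduser("~/.afsdbdown.json")  # hypothetical location
    print(order_servers(["db1", "db2", "db3"], path))
```

Demoting rather than skipping a recorded server keeps the tool working even if the file is stale or every server has been marked down.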

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Jeffrey Hutzelman
On Fri, 2014-01-24 at 08:01 +, Simon Wilkinson wrote:
> On 24 Jan 2014, at 07:48, Harald Barth wrote:
>
> > You are completely right if one must talk to that server. But I think
> > that AFS/RX sometimes hangs too long on waiting for one server
> > instead of trying the next one. For exam…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Harald Barth
> I have long thought that we should be using multi for vldb lookups,
> specifically to avoid the problems with down database servers.

The situation is a little bit different for cache managers who can remember which servers are down and command line tools which normally discover how the world…

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Andrew Deason
On Thu, 23 Jan 2014 21:55:15 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> > Otherwise, when your network becomes congested, the
> > retransmission of dropped packets will act as a runaway positive
> > feedback loop, making the congestion worse and saturating the
> > network.
>
> I am so…
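The feedback-loop argument quoted above is the usual case for exponential backoff: each retransmission waits longer than the last, so a congested network sees fewer repeat packets, at the cost of a longer wait before the client gives up on a server. The sketch below assumes a 1-second initial retransmission timeout and a doubling factor purely for illustration; these are not Rx's actual timer values, and the function names are hypothetical.

```python
def retransmit_schedule(initial_rto, tries, backoff=2.0):
    """Return (send_times, give_up_time) for one request that is sent
    `tries` times, waiting `initial_rto` seconds before the first
    retransmission and multiplying the wait by `backoff` each time."""
    send_times = []
    t, rto = 0.0, initial_rto
    for _ in range(tries):
        send_times.append(t)
        t += rto        # wait this long for a reply before resending
        rto *= backoff  # back off: the next wait is longer
    return send_times, t

def packets_within(send_times, window):
    """Count how many copies of the request were sent in the first `window` seconds."""
    return sum(1 for t in send_times if t < window)

if __name__ == "__main__":
    backoff_times, backoff_give_up = retransmit_schedule(1.0, 5)
    fixed_times, fixed_give_up = retransmit_schedule(1.0, 5, backoff=1.0)
    print(backoff_times, backoff_give_up)  # [0.0, 1.0, 3.0, 7.0, 15.0] 31.0
    print(fixed_times, fixed_give_up)      # [0.0, 1.0, 2.0, 3.0, 4.0] 5.0
```

With backoff the client sends 3 packets in the first 5 seconds instead of 5, but needs 31 seconds rather than 5 to conclude that the server is down, which is exactly the tension between "don't flood the network" and "fail over quickly".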

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Peter Grandi
>>> For example in an ideal world putting more or less DB servers
>>> in the client 'CellServDB' should not matter, as long as one
>>> that belongs to the cell is up; again if the logic were for
>>> all types of client: "scan quickly the list of potential DB
>>> servers, find one that is up and bel…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Simon Wilkinson
On 24 Jan 2014, at 07:48, Harald Barth wrote:
> You are completely right if one must talk to that server. But I think
> that AFS/RX sometimes hangs too long on waiting for one server
> instead of trying the next one. For example for questions that could
> be answered by any VLDB. I'm thinking…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-24, Harald Barth
> The problem is that you want the client to scan "quickly" to find a server
> that is up, but because networks are not perfectly reliable and drop
> packets all the time, it cannot know that a server is not up until that
> server has failed to respond to multiple retransmissions of the request.
> Those…
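To see why a sequential scan cannot be both quick and safe, suppose a client tries servers one at a time and only moves on after the full retransmission sequence for the current server has failed. The hypothetical sketch below computes the resulting delay; the function names, `give_up` (31 s, as from a 1 s timeout doubled over five tries), and `rtt` are illustrative assumptions.

```python
from itertools import permutations

def time_to_answer(order, up, give_up, rtt):
    """Delay before a client that tries servers strictly in `order` gets an
    answer, or None if every server is down."""
    elapsed = 0.0
    for server in order:
        if server in up:
            return elapsed + rtt
        # A down server is only abandoned after all retransmissions fail.
        elapsed += give_up
    return None

def mean_delay_random_order(servers, up, give_up, rtt):
    """Average delay when the starting order is uniformly random
    (assumes at least one server is up)."""
    delays = [time_to_answer(p, up, give_up, rtt) for p in permutations(servers)]
    return sum(delays) / len(delays)

if __name__ == "__main__":
    servers = ["db1", "db2", "db3", "db4"]
    up = {"db1", "db2", "db3"}  # db4 is listed but not running
    print(time_to_answer(["db4", "db1", "db2", "db3"], up, 31.0, 0.01))
    print(mean_delay_random_order(servers, up, 31.0, 0.01))
```

With one of four listed servers missing, a client that starts at a random server waits about a quarter of `give_up` on average, and the full `give_up` whenever it happens to start with the missing one; shortening `give_up` to make the scan "quick" is what brings back the congestion problem.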

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Andrew Deason
On Thu, 23 Jan 2014 15:39:03 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> > Oh also, I'm not sure why you're adding the new machines to
> > the CellServDB before the new server is up. You could bring up
> > e.g. dbserver 4, and only after you're sure it's up and
> > available, then add i…
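The suggested ordering keeps one property true at every step: every server in the client CellServDB is actually running. A hypothetical checker for that invariant makes the difference between the two orderings concrete. The step names (`start`, `stop`, `add`, `remove`) and server names are made up, and the sketch deliberately ignores the server-side CellServDB and Ubik quorum.

```python
def check_plan(steps, running, client_csdb):
    """Apply (action, server) steps in order and report every step after
    which the client CellServDB lists a server that is not running."""
    running, csdb = set(running), set(client_csdb)
    violations = []
    for i, (action, server) in enumerate(steps):
        if action == "start":
            running.add(server)
        elif action == "stop":
            running.discard(server)
        elif action == "add":       # add to the client CellServDB
            csdb.add(server)
        elif action == "remove":    # remove from the client CellServDB
            csdb.discard(server)
        else:
            raise ValueError(f"unknown action {action!r}")
        listed_but_down = csdb - running
        if listed_but_down:
            violations.append((i, action, server, sorted(listed_but_down)))
    return violations

if __name__ == "__main__":
    old = {"db1", "db2", "db3"}
    suggested = [("start", "db4"), ("add", "db4"), ("remove", "db3"), ("stop", "db3")]
    listed_first = [("add", "db4"), ("start", "db4"), ("remove", "db3"), ("stop", "db3")]
    print(check_plan(suggested, old, old))     # []
    print(check_plan(listed_first, old, old))  # [(0, 'add', 'db4', ['db4'])]
```

In the "list it first" ordering the violation lasts from the `add` until the `start`, which in the plan being discussed was a couple of days.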

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Andrew Deason
On Thu, 23 Jan 2014 14:33:58 -0500 Jeffrey Hutzelman wrote:
> The problem is that you want the client to scan "quickly" to find a server
> that is up, but because networks are not perfectly reliable and drop
> packets all the time, it cannot know that a server is not up until that
> server has failed…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Jeffrey Hutzelman
On Thu, 2014-01-23 at 14:58 +, Peter Grandi wrote:
> My real issue was 'server/CellServDB' because we could not
> prepare ahead of time all 3 new servers, but only one at a time.
>
> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even i…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Jeffrey Hutzelman
On Thu, 2014-01-23 at 10:44 -0600, Andrew Deason wrote:
> > For example in an ideal world putting more or less DB servers in
> > the client 'CellServDB' should not matter, as long as one that
> > belongs to the cell is up; again if the logic were for all types
> > of client: "scan quickly the lis…

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Andrew Deason
On Thu, 23 Jan 2014 14:58:35 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even if the rekeying
> instructions hint that when the mtime of 'server/CellServDB'
> changes the DB daemons re…
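The message is cut off before saying what the daemons do, so nothing here describes actual OpenAFS behavior. As a generic illustration of the mechanism being discussed, a process can cheaply check a file's modification time and re-read the file only when that time changes. `MtimeWatcher` and `parse_servers` are hypothetical; the parser assumes the usual single-cell CellServDB layout of a `>cellname` line followed by `address #hostname` lines.

```python
import os

def parse_servers(path):
    """Return the server addresses from a single-cell CellServDB-style file:
    '>cellname' lines are skipped, and '#' starts a comment."""
    servers = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(">"):
                continue
            servers.append(line.split("#", 1)[0].strip())
    return servers

class MtimeWatcher:
    """Re-parse a file only when its modification time has changed."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse
        self.mtime = None
        self.value = None

    def get(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            self.value = self.parse(self.path)
            self.mtime = mtime
        return self.value
```

The weakness of this design is that it keys on the timestamp, not the content: an edit that leaves the mtime unchanged goes unnoticed, and any touch of the file triggers a reload even if nothing changed.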

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Peter Grandi
[ ... ]
>> At some point during this slow incremental plan there were 4
>> entries in both 'CellServDB's and the new one had not been
>> started up yet, and would not be for a couple days.
> Oh also, I'm not sure why you're adding the new machines to
> the CellServDB before the new server is up.
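Why a listed-but-not-started server matters can be shown with simple vote arithmetic. The sketch assumes a plain strict-majority rule over the servers in the server CellServDB; Ubik's actual vote counting, including how it breaks ties with an even number of servers, is not modeled, and the function names are hypothetical.

```python
def has_quorum(listed, running):
    """True if a strict majority of the listed database servers is running."""
    listed = set(listed)
    votes = len(listed & set(running))
    return 2 * votes > len(listed)

def spare_servers(listed, running):
    """How many more running servers could fail before the majority is lost
    (negative means there is already no majority)."""
    listed = set(listed)
    votes = len(listed & set(running))
    needed = len(listed) // 2 + 1
    return votes - needed

if __name__ == "__main__":
    up = {"db1", "db2", "db3"}
    three = ["db1", "db2", "db3"]
    four = three + ["db4"]  # db4 listed but not started yet
    print(has_quorum(three, up), spare_servers(three, up))  # True 1
    print(has_quorum(four, up), spare_servers(four, up))    # True 0
```

Under this rule, listing the fourth server before it runs keeps the cell at a majority but removes all margin: one more failure leaves 2 of 4.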

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-23, Peter Grandi
> [ ... ] adding the new machines to the CellServDB before the
> new server is up. You could bring up e.g. dbserver 4, and only
> after you're sure it's up and available, then add it to the
> client CellServDB. Then remove dbserver #3 from the client
> CellServDB, and then turn off dbserver #3. Fo…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-17, Jeffrey Hutzelman
On Fri, 2014-01-17 at 14:21 -0600, Andrew Deason wrote:
> On Fri, 17 Jan 2014 18:50:13 +
> p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
>
> > Planned to do this incrementally by adding a new DB server to the
> > 'CellServDB', then starting it up, then removing an old DB
> > server, and so…

Re: [OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-17, Jeffrey Hutzelman
On Fri, 2014-01-17 at 14:12 -0600, Andrew Deason wrote:
> time, so presumably if we contact a downed dbserver, the client will not
> try to contact that dbserver for quite some time.

To elaborate: the cache manager keeps track of every server, and periodically sends a sort of "ping" to each ser…
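The behavior described, remembering which servers are down and periodically probing them, can be modeled with a small state table. This is a toy sketch under assumed names (`ServerTracker`, `probe`, a 60-second `probe_interval`), with time passed in explicitly so it is deterministic; it is not the cache manager's actual data structure or probe schedule.

```python
class ServerTracker:
    """Toy model of a long-lived client that remembers which servers are
    down and re-probes every server at a fixed interval."""

    def __init__(self, servers, probe, probe_interval):
        self.servers = list(servers)
        self.probe = probe                  # probe(server) -> True if it answers
        self.probe_interval = probe_interval
        self.down = set()
        self.last_probe = None

    def mark_down(self, server):
        """Called when a real request to `server` times out."""
        self.down.add(server)

    def _maybe_probe(self, now):
        if self.last_probe is not None and now - self.last_probe < self.probe_interval:
            return
        self.last_probe = now
        for server in self.servers:
            if self.probe(server):
                self.down.discard(server)
            else:
                self.down.add(server)

    def candidates(self, now):
        """Servers worth trying right now, in configured order."""
        self._maybe_probe(now)
        return [s for s in self.servers if s not in self.down]

if __name__ == "__main__":
    alive = {"db1", "db2", "db3"}
    tracker = ServerTracker(["db1", "db2", "db3"], lambda s: s in alive, 60)
    print(tracker.candidates(now=0))    # ['db1', 'db2', 'db3']
    tracker.mark_down("db2")            # a request to db2 just timed out
    print(tracker.candidates(now=10))   # ['db1', 'db3'] (no probe yet)
    print(tracker.candidates(now=70))   # ['db1', 'db2', 'db3'] (probe found db2 up)
```

A one-off command-line tool starts with an empty table every time, which is the problem raised in the 2014-01-24 messages.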

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-17, Andrew Deason
On Fri, 17 Jan 2014 18:50:13 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> Planned to do this incrementally by adding a new DB server to the
> 'CellServDB', then starting it up, then removing an old DB
> server, and so on until all 3 have been replaced in turn with
> new DB servers #4,…

[OpenAFS] Re: DB servers "quorum" and OpenAFS tools

2014-01-17, Andrew Deason
On Fri, 17 Jan 2014 18:50:13 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> What rules do the OpenAFS tools use to contact one of
> the DB servers?

Most of the time ("read" requests), we'll pick a random dbserver, and use it. If contacting a dbserver fails for network reasons, we will…
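The first half of that rule, picking a random database server for read requests, is stated directly. The message is cut off before describing what happens after a network failure, so the failover step below (try the remaining servers in turn) is an assumption. `call_any`, `rpc`, and the use of `OSError` as the "network failure" signal are hypothetical.

```python
import random

def call_any(servers, rpc, rng=random):
    """Try database servers in a random order and return (server, result)
    from the first one that does not fail with a network error."""
    order = list(servers)
    rng.shuffle(order)          # spread read load across servers
    last_error = None
    for server in order:
        try:
            return server, rpc(server)
        except OSError as err:  # assumed: OSError means "network failure"
            last_error = err    # assumed failover: move on to the next server
    raise ConnectionError("no database server could be reached") from last_error

if __name__ == "__main__":
    down = {"db1", "db3"}

    def rpc(server):
        if server in down:
            raise ConnectionError(f"{server} unreachable")
        return f"answer from {server}"

    print(call_any(["db1", "db2", "db3"], rpc, random.Random(1)))  # ('db2', 'answer from db2')
```

Random selection spreads load and avoids every client hammering the first listed server, but a one-off tool that lands on a down server still pays the full timeout before moving on.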