On Fri, 24 Jan 2014 11:41:35 -0500
Jeffrey Hutzelman wrote:
> The problem is the one-off clients that make _one RPC_ and then exit.
> They have no opportunity to remember what didn't work last time. It
> might help some for these sorts of clients to use multi, if they're
> doing read-only requests
On 1/24/2014 11:45 AM, Brandon Allbery wrote:
> On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote:
>> The problem is the one-off clients that make _one RPC_ and then exit.
>> They have no opportunity to remember what didn't work last time. It
>
> Has it been considered to write a cache file
On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote:
> The problem is the one-off clients that make _one RPC_ and then exit.
> They have no opportunity to remember what didn't work last time. It
Has it been considered to write a cache file somewhere (even a user
dotfile) that could be used
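Nothing like this exists in the stock tools; a minimal sketch of the dotfile idea in Python, with an invented cache path and a made-up 600-second expiry (none of these names come from OpenAFS):

```python
import json
import os
import time

CACHE = os.path.expanduser("~/.afs_dbserver_cache")  # hypothetical dotfile name
TTL = 600  # seconds before a "down" mark goes stale (arbitrary choice)

def load_down_servers(now=None, path=CACHE):
    """Return the set of dbservers recently marked down, dropping stale marks."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            marks = json.load(f)  # {"hostname": timestamp_of_last_failure}
    except (OSError, ValueError):
        return set()  # no cache yet, or unreadable: assume nothing is down
    return {h for h, t in marks.items() if now - t < TTL}

def mark_down(host, now=None, path=CACHE):
    """Record that `host` failed, so the next one-shot client skips it first."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            marks = json.load(f)
    except (OSError, ValueError):
        marks = {}
    marks[host] = now
    with open(path, "w") as f:
        json.dump(marks, f)

def order_servers(cellservdb_hosts, path=CACHE):
    """Try servers not recently marked down first, keeping CellServDB order."""
    down = load_down_servers(path=path)
    return ([h for h in cellservdb_hosts if h not in down]
            + [h for h in cellservdb_hosts if h in down])
```

A one-off client would call `order_servers()` before its single RPC and `mark_down()` on failure, inheriting the previous run's experience at the cost of one small file read.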
On Fri, 2014-01-24 at 08:01 +, Simon Wilkinson wrote:
> On 24 Jan 2014, at 07:48, Harald Barth wrote:
>
> > You are completely right if one must talk to that server. But I think
> > that AFS/RX sometimes hangs too long on waiting for one server
> > instead of trying the next one. For example
> I have long thought that we should be using multi for vldb lookups,
> specifically to avoid the problems with down database servers.
The situation is a little bit different for cache managers who can
remember which servers are down and command line tools which normally
discover how the world
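For read-only VLDB lookups, Rx's multi interface would fan the call out to every dbserver at once and take the first answer. A rough stand-in using plain Python threads (the `rpc` callable and hostnames are placeholders, not OpenAFS APIs):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def first_responder(servers, rpc, timeout=5.0):
    """Send the same read-only request to every dbserver at once and return
    (server, reply) for the first successful answer, so one dead server
    cannot stall the whole lookup."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = {pool.submit(rpc, s): s for s in servers}
        not_done = set(futures)
        while not_done:
            done, not_done = wait(not_done, timeout=timeout,
                                  return_when=FIRST_COMPLETED)
            if not done:
                break  # every remaining server stayed silent past the timeout
            for fut in done:
                try:
                    return futures[fut], fut.result()
                except Exception:
                    pass  # that server failed; keep waiting on the rest
    raise TimeoutError("no dbserver answered")
```

The design choice mirrors the multi argument in the thread: a down server costs nothing extra, because the lookup completes as soon as any peer answers.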
On Thu, 23 Jan 2014 21:55:15 +
p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> > Otherwise, when your network becomes congested, the
> > retransmission of dropped packets will act as a runaway positive
> > feedback loop, making the congestion worse and saturating the
> > network.
>
> I am so
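Backing off exponentially between retransmissions is the standard way to break that feedback loop: each retry waits roughly twice as long, so a congested network sees fewer packets rather than more. A toy schedule with illustrative numbers, not Rx's actual timers:

```python
import random

def backoff_schedule(base=1.0, cap=30.0, attempts=6, rng=random.random):
    """Exponential backoff with jitter: retry i waits about base * 2**i
    seconds (capped), so retransmissions thin out under congestion
    instead of amplifying it."""
    delays = []
    for i in range(attempts):
        ceiling = min(cap, base * (2 ** i))
        delays.append(ceiling * (0.5 + 0.5 * rng()))  # jitter in [ceiling/2, ceiling]
    return delays
```

The jitter term matters too: without it, many clients that timed out together would all retransmit together, recreating the burst that caused the drops.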
>>> For example in an ideal world putting more or less DB servers
>>> in the client 'CellServDB' should not matter, as long as one
>>> that belongs to the cell is up; again if the logic were for
>>> all types of client: "scan quickly the list of potential DB
>>> servers, find one that is up and belongs
On 24 Jan 2014, at 07:48, Harald Barth wrote:
> You are completely right if one must talk to that server. But I think
> that AFS/RX sometimes hangs too long on waiting for one server
> instead of trying the next one. For example for questions that could
> be answered by any VLDB. I'm thinking
> The problem is that you want the client to scan "quickly" to find a server
> that is up, but because networks are not perfectly reliable and drop
> packets all the time, it cannot know that a server is not up until that
> server has failed to respond to multiple retransmissions of the request.
> Those
On Thu, 23 Jan 2014 15:39:03 +
p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> > Oh also, I'm not sure why you're adding the new machines to
> > the CellServDB before the new server is up. You could bring up
> > e.g. dbserver 4, and only after you're sure it's up and
> > available, then add it
On Thu, 23 Jan 2014 14:33:58 -0500
Jeffrey Hutzelman wrote:
> The problem is that you want the client to scan "quickly" to find a server
> that is up, but because networks are not perfectly reliable and drop
> packets all the time, it cannot know that a server is not up until that
> server has failed
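That floor on detection time is easy to put numbers on: before declaring a silent server down, the client has to let a whole ladder of retransmission timeouts expire. A back-of-the-envelope sketch with invented timer values (Rx's real timers differ):

```python
def time_to_declare_down(initial_rto=1.0, retries=4, backoff=2.0):
    """Lower bound on how long a client must wait before it can conclude a
    silent server is down: the sum of the retransmission timeouts it has
    to let expire first.  All parameter values here are illustrative."""
    return sum(initial_rto * backoff ** i for i in range(retries + 1))
```

With a 1-second initial timeout, doubling, and four retries, the client waits 31 seconds per dead server, which is why "scan quickly" cannot be quick when scanning serially.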
On Thu, 2014-01-23 at 14:58 +, Peter Grandi wrote:
> My real issue was 'server/CellServDB' because we could not
> prepare ahead of time all 3 new servers, but only one at a time.
>
> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even i
On Thu, 2014-01-23 at 10:44 -0600, Andrew Deason wrote:
> > For example in an ideal world putting more or less DB servers in
> > the client 'CellServDB' should not matter, as long as one that
> > belongs to the cell is up; again if the logic were for all types
> > of client: "scan quickly the list
On Thu, 23 Jan 2014 14:58:35 +
p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even if the rekeying
> instructions hint that when the mtime of 'server/CellServDB'
> changes the DB daemons re
[ ... ]
>> At some point during this slow incremental plan there were 4
>> entries in both 'CellServDB's and the new one had not been
>> started up yet, and would not be for a couple days.
> Oh also, I'm not sure why you're adding the new machines to the
> CellServDB before the new server is up. You could bring up e.g.
> dbserver 4, and only after you're sure it's up and available, then add
> it to the client CellServDB. Then remove dbserver #3 from the client
> CellServDB, and then turn off dbserver #3.
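The reason for that ordering is that the client CellServDB must always name at least one live dbserver. A small Python model of the suggested sequence, checking that invariant after each step (pure illustration; no OpenAFS code involved):

```python
def rolling_replace(client_list, live, old, new):
    """Model the suggested order: start `new`, add it to the client
    CellServDB, drop `old` from the list, then stop `old`.  Returns the
    (client_list, live_servers) state after each step so the caller can
    check that the client always lists only live servers."""
    live = set(live)
    steps = []
    live.add(new)                                       # 1. bring up the new dbserver
    steps.append((list(client_list), set(live)))
    client_list = client_list + [new]                   # 2. then add it to client CellServDB
    steps.append((list(client_list), set(live)))
    client_list = [h for h in client_list if h != old]  # 3. drop the old entry
    steps.append((list(client_list), set(live)))
    live.discard(old)                                   # 4. only now stop the old server
    steps.append((list(client_list), set(live)))
    return steps
```

Reversing steps 1 and 2, which is what the original plan did, yields an interval in which clients can pick a listed server that is not running yet and eat a full timeout ladder on it.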
On Fri, 2014-01-17 at 14:21 -0600, Andrew Deason wrote:
> On Fri, 17 Jan 2014 18:50:13 +
> p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
>
> > Planned to do this incremental by adding a new DB server to the
> > 'CellServDB', then starting it up, then removing an old DB
> > server, and so
On Fri, 2014-01-17 at 14:12 -0600, Andrew Deason wrote:
> time, so presumably if we contact a downed dbserver, the client will not
> try to contact that dbserver for quite some time.
To elaborate: the cache manager keeps track of every server, and
periodically sends a sort of "ping" to each server
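A toy version of that bookkeeping, with invented thresholds and field names (the real cache manager's per-server state is considerably more involved):

```python
class ServerTracker:
    """Sketch of the cache manager's idea: record the result of each
    periodic probe and answer "is this server believed up?" without
    issuing a new RPC."""

    def __init__(self, down_after=3):
        self.down_after = down_after  # consecutive failures before "down" (arbitrary)
        self.failures = {}            # hostname -> consecutive failed probes

    def record_probe(self, host, ok):
        """Called by the periodic pinger with each probe's outcome."""
        if ok:
            self.failures[host] = 0
        else:
            self.failures[host] = self.failures.get(host, 0) + 1

    def believed_up(self, host):
        """Answer from memory; unknown hosts are assumed up until probed."""
        return self.failures.get(host, 0) < self.down_after
```

This is exactly the state a one-shot command-line tool lacks: it exits before any probe history accumulates.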
On Fri, 17 Jan 2014 18:50:13 +
p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> Planned to do this incremental by adding a new DB server to the
> 'CellServDB', then starting it up, then removing an old DB
> server, and so on until all 3 have been replaced in turn with
> new DB servers #4,
On Fri, 17 Jan 2014 18:50:13 +
p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> What rules do the OpenAFS tools use to contact one of
> the DB servers?
Most of the time ("read" requests), we'll pick a random dbserver, and
use it. If contacting a dbserver fails for network reasons, we will
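A sketch of that pick-randomly-then-fail-over behaviour (the `rpc` callable is a placeholder, and treating `OSError` as "network-level failure" is this sketch's simplification, not the cache manager's actual error taxonomy):

```python
import random

def read_request(servers, rpc, rng=None):
    """Pick a dbserver at random for a read request, and fall back to the
    remaining servers in turn if the chosen one fails for network reasons."""
    rng = rng or random.Random()
    order = list(servers)
    rng.shuffle(order)  # random first choice spreads read load across dbservers
    last_err = None
    for host in order:
        try:
            return host, rpc(host)
        except OSError as err:  # network failure: move on to the next server
            last_err = err
    raise last_err or OSError("no dbservers configured")
```

Randomizing the first pick balances read traffic across the database servers; the sequential fallback is what exposes the long per-server timeout the rest of the thread complains about.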