My two cents: I am not sure we can distinguish between spurious and non-spurious warnings, and I don't think we can time it well. The delay applies only in certain cases. If the user knows that there will be a startup delay, then the user can ignore those errors or modify their scripts to start the server after a delay. Does this have to be implemented in the server? It sounds to me like something that user scripts should handle.
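To make the "user scripts" option concrete, here is a rough sketch of ignoring expected warnings on the user side. Everything in it is an assumption for illustration: the helper name, the (timestamp, message) record shape, and the "Connection refused" match string are hypothetical and not taken from ZooKeeper itself; the 20-second default only mirrors the "(say) twenty seconds" figure from the thread.

```python
from datetime import datetime, timedelta


# Hypothetical helper, not part of ZooKeeper: drops expected startup
# warnings from already-parsed log records on the user side.
def filter_startup_warnings(records, started_at,
                            grace=timedelta(seconds=20),
                            patterns=("Connection refused",)):
    """Return records, omitting pattern matches logged during the grace period.

    records    -- iterable of (timestamp, message) tuples (assumed shape)
    started_at -- datetime at which the server was started
    grace      -- window after startup in which matches count as expected
    patterns   -- substrings identifying the warnings to suppress (assumed)
    """
    cutoff = started_at + grace
    kept = []
    for timestamp, message in records:
        in_window = started_at <= timestamp < cutoff
        expected = in_window and any(p in message for p in patterns)
        if not expected:
            kept.append((timestamp, message))
    return kept
```

Anything logged after the grace window still comes through, so a peer that is actually down would still be reported. It has the same limitation discussed in the thread: it cannot tell a crashed peer from one that has not started yet during the window.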
On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <[email protected]> wrote:

> Sampath, Do you think something along the lines of what Ted describes would work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or inability to form a quorum during the first (say) twenty seconds of operation.
>
> The thesis is that warnings from these causes during that time are spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea. I completely reboot a ZK cluster once every year or three. When I am doing a rolling upgrade, I *want* to see alerts when I bounce a machine. If I don't want to see those alerts, my monitoring system allows me to put a machine into maintenance mode for a short period of time to temporarily suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <[email protected]> wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine that is down because it crashed and a machine that is down because it hasn't started yet. Assuming that we are logging the machine unavailability as we are doing currently, one can always look at the timestamp of the warning and remember that this is the time the machines were bootstrapping. Consequently, I don't really see the point of reducing the number of warnings, unless the warnings are really polluting the logs. I typically don't see so many that prevents me from reading the rest, but you may have a different perception. Also, recall that we back off, so the warnings become less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if you have an idea of how to do it differently, then I'd be happy to hear. However, if the idea is simply to add a parameter that configures the time for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that many of these warnings are to be expected and there is a heuristic that might assist in distinguishing some of these cases so that false alarms in the logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have different itches. In my experience, restarting a ZK cluster from zero almost never happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <[email protected]> wrote:
>>
>> >> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[email protected]> wrote:
>> >>
>> >> Hhmmm, I think this is a bit different isn't it? Here we know that the first server to come will be failing to connect to the other as they are not yet up. Anyway our real issue is the warning.
>> >>
>> >> We know that.
>> >>
>> >> But how does the server know that it is the first server? That is the whole point of the leader election. You might just have a server rejoining a cluster. Or you might have a cluster that has been turned off. Or a cluster with 2 out of 5 machines off and we tried to touch the other down machine before the others.
>> >>
>> >> Would you like to suggest a patch?
>> >>
>> >> Of course I do.. will prepare a patch and attach.
>> >>
>> >> Great!
>>
>> *flavio*
>> *junqueira*
>>
>> research scientist
>>
>> [email protected]
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300 fax (408) 349 3301
