At Tue, 21 Aug 2012 04:34:05 +0000, Dietmar Maurer wrote:
>
> > On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
> > > Another thing that sprang to mind is that instead of the formal
> > > recovery enable/disable we should simply always delay recovery, that
> > > is, only do recovery every N seconds if changes happened.
> > > Especially in the cases of whole racks going up/down or upgrades, that
> > > dramatically reduces the number of epochs required, and thus reduces
> > > the recovery overhead.
> > >
> > > I didn't actually have time to look into the implementation
> > > implications of this yet, it's just high-level thoughts.
> >
> > I am against delaying recovery all the time. It is useful to delay
> > recovery in some time window for maintenance or operational purposes,
> > so I think the idea of delaying recovery manually, in a controlled
> > window, is useful. But if we extend this to all of the running time, it
> > will bring the cluster to a less safe (if not dangerous) state at any
> > point. (We only upgrade the cluster or maintain an individual node at
> > certain times, not all the time, no?)
>
> I still think that automatic recovery without delay is the wrong approach.
> At least for small clusters you simply want to avoid unnecessary traffic.
> Such recovery can produce massive traffic on the network (several TB of
> data), and can make the whole system unusable because of that. I want to
> control when recovery starts.

Disabling automatic recovery by default doesn't work for you? You can control
the time to start recovery with "collie cluster recover enable".

Thanks,

Kazutaka
--
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog