On Mon, Mar 19, 2012 at 11:26 AM, huxinwei <[email protected]> wrote:
> Hi,
>
> Similar problems happened to me a while ago, even without cache.
>
> However, I think the problem is “What’s the expected behavior of
> formatting a running cluster?”
>
> Has this been discussed before? I’m wondering if you already have an
> answer for this.

Yes, the root cause is formatting a running cluster.

In my test, formatting cleared the objects and the VDI bitmap, which seems
right to me, though I have not tested it carefully. Did you see any other
errors in your test?

I am not sure what the behavior of this kind of formatting should be. I think
sheepdog should support it.

Thanks,
Haiting

> From: [email protected] [mailto:[email protected]]
> On Behalf Of HaiTing Yao
> Sent: Monday, March 19, 2012 10:44 AM
> To: Liu Yuan
> Cc: HaiTing Yao; [email protected]
> Subject: Re: [Sheepdog] [PATCH] sheep: modify cached_epoch
>
> On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <[email protected]> wrote:
>
> On 03/16/2012 04:43 PM, [email protected] wrote:
> > From: HaiTing Yao <[email protected]>
> >
> > cached_epoch is a __thread variable. If it is greater than 1, formatting
> > the cluster again will lead to a permanent I/O error.
> >
> > Signed-off-by: HaiTing Yao <[email protected]>
> > ---
> >  sheep/sdnet.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> > index 5db9f29..d693858 100644
> > --- a/sheep/sdnet.c
> > +++ b/sheep/sdnet.c
> > @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)
> >      if (before(epoch, cached_epoch)) {
> >          eprintf("requested epoch is smaller than the previous one: %d < %d\n",
> >              epoch, cached_epoch);
> > -        return -1;
> > +        /* cluster format again */
> > +        if (sys->epoch == 1)
> > +            cached_epoch = 0;
> > +        else
> > +            return -1;
> >      }
> >      if (after(epoch, cached_epoch)) {
> >          for (i = 0; i < SD_MAX_NODES; i++) {
>
> Any script that can reproduce this issue?
>
> Thanks,
> Yuan
>
> Please try this script, thanks.
>
> The error log looks like this:
>
> Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000
> Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 < 2
> Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002
> Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129
> Mar 19 10:28:14 client_handler(557) closed connection 11
>
> test-cached.sh
>
> set -x
>
> sudo killall sheep
> sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4
>
> echo "test cached epoch" > ~/tmp-cached
> sudo sheep -d ~/s1 -z 1
> sudo sheep -d ~/s2 -z 2 -p 7002
> sudo sheep -d ~/s3 -z 3 -p 7003
> sudo sheep -d ~/s4 -z 4 -p 7004
>
> sleep 60
>
> collie cluster format
>
> collie vdi create v1 64M
>
> sleep 30
>
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> ps -ef | grep "\-z 4" | awk '{print $2}' | xargs sudo kill
>
> sleep 60
>
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> sleep 6
>
> collie cluster format
>
> collie vdi create v1 64M
>
> sleep 60
>
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> Best Regards
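
For anyone following the thread, here is a minimal standalone sketch of the
failure the patch targets. It is not the sheep code. check_epoch(),
cluster_epoch and replay() are hypothetical stand-ins for get_sheep_fd(),
sys->epoch and the test script, and before()/after() are assumed to be
wrap-safe comparisons in the style of TCP sequence numbers. I also assume the
after() branch stores the new epoch in cached_epoch once the cached fds are
dropped.

```c
#include <stdint.h>

/* Assumption: wrap-safe "a is older than b" / "a is newer than b" tests. */
static inline int before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static inline int after(uint32_t a, uint32_t b)  { return (int32_t)(b - a) < 0; }

/* Hypothetical stand-in for the per-thread cache in sheep/sdnet.c.
 * __thread is the GCC/Clang thread-local storage keyword. */
static __thread uint32_t cached_epoch;

/* Hypothetical stand-in for sys->epoch. */
static uint32_t cluster_epoch;

/* Returns 0 if a request for 'epoch' is accepted, -1 if it is rejected.
 * 'patched' selects the behavior with or without the proposed change. */
static int check_epoch(uint32_t epoch, int patched)
{
    if (before(epoch, cached_epoch)) {
        if (patched && cluster_epoch == 1)
            cached_epoch = 0;      /* cluster was formatted again */
        else
            return -1;             /* stale epoch: request fails */
    }
    if (after(epoch, cached_epoch))
        cached_epoch = epoch;      /* real code would also close cached fds */
    return 0;
}

/* Replays the sequence from the script: write at epoch 1, write at
 * epoch 2 after a node is killed, then write at epoch 1 after the
 * second "collie cluster format". Returns the result of the last write. */
static int replay(int patched)
{
    cached_epoch = 0;
    cluster_epoch = 1;
    check_epoch(1, patched);
    cluster_epoch = 2;
    check_epoch(2, patched);
    cluster_epoch = 1;             /* reformatted while sheep kept running */
    return check_epoch(1, patched);
}
```

In this sketch replay(0) returns -1, which corresponds to the "requested
epoch is smaller than the previous one: 1 < 2" line in the log, and replay(1)
returns 0 because the cache is reset before the comparison against the new
epoch.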
-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
