Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-13 Thread Wang Yibin
This sounds like bug 20183, which was caused by a race in object creation. Unfortunately the fix is not in Lustre release 1.6.6. You need to upgrade your filesystem to avoid hitting this LBUG again. To fix dangling inodes, if you are very sure that these objects are not useful, use the '-d' option to d

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-13 Thread Christopher Walker
Thanks again for the advice, and for putting this patch together so quickly. I put your patch into 1.41.6.sun1 and it works perfectly, at least in read-only mode. Unfortunately, when I run lfsck -l on this array, it gets to ost_idx 30, runs through part of the check, but then hangs while checking

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Wang Yibin
For the moment, without investigation, I am not sure about this. There may or may not be a compatibility issue. Please check out the version of e2fsprogs that is identical to the one on your system and patch its lfsck.c accordingly. Then you can compile against 1.6.6. On 2010-11-13, AM 1

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Christopher Walker
Thanks again for this patch. I just have one quick question about this -- 1.41.12.2.ora1 seems to require lustre_user.h from 1.8.x -- is it OK to use a version of lfsck compiled against 1.8.x on a 1.6.6 filesystem, and with {mds,ost}db that were created with 1.41.6? Best, Chris On 11/12/10 3:17 AM,

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Christopher Walker
Thanks *very* much -- I'll give this a shot later today and let you know how it goes. Best, Chris On 11/12/10 3:17 AM, Wang Yibin wrote: > This is a bug in llapi_lov_get_uuids() which assigns UUID to the wrong OST > index when there are sparse OST(s). > Please file a bug for this. > > Before thi

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Christopher Walker
Thanks Andreas. The orphan data is scattered throughout the array, although it's primarily on one OST (30) which seems to have been hit particularly hard by this outage: [r...@iliadaccess04 lfsck2]# grep ERROR lfsck2.out lfsck: ost_idx 5: pass2 ERROR: 3817 dangling inodes found (654297 files t

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Wang Yibin
This is a bug in llapi_lov_get_uuids(), which assigns a UUID to the wrong OST index when there are sparse OST(s). Please file a bug for this. Before this bug can be fixed, you can apply the following patch to e2fsprogs (version 1.41.12.2.ora1) lfsck.c as a workaround (not verified though). --- e2fs
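
A minimal sketch of the class of bug described above, assuming the failure amounts to treating a UUID's position in a densely packed list as its OST index. This is not the code of llapi_lov_get_uuids() and not the truncated patch; the function names and the UUID strings are hypothetical and only illustrate why a sparse index such as 42 gets mislabeled.

```python
# Hypothetical illustration only: not Lustre or e2fsprogs code.

def make_entries():
    """35 OSTs: indices 0..33 in use, then one sparse OST at index 42."""
    indices = list(range(34)) + [42]
    # Made-up UUID strings, derived from the hex index for readability.
    return [(idx, f"OST{idx:04x}_UUID") for idx in indices]

def dense_lookup(entries):
    """Buggy assumption: the Nth UUID returned belongs to OST index N."""
    return {pos: uuid for pos, (_idx, uuid) in enumerate(entries)}

def sparse_lookup(entries):
    """Keeps the real OST index attached to each UUID."""
    return {idx: uuid for idx, uuid in entries}

entries = make_entries()
dense = dense_lookup(entries)
sparse = sparse_lookup(entries)

# The 35th OST lands at slot 34 in the dense view, so a lookup by its
# real index (42) finds nothing, while slot 34 carries the wrong label.
print(dense.get(34), dense.get(42))
print(sparse.get(34), sparse.get(42))
```

With the dense mapping, a tool that looks for the database of OST 42 cannot match it, which is consistent with the symptom reported in this thread.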

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-12 Thread Andreas Dilger
On 2010-11-11, at 19:53, Christopher Walker wrote: > Thanks very much for your reply. I've tried remaking the mdsdb and all > of the ostdb's, but I still get the same error -- it checks the first 34 > osts without a problem, but can't find the ostdb file for the 35th > (which has ost_idx 42): > >

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-11 Thread Christopher Walker
Thanks very much for your reply. I've tried remaking the mdsdb and all of the ostdb's, but I still get the same error -- it checks the first 34 osts without a problem, but can't find the ostdb file for the 35th (which has ost_idx 42): ... lfsck: ost_idx 34: pass3 OK (676803 files total) lfsck: can

Re: [Lustre-discuss] non-consecutive OST ordering

2010-11-09 Thread Wang Yibin
The error message indicates that the UUID of OST #35 does not match between the live filesystem and the ostdb file. Is this ostdb obsolete? On 2010-11-9, at 11:45 PM, Christopher Walker wrote: > > > For reasons that I can't recall, our OSTs are not in consecutive order > -- we have 35 OSTs, which are

[Lustre-discuss] non-consecutive OST ordering

2010-11-09 Thread Christopher Walker
For reasons that I can't recall, our OSTs are not in consecutive order -- we have 35 OSTs, which are numbered consecutively from -0021 and then there's one last OST at 002a. When I try to run lfsck on this array, it works fine for the first 34 OSTs, but it can't seem to find the last OST db
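
The OST numbers in this post are hexadecimal, so 0021 is 33 and 002a is 42, which matches the ost_idx 42 quoted elsewhere in this thread. A small sketch of that conversion, assuming labels of the usual form fsname-OSTxxxx; the filesystem name "fsname" and the helper name are hypothetical.

```python
def ost_index(label: str) -> int:
    """Return the decimal OST index from a label ending in 4 hex digits."""
    # Example label shape (assumed): "fsname-OST002a"
    return int(label[-4:], 16)

print(ost_index("fsname-OST0021"))
print(ost_index("fsname-OST002a"))
```

The gap between 33 and 42 is what makes the OST numbering sparse.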