Hi Bart,

I haven't had a chance to do a whole lot of debugging of this problem, although 
I did run your test once.  I was using XFS as the underlying filesystem, and 
after running successfully for a while, the XFS mount just kind of blew up on 
me, and started returning EIO errors for everything. I will try the test again 
with ext2 (is that what you're using?), but in the meantime, I wanted to 
address the actual problem as you described, which is that you can't remove 
those bad file entries easily.  I've attached a patch which I think will fix 
the issue, allowing you to remove those bad entries with either rm or pvfs2-rm.
I haven't been able to test this fully, but it's only a one-liner, so there
shouldn't be any unintended side effects.  Do you want to give it a try?

-sam

Attachment: fix-remove-enoent.patch
Description: Binary data
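
The bad entries discussed in this thread are the ones that a directory listing
returns but that lstat() rejects with "No such file or directory". A minimal
Python sketch for locating them under a mount point follows. The function name
find_broken_entries is hypothetical and not part of PVFS2, and the code only
reports paths; it does not remove anything.

```python
import os
import stat


def find_broken_entries(root):
    """Return paths that a directory listing reports but lstat() cannot find."""
    broken = []
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # unreadable directory: skip it
        for name in names:
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                # Listed by readdir, but lstat() fails with ENOENT.
                # Note: a file deleted between the two calls looks the same,
                # so run this on a quiet file system.
                broken.append(path)
                continue
            except OSError:
                continue  # some other error: not the case we are looking for
            if stat.S_ISDIR(st.st_mode):
                stack.append(path)  # descend into real subdirectories only
    return broken
```

Calling find_broken_entries("/mnt/pvfs2") would walk the mount point used in
the example command and return candidate paths to try with rm or pvfs2-rm.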


On Jun 18, 2010, at 3:21 PM, Bart Taylor wrote:

> Yes. I reran the test again so that I could grab the actual messages, and 
> this time it was a bit more aggressive. Instead of leaving a file in a bad 
> state, it left my whole directory structure under the file system root in 
> that state. I attached a chunk of log messages from the client and from the 
> server that was timing out. The other servers did not log anything.
> 
> I am currently getting this back from pvfs2-fsck:
> 
> server 1, exceeding number of handles it declared (42923), currently (43000)
> pvfs28-fsck: ../pvfs2_src/src/apps/admin/pvfs2-fsck.c:1325: handlelist_add_handles: Assertion `0' failed.
> Aborted
> 
> 
> Bart.
> 
> 
> 
> On Fri, Jun 18, 2010 at 8:15 AM, Sam Lang <sl...@mcs.anl.gov> wrote:
> 
> Hi Bart,
> 
> When you run the script, do you see any timeout error messages in the client 
> log?
> 
> -sam
> 
> On Jun 18, 2010, at 9:03 AM, Bart Taylor wrote:
> 
> > Hey Phil,
> >
> > Yes, it is running 2.8.2.  My setup was using 3 servers with 2.6.18-194.el5 
> > kernels and High Availability. I have not had a chance yet to try it on 
> > another file system, so I do not know if it is specific to that setup. It 
> > has been triggered from more than one client, but the only one I know for 
> > certain was running a 2.6.9-89.ELsmp kernel.
> >
> > Bart.
> >
> >
> > On Fri, Jun 18, 2010 at 7:39 AM, Phil Carns <ca...@mcs.anl.gov> wrote:
> > Hi Bart,
> >
> > Is this on 2.8.2?  Do you happen to know how many servers are needed to 
> > trigger the problem?
> >
> > thanks,
> > -Phil
> >
> >
> > On 06/17/2010 04:08 PM, Bart Taylor wrote:
> >>
> >> Hey guys,
> >>
> >> We have had some problems in the past on 2.6 with file creations leaving bad
> >> files that we cannot delete. Most utilities like ls and rm return "No such
> >> file or directory", and pvfs utilities like viewdist, pvfs2-ls, and pvfs2-rm
> >> return various errors. We have resorted to looking up the parent handle, the
> >> fsid, and filename and using pvfs2-remove-object to delete the entry. But we
> >> weren't ever able to intentionally recreate the problem.
> >>
> >> Recently while testing 2.8, I have been able to reliably trigger a similar
> >> scenario where a file creation fails and leaves a garbage entry that cannot
> >> be deleted in any of the normal ways requiring the pvfs2-remove-object
> >> approach to clean up. The file and various outputs for this case:
> >>
> >> [r...@client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?---------  ? ? ? ?           ? File17027
> >>
> >> [r...@client dir]# rm 2010.06.10.28050/File17027
> >> rm: cannot lstat `2010.06.10.28050/File17027': No such file or directory
> >>
> >> [r...@client dir]# rm -rf 2010.06.10.28050
> >> rm: cannot remove directory `2010.06.10.28050': Directory not empty
> >>
> >> [r...@client dir]# pvfs2-rm 2010.06.10.28050/File17027
> >> Error: An error occurred while removing 2010.06.10.28050/File17027
> >> PVFS_sys_remove: No such file or directory (error class: 0)
> >>
> >> [r...@client dir]# pvfs2-stat 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Error stating [2010.06.10.28050/File17027]
> >>
> >> [r...@client dir]# pvfs2-viewdist -f 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Could not open 2010.06.10.28050/File17027
> >>
> >> [r...@client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?---------  ? ? ? ?           ? File17027
> >>
> >>
> >> I have included a test script that will spawn off a number of processes,
> >> open a bunch of files, write to each of them, then close them. You can tweak
> >> the options as you want but using 5 processes and 50,000 files will usually
> >> create at least one of these files. Here is an example command:
> >>
> >> $> ulimit -n 1000000 && ./open-file-limit --num-files=50000 --sleep-time=1 --num-processes=5 --directory=/mnt/pvfs2/ --file-size=1
> >>
> >> You may have to do a long listing on any left-over directories to find the
> >> file(s).
> >>
> >> I will give any help I can to help recreate the bad file or find the cause.
> >> Until then, is there a better (simpler) way to remove these entries, maybe
> >> some sort of utility that doesn't require doing manual handle lookups before
> >> getting the file removed? It would ease some support pain if it were simpler
> >> to fix.
> >>
> >> Thanks for your help,
> >> Bart.
> >>
> >> _______________________________________________
> >> Pvfs2-developers mailing list
> >>
> >> Pvfs2-developers@beowulf-underground.org
> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> >>
> >>
> >>
> >
> >
> 
> 
> <server2-log.TXT><client-log.TXT>
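
The reproduction described in the original report (several processes that each
open a batch of files, write to each, then close them) can be approximated with
the sketch below. It is not the attached open-file-limit script, whose source
does not appear in this thread. The function names, the per-worker
subdirectories, and the reading of --num-files as a per-process count are all
assumptions.

```python
import multiprocessing
import os
import time


def worker(directory, worker_id, num_files, file_size, sleep_time):
    """Open num_files files, write file_size bytes to each, then close them."""
    # Hypothetical layout: one subdirectory per worker process.
    subdir = os.path.join(directory, f"worker{worker_id}")
    os.makedirs(subdir, exist_ok=True)
    handles = []
    try:
        # Keep every file open at once, which is why the original command
        # raises the open-file limit with ulimit first.
        for i in range(num_files):
            handles.append(open(os.path.join(subdir, f"File{i}"), "wb"))
        for fh in handles:
            fh.write(b"x" * file_size)
        time.sleep(sleep_time)  # hold the files open for a while
    finally:
        for fh in handles:
            fh.close()


def run(directory, num_processes, num_files, file_size, sleep_time):
    """Start the workers, wait for them, and return their exit codes."""
    # The "fork" start method is POSIX-only; chosen here so the sketch also
    # works when it is not imported as a module.
    ctx = multiprocessing.get_context("fork")
    procs = [
        ctx.Process(
            target=worker,
            args=(directory, n, num_files, file_size, sleep_time),
        )
        for n in range(num_processes)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

Something like run("/mnt/pvfs2", num_processes=5, num_files=50000, file_size=1,
sleep_time=1) would mirror the example command in the report, provided the
open-file limit has been raised first. A nonzero exit code means a worker hit
an error, for instance a failed create.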
