Hi Bart,

I haven't had a chance to do much debugging of this problem, although I did run your test once. I was using XFS as the underlying filesystem, and after running successfully for a while, the XFS mount blew up on me and started returning EIO errors for everything. I will try the test again with ext2 (is that what you're using?), but in the meantime, I wanted to address the actual problem as you described it, which is that you can't remove those bad file entries easily. I've attached a patch which I think will fix the issue, allowing you to remove those bad entries with either rm or pvfs2-rm. I haven't been able to test this fully, but it's only a one-liner, so there shouldn't be any unintended side effects. Do you want to give it a try?
-sam
fix-remove-enoent.patch
Description: Binary data
On Jun 18, 2010, at 3:21 PM, Bart Taylor wrote:

> Yes. I reran the test again so that I could grab the actual messages, and
> this time it was a bit more aggressive. Instead of leaving a file in a bad
> state, it left my whole directory structure under the file system root in
> that state. I attached a chunk of log messages from the client and from the
> server that was timing out. The other servers did not log anything.
>
> I am currently getting this back from pvfs2-fsck:
>
> server 1, exceeding number of handles it declared (42923), currently (43000)
> pvfs28-fsck: ../pvfs2_src/src/apps/admin/pvfs2-fsck.c:1325:
> handlelist_add_handles: Assertion `0' failed.
> Aborted
>
> Bart.
>
> On Fri, Jun 18, 2010 at 8:15 AM, Sam Lang <sl...@mcs.anl.gov> wrote:
>
> Hi Bart,
>
> When you run the script, do you see any timeout error messages in the client
> log?
>
> -sam
>
> On Jun 18, 2010, at 9:03 AM, Bart Taylor wrote:
>
> > Hey Phil,
> >
> > Yes, it is running 2.8.2. My setup was using 3 servers with 2.6.18-194.el5
> > kernels and High Availability. I have not had a chance yet to try it on
> > another file system, so I do not know if it is specific to that setup. It
> > has been triggered from more than one client, but the only one I know for
> > certain was running a 2.6.9-89.ELsmp kernel.
> >
> > Bart.
> >
> > On Fri, Jun 18, 2010 at 7:39 AM, Phil Carns <ca...@mcs.anl.gov> wrote:
> > Hi Bart,
> >
> > Is this on 2.8.2? Do you happen to know how many servers are needed to
> > trigger the problem?
> >
> > thanks,
> > -Phil
> >
> > On 06/17/2010 04:08 PM, Bart Taylor wrote:
> >>
> >> Hey guys,
> >>
> >> We have had some problems in the past on 2.6 with file creations leaving
> >> bad files that we cannot delete. Most utilities like ls and rm return
> >> "No such file or directory", and pvfs utilities like viewdist, pvfs2-ls,
> >> and pvfs2-rm return various errors. We have resorted to looking up the
> >> parent handle, the fsid, and filename and using pvfs2-remove-object to
> >> delete the entry. But we weren't ever able to intentionally recreate the
> >> problem.
> >>
> >> Recently while testing 2.8, I have been able to reliably trigger a
> >> similar scenario where a file creation fails and leaves a garbage entry
> >> that cannot be deleted in any of the normal ways, requiring the
> >> pvfs2-remove-object approach to clean up. The file and various outputs
> >> for this case:
> >>
> >> [r...@client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?--------- ? ? ? ? ? File17027
> >>
> >> [r...@client dir]# rm 2010.06.10.28050/File17027
> >> rm: cannot lstat `2010.06.10.28050/File17027': No such file or directory
> >>
> >> [r...@client dir]# rm -rf 2010.06.10.28050
> >> rm: cannot remove directory `2010.06.10.28050': Directory not empty
> >>
> >> [r...@client dir]# pvfs2-rm 2010.06.10.28050/File17027
> >> Error: An error occurred while removing 2010.06.10.28050/File17027
> >> PVFS_sys_remove: No such file or directory (error class: 0)
> >>
> >> [r...@client dir]# pvfs2-stat 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Error stating [2010.06.10.28050/File17027]
> >>
> >> [r...@client dir]# pvfs2-viewdist -f 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Could not open 2010.06.10.28050/File17027
> >>
> >> [r...@client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?--------- ? ? ? ? ? File17027
> >>
> >> I have included a test script that will spawn off a number of processes,
> >> open a bunch of files, write to each of them, then close them. You can
> >> tweak the options as you want, but using 5 processes and 50,000 files
> >> will usually create at least one of these files. Here is an example
> >> command:
> >>
> >> $> ulimit -n 1000000 && ./open-file-limit --num-files=50000 --sleep-time=1 --num-processes=5 --directory=/mnt/pvfs2/ --file-size=1
> >>
> >> You may have to do a long listing on any left-over directories to find
> >> the file(s).
> >>
> >> I will give any help I can to help recreate the bad file or find the
> >> cause. Until then, is there a better (simpler) way to remove these
> >> entries, maybe some sort of utility that doesn't require doing manual
> >> handle lookups before getting the file removed? It would ease some
> >> support pain if it were simpler to fix.
> >>
> >> Thanks for your help,
> >> Bart.
> >>
> >> _______________________________________________
> >> Pvfs2-developers mailing list
> >> Pvfs2-developers@beowulf-underground.org
> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
> <server2-log.TXT><client-log.TXT>
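
For anyone following the thread without the attached script, the reproduction Bart describes (several processes, each opening many files, writing to them, then closing them) followed by the "long listing" hunt for bad entries can be roughly sketched as below. This is not the open-file-limit script from the thread. The function names, the per-process temporary directory, and the choice to hold every file open at once are assumptions made for illustration, and the sleep option from the example command is not modeled.

```python
import errno
import multiprocessing
import os
import tempfile


def create_files(directory, num_files, file_size):
    """Open num_files files at once, write file_size bytes to each, then close them.

    Holding all files open together is an assumption suggested by the
    `ulimit -n` in the example command, not something confirmed by the thread.
    """
    workdir = tempfile.mkdtemp(prefix="stress.", dir=directory)
    handles = []
    try:
        for i in range(num_files):
            handles.append(open(os.path.join(workdir, "File%d" % i), "wb"))
        for fh in handles:
            fh.write(b"x" * file_size)
    finally:
        for fh in handles:
            fh.close()
    return workdir


def find_broken_entries(root):
    """Return paths that a directory listing reports but lstat() rejects with ENOENT."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as exc:
                if exc.errno == errno.ENOENT:
                    broken.append(path)
    return broken


def run(directory, num_processes, num_files, file_size):
    """Run create_files in num_processes workers, then scan directory for bad entries."""
    procs = [
        multiprocessing.Process(
            target=create_files, args=(directory, num_files, file_size)
        )
        for _ in range(num_processes)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return find_broken_entries(directory)
```

Calling `run("/mnt/pvfs2/", 5, 50000, 1)` from under an `if __name__ == "__main__":` guard would approximate the example command, assuming the file count applies per process (the thread does not say), and with the open-file limit raised first as in the original. On a healthy filesystem the returned list should be empty; any path it does return is an entry that shows up in a listing but cannot be stat'ed, which matches the `?---------` symptom shown above.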