Jean-Marc: Can you confirm that the JIRA Jesse logged reflects your case? Thanks
On Dec 30, 2012, at 4:13 PM, Jesse Yates <jesse.k.ya...@gmail.com> wrote:

> Hey,
>
> So the point of all the delete code in the cleaner is to try to delete each of the files in the directory and then delete the directory, assuming it's empty - it shouldn't leak the IOException if the directory is found to be empty and then gets a file added.
>
> This is really odd though, as failures should return false, not throw an exception (boo HDFS javadocs). Looking at the 0.94 and 0.96 code, it is just logged, which is annoying, but doesn't mean broken code.
>
> Otherwise, Jean-Marc's analysis looks right. Should be a simple fix. I filed HBASE-7465 and should have a patch up shortly.
>
> As an aside, this method is actually tested (in a somewhat roundabout way) in TestCleanerChore#testCleanerDoesNotDeleteDirectoryWithLateAddedFiles with a spy object that ensures we get this non-error case.
>
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Sun, Dec 30, 2012 at 11:50 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
>> The Javadoc says:
>>
>> "@return <tt>true</tt> if the directory was deleted, <tt>false</tt> otherwise"
>>
>> So I think the line "return canDeleteThis ? fs.delete(toCheck, false) : false;" is still correct. It returns false if the directory has not been deleted.
>>
>> There is no exception here. If the TTL for a file has not expired, the file can't be deleted and false is returned. I think that's correct behaviour.
>>
>> The idea of not passing "true" for recursion is explained in the comments:
>> // if all the children have been deleted, then we should try to delete this directory. However,
>> // don't do so recursively so we don't delete files that have been added since we checked.
>> And I think it's good. So the issue is really when the directory is empty and listStatus sends back null. Then "if (children == null) return true;" simply returns true without deleting the current directory.
>>
>> This should be changed to something like
>> if (children == null) return fs.delete(toCheck, false);
>> which will try to delete the current directory, return true or false depending on whether that was possible, and throw an exception if there is any issue with the FS...
>>
>> I have done some modifications. I'm compiling and will deploy the updated version on my local cluster soon. I will keep you posted on the result.
>>
>> JM
>>
>> 2012/12/30, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>>> Thanks for the confirmation.
>>>
>>> Also, it seems that there is no test class related to checkAndDeleteDirectory. It might be good to add that too.
>>>
>>> I have extracted 0.94.3, 0.94.4RC0 and the trunk and they are all identical for this method.
>>>
>>> I will try to do some modifications and see the results...
>>>
>>> So far there are 2 options. One is to change the "return true" to handle the current empty directory, and another one is to call fs.delete() directly from checkAndDeleteDirectory instead of the existing code.
>>>
>>> Will wait for Jesse's feedback.
>>>
>>> JM
>>>
>>> 2012/12/30, Ted Yu <yuzhih...@gmail.com>:
>>>> Thanks for the digging. This concurs with my suspicion in the beginning.
>>>>
>>>> I am copying Jesse, who wrote the code. He should have more insight on this.
>>>>
>>>> After his confirmation, you can log a JIRA.
>>>>
>>>> Cheers
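Putting the snippets quoted above together, a minimal sketch of the fix under discussion could look like the following. This is a sketch of the method inside the cleaner chore, not the exact HBase source: checkAndDeleteDirectory, canDeleteThis, FSUtils.listStatus and fs.delete(toCheck, false) are taken from the thread, while the per-file checkAndDelete helper and the "fs" field are assumed stand-ins for the surrounding class.

    // Sketch only: assumes the enclosing cleaner class provides the
    // FileSystem field "fs" and a per-file TTL check "checkAndDelete".
    private boolean checkAndDeleteDirectory(Path toCheck) throws IOException {
      FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
      // FSUtils.listStatus returns null (not an empty array) for an empty
      // directory, so the old "if (children == null) return true;" skipped
      // the delete entirely; attempting a non-recursive delete instead is
      // the proposed fix.
      if (children == null) return fs.delete(toCheck, false);

      boolean canDeleteThis = true;
      for (FileStatus child : children) {
        Path path = child.getPath();
        if (child.isDir()) {
          // Recurse into subdirectories first.
          if (!checkAndDeleteDirectory(path)) canDeleteThis = false;
        } else if (!checkAndDelete(path)) { // hypothetical per-file (TTL) check
          canDeleteThis = false;
        }
      }
      // If all the children have been deleted, try to delete this directory,
      // but not recursively, so we don't delete files added since we checked.
      return canDeleteThis ? fs.delete(toCheck, false) : false;
    }

With this version an empty directory actually gets removed, a still-populated one simply yields false, and a genuine filesystem problem still surfaces as an IOException.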
>>>>
>>>> On Sun, Dec 30, 2012 at 10:59 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>>>
>>>>> So. Looking deeper I found a few things.
>>>>>
>>>>> First, why is checkAndDeleteDirectory not "simply" calling FSUtils.delete(fs, toCheck, true)? I guess it's doing the same thing?
>>>>>
>>>>> Also, FSUtils.listStatus(fs, toCheck, null) will return null if there is no status, not just an empty array. And if it returns null, we exit without calling the delete method.
>>>>>
>>>>> I tried to manually create a file in one of those directories. The exception disappears for 300 seconds because of the TTL for the newly created file. After 300 seconds, the file I pushed AND the directory got removed. So the issue is really with empty directories.
>>>>>
>>>>> I will take a look at what is in the trunk and in 0.94.4 to see if it's the same issue. But I think we can simply replace all this code with a call to FSUtils.delete.
>>>>>
>>>>> I can open a JIRA and submit a patch for that. Just let me know.
>>>>>
>>>>> JM
>>>>>
>>>>> 2012/12/30, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>>>>>> Regarding the logcleaner settings, I have not changed anything. It's what came with the initial install. So I don't have anything set up for this plugin in my configuration files.
>>>>>>
>>>>>> For the files on the FS, here is what I have:
>>>>>> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/.archive/entry_duplicate
>>>>>> Found 30 items
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/0ddf0d1802c6afd97d032fd09ea9e37d
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/18cf7c5c946ddf33e49b227feedfb688
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/2353f10e79dacc5cf201be6a1eb63607
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/243f4007cf05415062010a5650598bff
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/287682333698e36cea1670f5479fbf18
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/3742da9bd798342e638e1ce341f27537
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/435c9c08bc08ed7248a013b6ffaa163b
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/45346b4b4248d77d45e031ea71a1fb63
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/4afe48fe6d8defe569f8632dd2514b07
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/68a4e364fe791a0d1f47febbb41e8112
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/7673d718962535c7b54cef51830f22a5
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/7df6845ae9d052f4eae4a01e39313d61
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8c5a263167d1b09f645af8efb4545554
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8c98d9c635ba30d467d127a2ec1c69f8
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8dfa96393e18ecca826fd9200e6bf68b
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8e8f532e91a7197cd53b7626130be698
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8eca1a325fe442a8546e43ac2f00cfef
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/9ad4c0551b90ea7717d7e3aaec76dc26
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/a135ccbc6f61ce544dbd537dc12489e9
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/a3d0332a6d51a8b15b99d1caca3f355a
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/bd2b8c942af27e541e20e430d506d2c0
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/c10c3a66948bde75fc41349108d86cf9
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/cbf2f178691bfca8a7e9825115629b8e
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/d14a2546eaceede73b282e444ad1bb40
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/d570a21a39e04ba2ec896bbe7166423c
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/e943bda56acd6beb35bdd56f0560f87f
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/ef5692ba83aba48d9e7a6b9c2cd0661e
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/fd85dd319c289959a790faed32ef1530
>>>>>> drwxr-xr-x - hbase supergroup 0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/ffcdf6554accda1800e74838b67d3004
>>>>>> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
>>>>>> hadoop@node3:~/hadoop-1.0.3$
>>>>>>
>>>>>> I have not looked into ALL the subdirectories, but the first 10 are empty.
>>>>>>
>>>>>> I see that there is some tracing in checkAndDeleteDirectory... I will try to activate that and see if there are more details.
>>>>>>
>>>>>> JM
>>>>>>
>>>>>> 2012/12/30, Ted Yu <yuzhih...@gmail.com>:
>>>>>>> The exception came from this line:
>>>>>>> if (file.isDir()) checkAndDeleteDirectory(file.getPath());
>>>>>>> Looking at checkAndDeleteDirectory(), it recursively deletes files and directories under the specified path.
>>>>>>>
>>>>>>> Does /hbase/.archive/entry_duplicate only contain empty directories underneath it?
>>>>>>>
>>>>>>> You didn't modify the logcleaner plugin setting, right?
>>>>>>> <property>
>>>>>>>   <name>hbase.master.logcleaner.plugins</name>
>>>>>>>   <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner</value>
>>>>>>> </property>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sun, Dec 30, 2012 at 9:53 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>>>>>>
>>>>>>>> I was going to move to 0.94.4 today ;) And yes, I'm using 0.94.3. I might wait a bit in case some testing is required with my version.
>>>>>>>>
>>>>>>>> Is this what you are looking for? http://pastebin.com/N8Q0FMba
>>>>>>>>
>>>>>>>> I will keep the files for now since it seems it's not causing any major issue. That will allow some more testing if required.
>>>>>>>>
>>>>>>>> JM
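As an aside on the null return Jean-Marc describes above: a thin wrapper with the contract he quotes might look like this sketch (an illustration of the described behaviour, not necessarily the exact FSUtils code; the class name is made up for the example):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public final class ListStatusSketch {
      // Maps "directory has no entries" to null instead of an empty array,
      // which is why the cleaner's "children == null" branch is also the
      // empty-directory branch.
      public static FileStatus[] listStatus(FileSystem fs, Path dir,
          PathFilter filter) throws IOException {
        FileStatus[] status =
            (filter == null) ? fs.listStatus(dir) : fs.listStatus(dir, filter);
        return (status == null || status.length == 0) ? null : status;
      }
    }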
>>>>>>>>
>>>>>>>> 2012/12/30, Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>> Looks like you're using 0.94.3.
>>>>>>>>>
>>>>>>>>> The archiver is a backport of HBASE-5547, "Don't delete HFiles in backup mode".
>>>>>>>>>
>>>>>>>>> Can you provide more of the log where the IOE was reported, using pastebin?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Sun, Dec 30, 2012 at 9:08 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have an "IOException: /hbase/.archive/table_name is non empty" exception every minute in my logs.
>>>>>>>>>>
>>>>>>>>>> There are 30 directories under this directory. The main directory is from yesterday, but all the subdirectories are from December 10th, all at the same time.
>>>>>>>>>>
>>>>>>>>>> What is this .archive directory used for, and what should I do?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> JM
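For completeness, the original "is non empty" message is consistent with what HDFS does when asked for a non-recursive delete of a directory that still has children. A minimal standalone probe might look like this (the path is the one from the thread; the configuration handling is an illustrative assumption):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ArchiveDeleteProbe {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/hbase/.archive/entry_duplicate"); // example path
        try {
          // recursive=false: HDFS refuses to remove a directory that still
          // has children, and in the versions discussed here that refusal
          // surfaces as an IOException rather than a false return.
          System.out.println("deleted = " + fs.delete(dir, false));
        } catch (java.io.IOException ioe) {
          System.out.println("delete refused: " + ioe.getMessage());
        }
      }
    }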