On Mon, Apr 24, 2017 at 8:06 PM, Jason Heo <jason.heo....@gmail.com> wrote:
> Thanks David
>
> Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
> built by jenkins on 20170120-1037 git: aa0b5cd5eceaefe2f971c13ab65702
> 0d96bb842a)"
>
> My concern is that something does not free up cleanly and something wastes
> my resources. e.g., I dropped a 30TB table, but in tablet_data, there are
> still 3TB files. And the output of "lsof" shows that tserver opens 50M
> files. So I emailed to know how to remove unnecessary files.
>

The leftover space usage could come from a couple of different root causes.
For 1.4 we're working on tools (including the below-mentioned fs-check) to
detect and repair the "orphaned" data usage.

>
> It seems I can't use "kudu fs check" though.
>
> $ kudu fs check
> Invalid argument: unknown command 'check'
> Usage:
> /path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
> fs <command> [<args>]
>
> <command> can be one of the following:
>     dump    Dump a Kudu filesystem
>     format  Format a new Kudu filesystem
>
> Then I'll try "kudu fs check" when it is available in Cloudera Manager
>

Sorry, 'fs check' is coming in 1.4. You can build the 'kudu' tool from
source, though, and run it against a 1.3 cluster.

-Todd

>
> Thanks
>
> 2017-04-25 3:54 GMT+09:00 Mike Percy <mpe...@apache.org>:
>
>> Hi Jason,
>> I would strongly recommend upgrading to Kudu 1.3.1 as 1.3.0 has a serious
>> data-loss bug related to re-replication. Please see
>> https://kudu.apache.org/releases/1.3.1/docs/release_notes.html (if you
>> are using the Cloudera version of 1.3.0, no need to worry because it
>> includes the fix for that bug).
>>
>> In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
>> see if you have orphaned blocks. If you do, you could use the --repair
>> argument to that tool to repair it if you bring your tablet server offline.
>>
>> That said, Kudu uses hole punching to delete data and the same container
>> files may remain open even after removing data. After dropping tables, you
>> should see disk usage at the file system level drop.
>>
>> I'm not sure that I've answered all your questions. If you have specific
>> concerns, please let us know what you are worried about.
>>
>> Mike
>>
>> On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo <jason.heo....@gmail.com>
>> wrote:
>>
>>> Hi.
>>>
>>> Before dropping, there were about 30 tables, 27,000 files in tablet_data
>>> directory.
>>> I dropped most tables and there is ONLY one table which has 400 tablets
>>> in my test Kudu cluster.
>>> After dropping, there are still 27,000 files in tablet_data directory,
>>> and output of /sbin/lsof is the same before dropping. (kudu tserver
>>> opens almost 50M files)
>>>
>>> I'm curious that this can be resolved using "kudu fs check" which is
>>> available at Kudu 1.4.
>>>
>>> I used Kudu 1.2 when executing `DROP TABLE` and currently using Kudu
>>> 1.3.0
>>>
>>> Regards,
>>>
>>> Jason
>>>
>>>
>>
>

--
Todd Lipcon
Software Engineer, Cloudera
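
A side note on the hole-punching point quoted above: a punched container file keeps its apparent size and stays open, so the file count and the lsof output are not expected to shrink. What should drop is the number of blocks actually allocated on disk. Below is a minimal sketch for comparing the two numbers, assuming a POSIX filesystem where st_blocks is reported in 512-byte units. The function name disk_usage is hypothetical and not part of Kudu.

```python
import os


def disk_usage(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for files under root.

    apparent_bytes sums st_size (what `ls -l` reports).
    allocated_bytes sums st_blocks * 512 (roughly what `du` reports).
    A hole-punched file keeps its apparent size but loses allocated blocks.
    """
    count = 0
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            count += 1
            apparent += st.st_size
            # Assumption: POSIX semantics, st_blocks counts 512-byte units.
            allocated += st.st_blocks * 512
    return count, apparent, allocated
```

Calling disk_usage on the tablet_data directory before and after a drop should show the same file count and a similar apparent size, with the allocated figure going down if the space was really released. If the allocated figure stays high, that points toward the orphaned-data case rather than sparse files.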