Hi,

In our case it turned out to be co-processors. More specifically, thanks to Logsene <http://sematext.com/logsene> we found that one of our co-processors logged some exceptions on start. Once we fixed those errors we stopped having issues with growing disk usage. Sorry I don't have more details, but maybe this helps somebody.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Thu, Oct 29, 2015 at 1:52 PM, Stack <st...@duboce.net> wrote:

> Are you printing out filesize (I don't see the -s arg on lsof).
> St.Ack
>
> On Fri, Oct 23, 2015 at 8:08 PM, Otis Gospodnetić <otis.gospodne...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > 0.98.6-cdh5.3.0
> >
> > I did actually try to use lsof, but I didn't see anything unusual there.
> > Is there something specific I should look for? Things owned by hbase user
> > or hdfs or yarn? Hm, here, I don't really see anything interesting
> >
> > $ sudo lsof| grep '/mnt' <== this is where all data lives and where disk usage drops after RS restart
> >
> > java 2654 hdfs 1w REG 202,16 89487 44042562 /mnt/hadoop-hdfs/log/hadoop-hdfs-datanode-spm-hbase-slave11.prod.sematext.out
> > java 2654 hdfs 2w REG 202,16 89487 44042562 /mnt/hadoop-hdfs/log/hadoop-hdfs-datanode-spm-hbase-slave11.prod.sematext.out
> > java 2654 hdfs 286w REG 202,16 108938205 44044137 /mnt/hadoop-hdfs/log/hadoop-hdfs-datanode-spm-hbase-slave11.prod.sematext.log
> > java 2654 hdfs 289w REG 202,16 0 44040203 /mnt/hadoop-hdfs/log/SecurityAuth-hdfs.audit
> > java 2654 hdfs 314w REG 202,16 261462 44040213 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/dncp_block_verification.log.curr
> > java 2654 hdfs 316r REG 202,16 134217728 44045060 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir74/subdir58/blk_1078606358
> > java 2654 hdfs 318r REG 202,16 134217728 44057015 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir74/subdir224/blk_1078648930
> > java 2654 hdfs 319uW REG 202,16 36 44042741 /mnt/hadoop-hdfs/data/in_use.lock
> > java 2654 hdfs 321r REG 202,16 1048583 44042793 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir7/blk_1078658889_4918820.meta
> > java 2654 hdfs 330u REG 202,16 352563 44048279 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675432_4935363.meta
> > java 2654 hdfs 333r REG 202,16 134217728 44055769 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir9/blk_1078659381
> > java 2654 hdfs 335u REG 202,16 45127168 44048273 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675432
> > java 2654 hdfs 340r REG 202,16 134217728 44042791 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir7/blk_1078658889
> > java 2654 hdfs 343r REG 202,16 13882119 44048053 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir71/blk_1078675385
> > java 2654 hdfs 345u REG 202,16 485059 44048209 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675399_4935330.meta
> > java 2654 hdfs 346r REG 202,16 134217728 44053723 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir4/blk_1078658098
> > java 2654 hdfs 347u REG 202,16 371455 44047931 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675364_4935295.meta
> > java 2654 hdfs 348u REG 202,16 47545282 44047927 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675364
> > java 2654 hdfs 354u REG 202,16 20386405 44047875 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir8/blk_1078659266
> > java 2654 hdfs 355r REG 202,16 134217728 44042762 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir74/subdir243/blk_1078653797
> > java 2654 hdfs 357r REG 202,16 134217728 44042535 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir66/blk_1078674123
> > java 2654 hdfs 359u REG 202,16 1839 44045445 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078674506_4934437.meta
> > java 2654 hdfs 360u REG 202,16 234130 44045440 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078674506
> > java 2654 hdfs 363r REG 202,16 20629437 44046774 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir17/blk_1078661533
> > java 2654 hdfs 369r REG 202,16 18304945 44047599 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir71/blk_1078675270
> > java 2654 hdfs 370r REG 202,16 62086413 44048199 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/rbw/blk_1078675399
> > java 2654 hdfs 379r REG 202,16 134217728 44050035 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir3/blk_1078657983
> > java 2654 hdfs 390u REG 202,16 20857780 44050270 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir8/blk_1078659267
> > java 2654 hdfs 408r REG 202,16 115453375 44042299 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir66/blk_1078674120
> > java 2654 hdfs 415r REG 202,16 20253192 44053520 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir60/blk_1078672624
> > java 2654 hdfs 423r REG 202,16 18382878 44047547 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir71/blk_1078675257
> > java 2654 hdfs 424r REG 202,16 19555559 44040692 /mnt/hadoop-hdfs/data/current/BP-282774069-10.123.212.150-1419335230604/current/finalized/subdir75/subdir65/blk_1078673801
> > bash 15005 ec2-user cwd DIR 202,16 4096 2 /mnt
> > sudo 16055 root cwd DIR 202,16 4096 2 /mnt
> > grep 16056 ec2-user cwd DIR 202,16 4096 2 /mnt
> > sed 16057 ec2-user cwd DIR 202,16 4096 2 /mnt
> > lsof 16058 root cwd DIR 202,16 4096 2 /mnt
> > lsof 16059 root cwd DIR 202,16 4096 2 /mnt
> > bash 18748 hbase 1w REG 202,16 12843 4980744 /mnt/hbase/log/hbase-hbase-regionserver-spm-hbase-slave11.prod.sematext.out
> > bash 18748 hbase 2w REG 202,16 12843 4980744 /mnt/hbase/log/hbase-hbase-regionserver-spm-hbase-slave11.prod.sematext.out
> > java 18761 hbase 1w REG 202,16 12843 4980744 /mnt/hbase/log/hbase-hbase-regionserver-spm-hbase-slave11.prod.sematext.out
> > java 18761 hbase 2w REG 202,16 12843 4980744 /mnt/hbase/log/hbase-hbase-regionserver-spm-hbase-slave11.prod.sematext.out
> > java 18761 hbase 338w REG 202,16 117537786 4980753 /mnt/hbase/log/hbase-hbase-regionserver-spm-hbase-slave11.prod.sematext.log
> > java 18761 hbase 339w REG 202,16 0 4980741 /mnt/hbase/log/SecurityAuth.audit
> > java 29057 yarn 1w REG 202,16 130105 51380228 /mnt/hadoop-yarn/log/yarn-yarn-nodemanager-spm-hbase-slave11.prod.sematext.out
> > java 29057 yarn 2w REG 202,16 130105 51380228 /mnt/hadoop-yarn/log/yarn-yarn-nodemanager-spm-hbase-slave11.prod.sematext.out
> > java 29057 yarn 286w REG 202,16 103611255 51380852 /mnt/hadoop-yarn/log/yarn-yarn-nodemanager-spm-hbase-slave11.prod.sematext.log
> >
> > I don't see anything big there...
> >
> > Thanks,
> > Otis
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> > On Fri, Oct 23, 2015 at 10:26 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Which specific release of 0.98 are you using ?
> > >
> > > Have you used lsof to see which files were being held onto ?
> > >
> > > Thanks
> > >
> > > On Fri, Oct 23, 2015 at 7:21 PM, Otis Gospodnetić <otis.gospodne...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Is/was there a known issue with HBase 0.98 "holding onto" files?
> > > >
> > > > We noticed the used disk space metric going up, up and up and we could not
> > > > stop it with major compaction.
> > > > But we noticed that if we restart a RegionServer 2 things happen:
> > > > 1) its disk usage immediately drops a lot
> > > > 2) the disk usage of other RegionServers drops some as well
> > > >
> > > > Have a look at this chart:
> > > > https://apps.sematext.com/spm-reports/s/Ssy4ViFGHq
> > > >
> > > > At 1:54 we restarted the first RS (blue line)
> > > > At 2:03 we restarted the second RS (dark green line)
> > > >
> > > > Is/was this a known HBase 0.98 issue?
> > > >
> > > > Thanks,
> > > > Otis
> > > > --
> > > > Monitoring - Log Management - Alerting - Anomaly Detection
> > > > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
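
For anyone repeating the lsof check from this thread, here is a minimal sketch of totaling the size column per user, which is roughly what the file size question was getting at. It is not something anyone in the thread ran. The function name `sum_open_file_sizes` and its `mount` parameter are made up for illustration, and the parsing assumes the nine-column layout visible in the paste above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); other lsof versions or flags may print different columns.

```python
from collections import defaultdict


def sum_open_file_sizes(lsof_lines, mount="/mnt"):
    """Total the SIZE/OFF column per user for regular files under `mount`.

    Assumed layout (as in the paste in this thread):
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    totals = defaultdict(int)
    seen = set()  # (device, inode) pairs already counted
    for line in lsof_lines:
        parts = line.strip().split(None, 8)
        if len(parts) < 9:
            continue  # header, blank, or wrapped line
        _cmd, _pid, user, _fd, ftype, device, size, node, name = parts
        # Only regular files on the mount of interest; skip offsets like "0t0".
        if ftype != "REG" or not name.startswith(mount) or not size.isdigit():
            continue
        # The same file often shows up on several descriptors (e.g. 1w and 2w),
        # so count each (device, inode) once.
        if (device, node) in seen:
            continue
        seen.add((device, node))
        totals[user] += int(size)
    return dict(totals)
```

Feeding it the lines of `sudo lsof` output gives one byte total per user, which can be compared against how much the disk usage drops after a RegionServer restart. If the totals are small compared to that drop, as they appear to be in the paste above, open files visible to lsof probably do not explain the growth. lsof's `+L1` option, which per its man page selects open files that have been unlinked, is another way to look for space held by deleted files.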