Hi Lune,

> My question is the following: will I encounter the famous "small files
> problem" with my NameNodes because of the number of small audit files stored
> in HDFS?

Based on your environment, you will have 134 files per day, which will be around 4000 files per month. That, I feel, shouldn't be an issue for the NameNode.

> Is there a way to configure the frequency at which the audits are put into
> HDFS? Or a way to configure Ranger to store files corresponding to multiple
> days?

Yes, there is a property to set the duration. You can use "file.rollover.sec" with the destination prefix.

> Is there a way to purge them natively (I developed a shell script and
> scheduled it on crontab, but I'm asking in case there is a native mechanism)?

There is no native feature available, but it would be good to have. Would you want to contribute your shell script for others to use? Eventually, we could have an Oozie job which could do a few things, e.g. compress the files, coalesce multiple files into one, purge them, or even create Hive tables out of them.

Thanks

Bosco

From: Lune Silver <lunescar.ran...@gmail.com>
Reply-To: <user@ranger.incubator.apache.org>
Date: Tuesday, July 12, 2016 at 11:40 AM
To: <user@ranger.incubator.apache.org>
Subject: About the audit stored in HDFS

Hello everyone!

I am sending you this mail with a question about the HDFS storage of the audit logs.

I use Ranger with three plugins for now:
- HDFS
- Kafka
- HBase

I have two NameNodes, two HBase masters, 100 region servers, and 30 Kafka brokers.

I noticed that I have only one audit file per server per day.

My question is the following: will I encounter the famous "small files problem" with my NameNodes because of the number of small audit files stored in HDFS?

Is there a way to configure the frequency at which the audits are put into HDFS? Or a way to configure Ranger to store files corresponding to multiple days?

Is there a way to purge them natively? (I developed a shell script and scheduled it on crontab, but I'm asking in case there is a native mechanism.)

BR.

Lune
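[Follow-up note] A sketch of how the rollover property mentioned above might look in the plugin's audit configuration. The `xasecure.audit.destination.hdfs` prefix and the file name `ranger-<component>-audit.xml` are assumptions; verify the exact property names shipped with your Ranger version. A rollover of 86400 seconds would match the one-file-per-server-per-day behavior described in the thread:

```properties
# Assumed prefix for the HDFS audit destination; check your Ranger version.
# Roll the audit file once every 86400 s (one day); a larger value keeps
# fewer, larger files and further reduces pressure on the NameNode.
xasecure.audit.destination.hdfs.file.rollover.sec=86400
```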