Forgot the link. github.com/edwardcapriolo/filecrush
On 6/1/12, Edward Capriolo <[email protected]> wrote:
> The filecrush tool has a small utility called Clean that accepts an
> age argument and deletes all the files in a directory older than a
> certain time.
>
> We use Clean to clean up the tmp HDFS directories that applications
> leave remnants in.
>
> Edward
>
> On 6/1/12, Vinod Singh <[email protected]> wrote:
>> Yes, that is how I do it. Though 1 month is too long; I keep just 2 days.
>>
>> Thanks,
>> Vinod
>>
>> http://blog.vinodsingh.com/
>>
>> On Fri, Jun 1, 2012 at 2:15 PM, Ruben de Vries
>> <[email protected]> wrote:
>>
>>> So I should write a job which cleans up 1 month old results or something
>>> like that?
>>>
>>> From: Vinod Singh [mailto:[email protected]]
>>> Sent: Friday, June 01, 2012 10:35 AM
>>> To: [email protected]
>>> Subject: Re: Hive scratch dir not cleaning up
>>>
>>> Hive deletes job contents from the scratch directory on completion of
>>> the job. Failed / killed jobs leave data there, though, which needs to
>>> be removed manually.
>>>
>>> Thanks,
>>> Vinod
>>>
>>> http://blog.vinodsingh.com/
>>>
>>> On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries <[email protected]>
>>> wrote:
>>>
>>> Hey Hivers,
>>>
>>> I’m almost ready to replace our old Hadoop implementation with an
>>> implementation using Hive.
>>>
>>> Now I’ve run into (hopefully) my last problem: my /tmp/hive-hduser dir
>>> is getting kinda big!
>>> It doesn’t seem to clean up these tmp files. Googling for it, I ran
>>> into some tickets about a cleanup setting. Should I enable this with
>>> the setting below?
>>> Why doesn’t it do that by default? Am I the only one somehow racking up
>>> a lot of space with tmp files?
>>>
>>> <property>
>>>   <name>hive.start.cleanup.scratchdir</name>
>>>   <value>true</value>
>>> </property>
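
For anyone who would rather script the age-based cleanup discussed above than pull in filecrush, here is a rough Python sketch. It is not how Clean is implemented; it only parses the listing printed by `hadoop fs -ls` and picks out entries older than a cutoff. The function name `stale_paths`, the 2-day default (taken from Vinod's reply), and the /tmp/hive-hduser path are illustrative assumptions, and the script only prints candidates instead of deleting anything.

```python
import subprocess
from datetime import datetime, timedelta


def stale_paths(ls_lines, now, max_age_days=2):
    """Return paths from `hadoop fs -ls` output older than max_age_days.

    Hypothetical helper: assumes the usual 8-column listing
    (perms, replication, owner, group, size, date, time, path).
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_lines:
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skips the "Found N items" header and blank lines
        try:
            modified = datetime.strptime(
                fields[5] + " " + fields[6], "%Y-%m-%d %H:%M"
            )
        except ValueError:
            continue  # not a listing line we understand
        if modified < cutoff:
            stale.append(fields[7])
    return stale


if __name__ == "__main__":
    scratch_dir = "/tmp/hive-hduser"  # assumption: the scratch dir from this thread
    listing = subprocess.run(
        ["hadoop", "fs", "-ls", scratch_dir],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    # Dry run: print candidates only. Review the list before removing
    # anything with the recursive delete command of your Hadoop version.
    for path in stale_paths(listing, datetime.now()):
        print(path)
```

Keeping the selection logic in a pure function makes it easy to check against a saved listing before pointing it at a live cluster. Note that a running query also keeps its intermediate data in the scratch dir, so the cutoff should be longer than your longest job.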
