Re: How to delete files older than X days in HDFS/Hadoop

2011-11-27, Bill Graham
If you're able to put your data in directories named by date (e.g. MMdd), you can take advantage of the fact that the HDFS client returns directories sorted by name, so the most recent directories come last. You can then cron a bash script that deletes all but the last N
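A minimal sketch of the "delete all but the last N" selection step. It assumes one directory per day, named so that name order equals date order. A year-prefixed name such as yyyyMMdd keeps that property across a year boundary, which plain MMdd does not. The helper name all_but_last is hypothetical. It only prints names and does not touch HDFS.

```shell
# all_but_last N: read names on stdin, sort them bytewise, and print every
# name except the last N. With date-named directories these are the oldest.
# all_but_last is a hypothetical helper, not part of Hadoop.
all_but_last() {
  LC_ALL=C sort | awk -v keep="$1" '
    { names[NR] = $0 }                                  # buffer every name
    END { for (i = 1; i <= NR - keep; i++) print names[i] }'  # drop last N
}
```

Usage, assuming a hypothetical /data/logs parent directory and Hadoop 0.20-style listing output, where directory lines start with "d" and the path is the last column: hadoop fs -ls /data/logs | grep '^d' | awk '{print $NF}' | all_but_last 7 | xargs -r hadoop fs -rmr (xargs -r is a GNU option that skips the call when nothing is selected). Run it without the final xargs stage first to see what would be deleted.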

2011-11-26, Uma Maheswara Rao G
AFAIK, there is no facility like this in HDFS through the command line. One option is to write a small client program that collects the files under the root that match your condition and invokes delete on them.
Regards,
Uma
From: Raimon Bosch [raimon.bo...@gmail.com] Sent:

2011-11-26, Prashant Sharma
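Won't be that easy, but it's possible to write. I did something like this:

    $HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep '.*2011.11.1[1-8].*' | cut -f 19 -d \ `

The grep pattern selects listing lines containing the dates 2011-11-11 through 2011-11-18. Note the space after the backslash in -d \ , which makes the field delimiter a single space.
-P
On Sat, Nov 26, 2011 at 8:46 PM, Uma Maheswara Rao G mahesw...@huawei.com wrote: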
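A variant of the pipeline above that selects by "older than a cutoff" instead of a hand-written date regex. It is sketched under the assumption that hadoop fs -ls prints eight whitespace-separated columns with the modification date (yyyy-MM-dd) in column 6 and the path last, as Hadoop 0.20/1.x does, and that paths contain no spaces. Reading the last column with awk also avoids cut -f 19, which depends on the exact amount of padding in the listing. ISO dates sort correctly as plain strings, so a string comparison is enough. older_than is a hypothetical helper name.

```shell
# older_than CUTOFF: read `hadoop fs -ls`-style lines on stdin and print the
# path (last column) of every entry whose date column is before CUTOFF.
# ASSUMPTION: 8 columns, date as yyyy-mm-dd in column 6, no spaces in paths.
older_than() {
  awk -v cutoff="$1" '
    # skip the "Found N items" header and any line without an ISO date
    NF >= 8 && $6 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      # concatenating "" forces a string comparison of the two dates
      if (($6 "") < (cutoff "")) print $NF
    }'
}
```

Usage for "older than 7 days" with GNU date, where /logs is a hypothetical directory: hadoop fs -ls /logs | older_than "$(date -d '7 days ago' +%Y-%m-%d)" | xargs -r hadoop fs -rmr. On BSD or macOS, date -v-7d +%Y-%m-%d produces the same cutoff. As with any bulk delete, check the output without the xargs stage first.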

2011-11-26, Ronnie Dove
Hello Raimon, I like the idea of being able to search through files on HDFS to find keywords or timestamp criteria, something OceanSync will offer as a tool option in the future. The others have given you some great ideas, but I wanted to help you out from a Java API