If you're able to put your data in directories named by date (e.g.
MMdd), you can take advantage of the fact that the HDFS client returns
directories sorted by name, so the most recent dirs come last. You can
then cron a bash script that deletes all but the last N directories.
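A minimal sketch of such a cron-able script follows. The function names and the example base path are hypothetical. It assumes the directory names sort chronologically (MMdd only does so within a single year), contain no spaces, and that the path is the last column of `hadoop fs -ls` output. `hadoop fs -rm -r` is the Hadoop 2+ spelling; 0.20-era clients use `-rmr`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: keep only the newest N date-named directories
# under an HDFS path. Assumes names sort chronologically and have no spaces.

# Read paths on stdin; print all but the last N in byte-wise sorted order.
dirs_to_delete() {
  local keep="$1"
  LC_ALL=C sort | awk -v keep="$keep" '
    { lines[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print lines[i] }'
}

# Delete all but the newest N subdirectories of an HDFS directory.
# Only listing lines whose permissions start with "d" (directories) are kept.
prune_dated_dirs() {
  local base="$1" keep="$2"
  hadoop fs -ls "$base" \
    | awk '$1 ~ /^d/ { print $NF }' \
    | dirs_to_delete "$keep" \
    | while read -r dir; do
        hadoop fs -rm -r "$dir"   # older clients: hadoop fs -rmr "$dir"
      done
}
```

A script run from cron would then call something like `prune_dated_dirs /data/daily 7` (hypothetical path). Keeping the selection logic in `dirs_to_delete` lets you check what would be removed by piping a listing into it before enabling the delete.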
AFAIK, there is no facility like this in HDFS through the command line.
One option is to write a small client program that collects the files from
the root based on your condition and invokes delete on them.
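The "collect by condition, then delete" idea can also be approximated from the shell. This is a swapped-in shell stand-in, not the Java client program described above, and the function names are hypothetical. It assumes the recursive listing format of `hadoop fs -ls -R` (Hadoop 2+; older clients use `-lsr`): modification date as yyyy-MM-dd in column 6, path last, no spaces in paths.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a "collect, then delete" pass over an HDFS tree.

# Read a recursive listing on stdin; print regular files dated before cutoff.
files_older_than() {
  local cutoff="$1"
  # yyyy-MM-dd values compare correctly as plain strings.
  awk -v cutoff="$cutoff" '$1 ~ /^-/ && $6 < cutoff { print $NF }'
}

# Delete every file under root whose modification date is before cutoff.
delete_files_older_than() {
  local root="$1" cutoff="$2"
  hadoop fs -ls -R "$root" \
    | files_older_than "$cutoff" \
    | while read -r path; do
        hadoop fs -rm "$path"
      done
}
```

For example, `delete_files_older_than /some/root 2011-11-20` would remove files last modified before 20 Nov 2011. Running `hadoop fs -ls -R /some/root | files_older_than 2011-11-20` first shows what would be deleted.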
Regards,
Uma
From: Raimon Bosch [raimon.bo...@gmail.com]
Sent:
It won't be that easy, but it's possible to write.
I did something like this:
$HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep '.*2011.11.1[1-8].*' | cut -f 19 -d \ `
Note the space after the backslash in -d \ : it tells cut to split on a single space.
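The cut -f 19 step depends on how many spaces pad each column of the listing, which changes with owner names and file sizes. A sketch of a hypothetical variant that takes the path from the last field and matches the date pattern only against the date column (column 6) follows; by default it only prints what it would delete. It assumes no spaces in paths, and `-rm -r` is the newer spelling of `-rmr`.

```shell
#!/usr/bin/env bash
# Hypothetical variant: select listing lines by date column, not by grep on
# the whole line, and take the path from the last field instead of cut -f 19.

# Read `hadoop fs -ls` output on stdin; print paths whose date matches re.
paths_matching() {
  local date_re="$1"
  awk -v re="$date_re" '$6 ~ re { print $NF }'
}

# Preview by default; pass --delete as the third argument to actually remove.
remove_matching() {
  local dir="$1" date_re="$2" mode="$3"
  hadoop fs -ls "$dir" \
    | paths_matching "$date_re" \
    | while read -r path; do
        if [ "$mode" = "--delete" ]; then
          hadoop fs -rm -r "$path"   # older clients: hadoop fs -rmr "$path"
        else
          echo "would delete: $path"
        fi
      done
}
```

For the range in the command above, `remove_matching /some/dir '^2011-11-1[1-8]$'` previews the matches and adding `--delete` removes them. Matching on the date column also avoids deleting a path that merely contains the date in its name.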
-P
On Sat, Nov 26, 2011 at 8:46 PM, Uma Maheswara Rao G
<mahesw...@huawei.com> wrote:
Hello Raimon,
I like the idea of being able to search through files on HDFS by keyword or
timestamp criteria, something that OceanSync will offer as a tool option in
the future. The others have given you some great ideas, but I
wanted to help you out from a Java API