What's the diff btw setOutputKeyComparatorClass and setOutputValueGroupingComparator?

2011-11-26 Thread Alexander Pivovarov
I tried using one or the other for secondary sort, and both options work fine: I get a combined sorted result in the reduce() iterator. Also, I noticed that if I set both of them at the same time, then KeyComparatorClass.compare(O1, O2) is never called; Hadoop calls only ValueGroupingComparator.compare() I r
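
A rough way to picture the usual division of labour in secondary sort: the sort comparator (the class given to setOutputKeyComparatorClass) orders the full composite key, while the grouping comparator (setOutputValueGroupingComparator) decides which consecutive sorted keys land in the same reduce() call. The sketch below is a plain-Java simulation under that assumption, not Hadoop code; `CompositeKey` and `shuffle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation (no Hadoop classes) of how a shuffle could use two comparators.
public class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus a field to sort on within a group.
    public record CompositeKey(String naturalKey, int secondary) {}

    // Stands in for the sort comparator: natural key first, then the secondary field.
    static final Comparator<CompositeKey> SORT_COMPARATOR =
            Comparator.comparing(CompositeKey::naturalKey)
                      .thenComparingInt(CompositeKey::secondary);

    // Stands in for the grouping comparator: keys with equal natural keys share one reduce() call.
    static final Comparator<CompositeKey> GROUPING_COMPARATOR =
            Comparator.comparing(CompositeKey::naturalKey);

    // Sorts all keys with the sort comparator, then starts a new group whenever
    // the grouping comparator says the next key differs from the previous one.
    public static List<List<CompositeKey>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_COMPARATOR);
        List<List<CompositeKey>> groups = new ArrayList<>();
        List<CompositeKey> current = null;
        for (CompositeKey k : sorted) {
            if (current == null
                    || GROUPING_COMPARATOR.compare(current.get(current.size() - 1), k) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> out = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 5));
        for (List<CompositeKey> group : shuffle(out)) {
            System.out.println(group); // one line per simulated reduce() call
        }
    }
}
```

In a real job with several reducers, the common pattern also partitions on the natural key only, so that every record of a group reaches the same reducer.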

Re: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Ronnie Dove
Hello Raimon, I like the idea of being able to search through files on HDFS so that we can find keywords or timestamp criteria, something that OceanSync will be doing in the future as a tool option. The others have given you some great ideas, but I wanted to help you out from a Java API perspec

Re: Hadoop Serialization: Avro

2011-11-26 Thread Leonardo Urbina
Thanks, I will send the question to that list as well. Best, -Leo Sent from my phone On Nov 26, 2011, at 7:32 PM, Brock Noland wrote: > Hi, > > Depending on the response you get here, you might also post the > question separately on avro-user. > > On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbi

Re: Hadoop Serialization: Avro

2011-11-26 Thread Brock Noland
Hi, Depending on the response you get here, you might also post the question separately on avro-user. On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina wrote: > Hey everyone, > > First time posting to the list. I'm currently writing a hadoop job that > will run daily and whose output will be part

Hadoop Serialization: Avro

2011-11-26 Thread Leonardo Urbina
Hey everyone, First time posting to the list. I'm currently writing a Hadoop job that will run daily and whose output will be part of the next day's input. Also, the output will potentially be read by other programs for later analysis. Since my program's output is used as part of the

Re: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Harsh J
Or, if you have FUSE-DFS mounted, you can just use the regular find command. On 26-Nov-2011, at 10:05 PM, Prashant Sharma wrote: > It won't be that easy, but it's possible to write. > I did something like this. > $HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep > '.*2011.11.1[1-8].*

Re: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Prashant Sharma
It won't be that easy, but it's possible to write. I did something like this: $HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep '.*2011.11.1[1-8].*' | cut -f 19 -d \ ` Notice the space after -d \. -P On Sat, Nov 26, 2011 at 8:46 PM, Uma Maheswara Rao G wrote: > AFAIK, there is no fac
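
The grep in the command above matches a fixed date range by regex. A variant that expresses "older than X days" directly is to parse the date column of each listing line and compare it to a cutoff. This is a minimal Java sketch, assuming each file line of `hadoop fs -ls` has eight whitespace-separated fields (permissions, replication, owner, group, size, yyyy-MM-dd date, time, path); that layout is an assumption and may differ between Hadoop versions. `pathsBefore` is a hypothetical helper.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class LsDateFilterSketch {

    // Returns the paths of listing lines whose date column is strictly before `cutoff`.
    // ASSUMPTION: 8 fields per file line, date in field 5 (yyyy-MM-dd), path in field 7.
    // Paths containing spaces are not handled.
    public static List<String> pathsBefore(List<String> lsLines, LocalDate cutoff) {
        List<String> result = new ArrayList<>();
        for (String line : lsLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 8) {
                continue; // skips lines that don't match, e.g. a "Found N items" header
            }
            LocalDate date = LocalDate.parse(f[5]); // ISO yyyy-MM-dd
            if (date.isBefore(cutoff)) {
                result.add(f[7]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Found 2 items",
                "-rw-r--r--   3 hadoop supergroup       1024 2011-11-11 10:00 /data/a.log",
                "-rw-r--r--   3 hadoop supergroup       2048 2011-11-25 09:30 /data/b.log");
        // Cutoff: 7 days before 2011-11-26, i.e. 2011-11-19.
        System.out.println(pathsBefore(lines, LocalDate.of(2011, 11, 26).minusDays(7))); // prints [/data/a.log]
    }
}
```

The printed paths could then be handed to `hadoop fs -rmr`, as in the original command.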

Re: Using FUSE for flat file loading

2011-11-26 Thread Linden Hillenbrand
What kind of staging area are they landing in? If you are looking to go directly to HDFS, or even from a staging area, you can look at Sqoop or Flume. FUSE works just fine; if you want more info on FUSE, I'd check out the following: http://wiki.apache.org/hadoop/MountableHDFS On Mon, Nov 21, 2011 at

RE: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Uma Maheswara Rao G
AFAIK, there is no facility like this in HDFS through the command line. One option is to write a small client program that collects the files from the root based on your condition and invokes delete on them. Regards, Uma From: Raimon Bosch [raimon.bo...@gmail.com] Sent:
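
The client-program approach boils down to a recursive walk from the root that keeps files whose modification time falls before a cutoff. The sketch below shows only that selection logic in plain Java; `Entry` is a hypothetical stand-in for Hadoop's `FileStatus`. In an actual client, the entries would presumably come from `FileSystem.listStatus(Path)` (using `getPath()` and `getModificationTime()` on each `FileStatus`), and each selected path would be passed to `FileSystem.delete(Path, boolean)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class OldFileCollectorSketch {

    // Hypothetical stand-in for FileStatus: path, modification time in epoch
    // milliseconds, and child entries (null for a plain file).
    public record Entry(String path, long modificationTime, List<Entry> children) {
        boolean isDirectory() { return children != null; }
    }

    // Collects plain files under `root` modified more than `days` days before `nowMillis`.
    // Directories are traversed but never collected themselves.
    public static List<String> collectOlderThan(Entry root, long nowMillis, int days) {
        long cutoff = nowMillis - TimeUnit.DAYS.toMillis(days);
        List<String> out = new ArrayList<>();
        walk(root, cutoff, out);
        return out;
    }

    private static void walk(Entry e, long cutoff, List<String> out) {
        if (e.isDirectory()) {
            for (Entry child : e.children()) {
                walk(child, cutoff, out);
            }
        } else if (e.modificationTime() < cutoff) {
            out.add(e.path());
        }
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1);
        long now = 100 * day;
        Entry root = new Entry("/data", now, List.of(
                new Entry("/data/old.log", now - 10 * day, null),
                new Entry("/data/new.log", now - 2 * day, null),
                new Entry("/data/sub", now, List.of(
                        new Entry("/data/sub/older.log", now - 30 * day, null)))));
        // prints [/data/old.log, /data/sub/older.log]
        System.out.println(collectOlderThan(root, now, 7));
    }
}
```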

How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Raimon Bosch
Hi, I'm wondering how to delete files older than X days with HDFS/Hadoop. On Linux we can do it with the following command: find ~/datafolder/* -mtime +7 -exec rm {} \; Any ideas?

Re: Distributed sorting using Hadoop

2011-11-26 Thread Prashant Sharma
Please see my mail on common-dev. Also, you should not send the same mail to all mailing lists; be patient and wait for people to reply. On Sat, Nov 26, 2011 at 6:35 PM, madhu_sushmi wrote: > > Hi, > I need to implement distributed sorting using Hadoop. I am quite new to > Hadoop and I am getting confused. I

Distributed sorting using Hadoop

2011-11-26 Thread madhu_sushmi
Hi, I need to implement distributed sorting using Hadoop. I am quite new to Hadoop and I am getting confused. If I want to implement merge sort, what should my map and reduce be doing? Should all the sorting happen on the reduce side? Please help. This is an urgent requirement. Please guide me. -
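
In MapReduce you typically don't hand-write the merge sort: the framework sorts map output by key during the shuffle, so an identity map and identity reduce already yield sorted output per reducer. For one globally sorted result, records can be routed to reducers by key range (the idea behind Hadoop's `TotalOrderPartitioner`, which reads sampled split points). The following plain-Java simulation illustrates that flow; the hard-coded split points and the `partition`/`sort` helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TotalOrderSortSketch {

    // Range partitioner: returns the index of the first range whose upper split point exceeds key.
    // ASSUMPTION: split points are sorted ascending and hard-coded, not sampled from the input.
    public static int partition(int key, int[] splitPoints) {
        int i = 0;
        while (i < splitPoints.length && key >= splitPoints[i]) {
            i++;
        }
        return i;
    }

    public static List<Integer> sort(List<Integer> input, int[] splitPoints) {
        int numReducers = splitPoints.length + 1;
        List<List<Integer>> partitions = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            partitions.add(new ArrayList<>());
        }
        // "Map" side: identity map; each record is routed to a reducer by key range.
        for (int key : input) {
            partitions.get(partition(key, splitPoints)).add(key);
        }
        // "Reduce" side: each reducer receives its keys sorted (simulated with Collections.sort);
        // concatenating reducer outputs in partition order gives a total order.
        List<Integer> result = new ArrayList<>();
        for (List<Integer> p : partitions) {
            Collections.sort(p);
            result.addAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(42, 7, 99, 15, 63, 3, 88, 27);
        // prints [3, 7, 15, 27, 42, 63, 88, 99]
        System.out.println(sort(data, new int[] {30, 70}));
    }
}
```

So most of the sorting work happens in the framework between map and reduce; the job's main design decision is how to choose split points that balance the reducers.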