One hint would be to reduce the number of Writable instances you need. Create the object once and reuse it. By the way, Hive does not use Writable. ;)
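A minimal sketch of the "create once, reuse" hint: the per-record `new DoubleWritable(...)` becomes a single instance that is mutated with `set()` on every record. `MutableDouble` below is a plain-Java stand-in for Hadoop's `DoubleWritable` (which exposes the same `set()`/`get()` API), and the loop stands in for the `map()` calls the framework makes per input record, so the sketch runs without a Hadoop dependency.

```java
import java.util.HashMap;
import java.util.Map;

public class ReuseSketch {
    // Stand-in for org.apache.hadoop.io.DoubleWritable: a mutable box.
    static final class MutableDouble {
        private double value;
        void set(double v) { value = v; }
        double get() { return value; }
    }

    public static void main(String[] args) {
        // Stand-in for the input split: tab-delimited (column2, column3, column1) rows.
        String[] records = {"1\ta\t2.5", "1\ta\t3.5", "2\tb\t1.0"};

        // Allocated once, outside the per-record loop; each record only
        // calls set() instead of constructing a fresh object.
        MutableDouble outValue = new MutableDouble();
        Map<String, Double> sums = new HashMap<>();

        for (String record : records) {
            String[] fields = record.split("\t");
            outValue.set(Double.parseDouble(fields[2]));  // reuse, no allocation
            String groupKey = fields[0] + "," + fields[1];
            sums.merge(groupKey, outValue.get(), Double::sum);
        }
        System.out.println(sums.get("1,a"));  // 6.0
    }
}
```

In a real `Mapper`, the reused instances would be fields of the mapper class (Hadoop calls `map()` many times on the same mapper object), which avoids one short-lived allocation per input record.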
Bertrand

On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck <chuck.conn...@nuance.com> wrote:

> This is actually not surprising. Hive is essentially a MapReduce compiler.
> It is common for regular compilers (C, C#, Fortran) to emit faster
> assembler code than you write yourself. Compilers know the tricks of their
> target language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:pipeha...@gmail.com]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: user@hive.apache.org
> Subject: mapper is slower than hive's mapper
>
> Hi there,
>
> I'm writing MapReduce to replace some Hive queries, and I find that my
> mapper is slower than Hive's mapper. The Hive query is like:
>
>     select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
>     public static class HiveTableMapper extends Mapper<BytesWritable,
>             Text, MyKey, DoubleWritable> {
>
>         public void map(BytesWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             String[] sLine = StringUtils.split(value.toString(),
>                     StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
>                     new DoubleWritable(Double.parseDouble(sLine[2])));
>         }
>
>     }
>
> I assume Hive is doing something similar. Is there any trick in Hive to
> speed this up? Thank you!
>
> Best,

--
Bertrand Dechoux