One hint would be to reduce the number of Writable instances you create.
Create the objects once and reuse them.
By the way, Hive does not use Writable. ;)
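
For example, here is a minimal sketch of the reuse pattern. It assumes your
MyKey class has a set(int, String) method and that HIVE_FIELD_DELIMITER_CHAR
is Hive's default '\001' delimiter; both are guesses on my side, so adapt it
to your actual code:

    import java.io.IOException;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.StringUtils;

    public class MyJob {  // hypothetical driver class, stands in for yours

        // Assumed Hive default field delimiter; replace with your own constant.
        private static final char HIVE_FIELD_DELIMITER_CHAR = '\001';

        public static class HiveTableMapper
                extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

            // Output objects are allocated once per task, not once per record.
            private final MyKey outKey = new MyKey();
            private final DoubleWritable outValue = new DoubleWritable();

            @Override
            public void map(BytesWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] sLine = StringUtils.split(value.toString(),
                        StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
                // Reusing the instances is safe: context.write() serializes
                // the key and value before the next call to map().
                outKey.set(Integer.parseInt(sLine[0]), sLine[1]);
                outValue.set(Double.parseDouble(sLine[2]));
                context.write(outKey, outValue);
            }
        }
    }

The same trick applies to any Writable you emit from the reducer.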

Bertrand

On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck <chuck.conn...@nuance.com> wrote:

> This is actually not surprising. Hive is essentially a MapReduce compiler.
> It is common for regular compilers (C, C#, Fortran) to emit faster
> assembler code than you would write by hand. Compilers know the tricks of
> their target language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:pipeha...@gmail.com]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: user@hive.apache.org
> Subject: mapper is slower than hive' mapper
>
> Hi, there
>
> I'm writing MapReduce jobs to replace some Hive queries, and I find that my
> mapper is slower than Hive's mapper. The Hive query is:
>
> select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
>     public static class HiveTableMapper
>             extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>         public void map(BytesWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             String[] sLine = StringUtils.split(value.toString(),
>                     StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
>                     new DoubleWritable(Double.parseDouble(sLine[2])));
>         }
>     }
>
> I assume Hive is doing something similar. Is there any trick in Hive to
> speed this up? Thank you!
>
> Best,
>
>


-- 
Bertrand Dechoux
