Re: mapper is slower than hive' mapper

Edward Capriolo Wed, 01 Aug 2012 08:14:04 -0700

As mentioned, if you avoid calling new for every record, by re-using objects
and possibly using buffer objects, you may be able to match or beat Hive's
speed. But in the general case Hive saves you time by letting you not worry
about low-level details like this.
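
A minimal sketch of the object re-use idea, written in plain Java so it runs
without Hadoop on the classpath. MutableKey, MutableDouble and Sink are
hypothetical stand-ins for MyKey, DoubleWritable and Context from the quoted
mapper, and the comma delimiter is only for the demo. It ignores escape
characters, and it assumes the sink consumes each pair before the next call,
which is what makes overwriting the same objects safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {

    /** Hypothetical stand-in for a mutable value type such as DoubleWritable. */
    static final class MutableDouble {
        private double value;
        void set(double v) { value = v; }
        double get() { return value; }
    }

    /** Hypothetical stand-in for the MyKey composite key in the quoted mapper. */
    static final class MutableKey {
        private int id;
        private String name;
        void set(int id, String name) { this.id = id; this.name = name; }
        int id() { return id; }
        String name() { return name; }
    }

    /** Hypothetical stand-in for Context.write(). */
    interface Sink {
        void write(MutableKey key, MutableDouble value);
    }

    static final class ReusingMapper {
        // Allocated once per mapper instance, then overwritten for every record.
        private final MutableKey outKey = new MutableKey();
        private final MutableDouble outValue = new MutableDouble();

        void map(String line, char delimiter, Sink sink) {
            // Find the two delimiters by hand instead of building a String[] per record.
            int first = line.indexOf(delimiter);
            int second = line.indexOf(delimiter, first + 1);
            outKey.set(Integer.parseInt(line.substring(0, first)),
                       line.substring(first + 1, second));
            outValue.set(Double.parseDouble(line.substring(second + 1)));
            sink.write(outKey, outValue);
        }
    }

    static List<String> run(String[] lines, char delimiter) {
        List<String> out = new ArrayList<>();
        ReusingMapper mapper = new ReusingMapper();
        // The sink copies the fields out immediately; holding a reference to
        // key or value would be a bug, since both change on the next call.
        Sink sink = (k, v) -> out.add(k.id() + "|" + k.name() + "|" + v.get());
        for (String line : lines) {
            mapper.map(line, delimiter, sink);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : run(new String[] {"1,a,2.5", "2,b,10.0"}, ',')) {
            System.out.println(s);
        }
    }
}
```

In a real mapper the same pattern would mean keeping the output key and value
as fields and calling their setters inside map(), rather than constructing
new ones per record.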


On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
<chuck.conn...@nuance.com> wrote:
> This is actually not surprising. Hive is essentially a MapReduce compiler. It 
> is common for regular compilers (C, C#, Fortran) to emit faster assembler 
> code than you write yourself. Compilers know the tricks of their target 
> language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:pipeha...@gmail.com]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: user@hive.apache.org
> Subject: mapper is slower than hive' mapper
>
> Hi, there
>
> I'm writing a MapReduce job to replace some Hive queries, and I find that my
> mapper is slower than Hive's mapper. The Hive query looks like this:
>
> select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
>      public static class HiveTableMapper
>              extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>          public void map(BytesWritable key, Text value, Context context)
>                  throws IOException, InterruptedException {
>              String[] sLine = StringUtils.split(value.toString(),
>                      StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>              context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
>                      new DoubleWritable(Double.parseDouble(sLine[2])));
>          }
>
>      }
>
> I assume hive is doing something similar. Is there any trick in hive to speed 
> this thing up? Thank you!
>
> Best,
>
