mapper is slower than hive's mapper

Yue Guan Wed, 01 Aug 2012 07:29:27 -0700

Hi, there

I'm writing a MapReduce job to replace some Hive queries, and I find that my mapper is slower than Hive's mapper. The Hive query looks like this:


select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

    public void map(BytesWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the row on the Hive field delimiter, honoring the escape character.
        String[] sLine = StringUtils.split(value.toString(), StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
        // Emit the first two fields as the composite key and the third field as the value to sum.
        context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                new DoubleWritable(Double.parseDouble(sLine[2])));
    }
}
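
For comparison, here is a minimal, Hadoop-free sketch of the same per-record work. It is an illustration only, not the actual job: the class name GroupSumSketch, the sample rows, and the use of \u0001 (Hive's default field delimiter) are assumptions, and it ignores escape characters. Instead of emitting one pair per row, it keeps partial sums per group in a map, which is one possible way to reduce the number of records a mapper emits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupSumSketch {
    // Assumption: rows use Hive's default field delimiter (Ctrl-A, \u0001).
    static final char FIELD_DELIMITER = '\u0001';

    // Sums the third field for each (first field, second field) group.
    static Map<String, Double> partialSums(String[] lines) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(String.valueOf(FIELD_DELIMITER), -1);
            // Parse the first field as an int, as the mapper above does.
            int first = Integer.parseInt(f[0]);
            String groupKey = first + "\t" + f[1];
            // Accumulate locally instead of emitting one pair per row.
            sums.merge(groupKey, Double.parseDouble(f[2]), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: two rows in group (1, a), one in group (2, b).
        String[] rows = {"1\u0001a\u00012.5", "1\u0001a\u00010.5", "2\u0001b\u00014.0"};
        partialSums(rows).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

In a real mapper, the accumulated sums would have to be written out in cleanup(), and the map would need a size bound so it does not exhaust memory.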

I assume Hive is doing something similar. Is there any trick in Hive to speed this up? Thank you!


Best,
