Hi there,
I'm writing a MapReduce job to replace some Hive queries, and I find that
my mapper is slower than Hive's mapper. The Hive query looks like this:
select sum(column1) from table group by column2, column3;
My MapReduce program looks like this:
public static class HiveTableMapper
        extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

    @Override
    public void map(BytesWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the row on the Hive field delimiter, honoring escaped delimiters.
        String[] sLine = StringUtils.split(value.toString(),
                StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
        // Emit the group-by key and the value to be summed.
        context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                new DoubleWritable(Double.parseDouble(sLine[2])));
    }
}
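
To make the intended computation concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of the same group-by sum, accumulated in a hash map instead of being emitted once per row. The class name GroupBySumSketch, the aggregate method, the string-concatenated group key, and the Ctrl-A delimiter are all assumptions made for illustration; this is not a claim about how Hive's mapper is implemented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: sums field 2 per (field 0, field 1) pair.
class GroupBySumSketch {
    // Assumption: fields are separated by Ctrl-A (\u0001), Hive's default
    // field delimiter for text tables. Escaped delimiters are not handled.
    static final String DELIMITER = "\u0001";

    static Map<String, Double> aggregate(List<String> lines) {
        Map<String, Double> sums = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(DELIMITER, -1);
            // Group key built from the first two fields, mirroring MyKey above.
            String groupKey = Integer.parseInt(fields[0]) + DELIMITER + fields[1];
            // Add this row's value to the running sum for its group.
            sums.merge(groupKey, Double.parseDouble(fields[2]), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1\u0001a\u00012.5",
                "1\u0001a\u00011.5",
                "2\u0001b\u00014.0");
        // Prints one entry per group; HashMap iteration order is unspecified.
        aggregate(lines).forEach((k, v) ->
                System.out.println(k.replace(DELIMITER, ",") + " -> " + v));
    }
}
```

In a real mapper the partial sums would have to be flushed in cleanup() and the map kept bounded in size, which this sketch does not attempt.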
I assume Hive's mapper does something similar to mine. Is there any trick
in Hive that speeds this up? Thank you!
Best,