Hi

I am reading data from raw XML files and inserting it into HBase using
TableOutputFormat in a MapReduce job, but because of the huge number of Put
statements it takes many hours to process the data. Here is my sample code:

conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable");
conf.set("xmlinput.start", "<adc>");
conf.set("xmlinput.end", "</adc>");
conf.set("io.serializations",
    "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization");

Job job = new Job(conf, "Populate Table with Data");

FileInputFormat.setInputPaths(job, input);
job.setJarByClass(ParserDriver.class);
job.setMapperClass(MyParserMapper.class);
job.setNumReduceTasks(0);
job.setInputFormatClass(XmlInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
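
As an aside: since TableOutputFormat buffers Puts on the client side, would
raising the client write buffer in this same conf make any difference? I
believe the property is hbase.client.write.buffer (default 2 MB), but I am
only guessing at a useful size here:

    // guess: larger client-side write buffer, so fewer round trips to the region servers
    conf.set("hbase.client.write.buffer", "12582912"); // 12 MB, arbitrary value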


*and here is the mapper code:*

public class MyParserMapper extends
    Mapper<LongWritable, Text, NullWritable, Writable> {

    @Override
    public void map(LongWritable key, Text value1, Context context)
            throws IOException, InterruptedException {
        // ... some processing ...
        while (rItr.hasNext()) {
            // this put statement runs 132,622,560 times to insert the data
            context.write(NullWritable.get(),
                new Put(rowId).add(Bytes.toBytes("CounterValues"),
                    Bytes.toBytes(counter.toString()),
                    Bytes.toBytes(rElement.getTextTrim())));
        }
    }
}

Is there any other way of doing this so that I can improve the performance?
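
For example, would it help to skip the WAL on each Put, or should I be
looking at bulk loading with HFileOutputFormat instead? A rough sketch of
the WAL idea, with rowId, counter and rElement as in the mapper above
(setWriteToWAL is the 0.20.x client call; I have not verified that it is
the right knob):

    Put put = new Put(rowId);
    put.setWriteToWAL(false); // trade durability for speed: data is lost if a region server dies
    put.add(Bytes.toBytes("CounterValues"),
        Bytes.toBytes(counter.toString()),
        Bytes.toBytes(rElement.getTextTrim()));
    context.write(NullWritable.get(), put);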


-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
