To generalize my question: in an M/R job that reads from and writes to the same HBase table, is there any way to use an arbitrary number of reducers when importing data into a table whose regions have already been pre-split and specified?



Thanks in advance and excuse me for re-asking for help.



-------- Original Message --------
Subject:        Re: Bulk load - #Reducers different from #Regions
Date:   Tue, 07 Aug 2012 20:02:27 +0300
From:   Ioakim Perros <imper...@gmail.com>
To:     user@hbase.apache.org



Excuse me for not defining it well.

I am bulk updating my HBase table through code, using the configureIncrementalLoad function of HFileOutputFormat. In its documentation I read that this function "Sets the number of reduce tasks to match the current number of regions", but I was wondering whether I could explicitly avoid that, perhaps through another way of bulk importing data.
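For context, here is roughly how my job is wired. This is only a sketch: the mapper logic, the table name "my_table", the column family/qualifier "f"/"q", and the input/output paths are placeholders, and I am assuming the 0.92/0.94-style API.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepare {

    // Placeholder mapper: parses "rowkey<TAB>value" lines into Puts.
    public static class LineMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t", 2);
            if (parts.length < 2) {
                return; // skip malformed lines
            }
            byte[] row = Bytes.toBytes(parts[0]);
            Put put = new Put(row);
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
            ctx.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load-prepare");
        job.setJarByClass(BulkLoadPrepare.class);
        job.setMapperClass(LineMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Configures TotalOrderPartitioner over the table's region boundaries
        // and sets the number of reduce tasks to the current number of regions,
        // which is exactly the behaviour I would like to override.
        HTable table = new HTable(conf, "my_table"); // placeholder table name
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}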

PS: I try to insist on bulk importing because I have understood (I hope correctly) that it is much more efficient than going through the traditional HBase API. And since my job is iterative in nature, this approach would hopefully give a good speed-up compared to the HBase API.
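To be complete about the second (loading) step: after the HFiles are written I move them into the table with LoadIncrementalHFiles, roughly like this (same placeholder table name and HFile directory as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");       // placeholder table name
        // Moves the generated HFiles directly into the table's regions,
        // bypassing the normal client write path (WAL and memstore flushes),
        // which is why I expect bulk loading to be faster than the plain API.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/tmp/hfiles"), table);  // placeholder HFile dir
    }
}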

Thank you for responding.



On 08/07/2012 07:53 PM, Subir S wrote:
Bulk load using ImportTsv with pre-split regions for the target table?

Do you mean setting the number of reducers that ImportTsv must use?

On 8/7/12, Ioakim Perros <imper...@gmail.com> wrote:
Hi,

I am bulk importing (updating) data iteratively, and I would like to be
able to set the number of reducers in an M/R job to be different from
the number of regions of the table into which I am importing the data.

I tried it through job.setNumReduceTasks(#reducers), but the job ignored
it.
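Schematically, what I tried was the following (the table name is a placeholder; my guess, which I would appreciate having confirmed, is that configureIncrementalLoad resets the value because it matches the reducer count to the region count internally):

Job job = new Job(HBaseConfiguration.create(), "iterative-bulk-update");
// ... mapper and input/output paths configured as usual ...
job.setNumReduceTasks(16); // the reducer count I would like to use
HTable table = new HTable(job.getConfiguration(), "my_table"); // placeholder
// Appears to override the value set above with the table's current region count.
HFileOutputFormat.configureIncrementalLoad(job, table);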

Is there a way to avoid an intermediary job and to set the number of
reducers explicitly?
I would be grateful if anyone could shed some light on this.

Thanks and regards,
Ioakim





