To generalize my question: in an M/R job that reads from and writes to the
same HBase table, is there any way to use an arbitrary number of reducers
when importing data into a table whose regions are pre-split and
explicitly specified?
Thanks in advance and excuse me for re-asking for help.
-------- Original Message --------
Subject: Re: Bulk load - #Reducers different from #Regions
Date: Tue, 07 Aug 2012 20:02:27 +0300
From: Ioakim Perros <imper...@gmail.com>
To: user@hbase.apache.org
Excuse me for not defining the problem well.
I am bulk updating my HBase table programmatically, using the
configureIncrementalLoad function of HFileOutputFormat. The respective
documentation says that this function "Sets the number of reduce tasks to
match the current number of regions", but I was wondering whether I could
avoid that behavior, perhaps through some other way of bulk importing data.
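For context, here is a minimal driver sketch of the setup being described. This is an assumption about the surrounding code, not the poster's actual job; the class and table names are made up, and it targets the older pre-1.0 HBase API (HTable, HFileOutputFormat) in use at the time of the thread. It shows where configureIncrementalLoad fixes the reducer count:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "iterative-bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        // The mapper emits (row key, Put) pairs for the reducers to sort.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        HTable table = new HTable(conf, "my_table"); // hypothetical table name
        job.setNumReduceTasks(16); // this setting does not survive the next call

        // configureIncrementalLoad() resets the reducer count to the table's
        // current number of regions and installs a TotalOrderPartitioner over
        // the region boundaries, so each reducer produces HFiles covering
        // exactly one region -- which is why setNumReduceTasks() appears
        // to be ignored.
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```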
PS: I insist on bulk importing because, as I understand it (I hope
correctly), it is much more efficient than going through the traditional
HBase API. Since my job is iterative in nature, bulk loading would
hopefully give a good speed-up on each iteration compared to the HBase API.
Thank you for responding.
On 08/07/2012 07:53 PM, Subir S wrote:
Do you mean bulk loading using ImportTsv with pre-split regions for the
target table? That is, do you mean setting the number of reducers that
ImportTsv must use?
On 8/7/12, Ioakim Perros<imper...@gmail.com> wrote:
Hi,
I am bulk importing (updating) data iteratively, and I would like to be
able to set the number of reducers for an M/R task to be different from
the number of regions of the table into which I am loading data.
I tried job.setNumReduceTasks(#reducers), but the job ignored it.
Is there a way to avoid an intermediary job and set the number of
reducers explicitly?
I would be grateful if anyone could shed light on this.
Thanks and regards,
Ioakim