On Mon, Oct 11, 2010 at 9:33 PM, Sean Bigdatafun <[email protected]> wrote:

> Another potential "problem" of incremental bulk loader is that the number of
> reducers (for the bulk loading process) needs to be equal to the existing
> regions -- this seems to be unfeasible for very large table, say with 2000
> regions.
>
> Any comment on this? Thanks.
Yes, this is currently problematic if you have a very large table (2000
regions) and a small MR cluster (where 2000 reducers is too many). It
wouldn't be too difficult to amend the code so that each reducer is
responsible for a contiguous range of regions, and knows to split the
HFiles at region boundaries. Patches welcome :)

-Todd

> Sean
>
> On Fri, Oct 8, 2010 at 9:03 PM, Todd Lipcon <[email protected]> wrote:
>
>> What version are you building from? These tools are new as of this past
>> June.
>>
>> -Todd
>>
>> On Fri, Oct 8, 2010 at 4:52 PM, Leo Alekseyev <[email protected]> wrote:
>>
>> > We want to investigate HBase bulk imports, as described on
>> > http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html and/or
>> > JIRA HBASE-48. I can't seem to run either the importtsv tool or the
>> > completebulkload tool using the hadoop jar /path/to/hbase-VERSION.jar
>> > command. In fact, the ImportTsv class is not part of that jar file.
>> > Am I looking in the wrong place for this class, or do I need to
>> > somehow customize the build process to include it?.. Our HBase was
>> > built from source using the default procedure.
>> >
>> > Thanks for any insight,
>> > --Leo
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

-- 
Todd Lipcon
Software Engineer, Cloudera
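[Editor's note: the idea Todd describes — each reducer owning a contiguous block of regions, and splitting its output files at region boundaries — can be sketched in plain Python. This is only an illustration of the partitioning logic, not HBase code; the region start keys, function names, and the toy 10-region table below are all hypothetical.]

```python
import bisect

# Hypothetical start keys for a 10-region table; in HBase these would come
# from the table's region metadata. Region i covers [region_starts[i],
# region_starts[i+1]).
region_starts = ["", "c", "f", "i", "l", "o", "r", "u", "x", "z"]

def reducer_for_region(region_idx, num_regions, num_reducers):
    """Assign each region to a reducer so that every reducer owns a
    contiguous block of regions (e.g. 2000 regions onto 200 reducers)."""
    return region_idx * num_reducers // num_regions

def region_for_key(row_key):
    """Which region a row key falls into: the last region whose start
    key is <= the row key."""
    return bisect.bisect_right(region_starts, row_key) - 1

def partition(row_key, num_reducers):
    """Total-order partitioner: route a row key to the reducer that owns
    its region, so each reducer sees a contiguous key range."""
    return reducer_for_region(region_for_key(row_key),
                              len(region_starts), num_reducers)

def split_at_region_boundaries(sorted_rows):
    """Inside one reducer, start a new output file whenever the sorted
    row keys cross into the next region, so each HFile maps to exactly
    one region."""
    files, current, cur_region = [], [], None
    for key in sorted_rows:
        r = region_for_key(key)
        if current and r != cur_region:
            files.append(current)
            current = []
        cur_region = r
        current.append(key)
    if current:
        files.append(current)
    return files
```

With 3 reducers, regions 0-3 go to reducer 0, 4-6 to reducer 1, and 7-9 to reducer 2, and a reducer seeing keys ["a", "b", "d", "g"] would emit three files, one per region crossed.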
