For #1, as Ted mentioned, HDFS replication works just fine with bulk
loads.  What you may have read is that bulk loaded data won't be picked up
by HBase replication.  If you are using HBase replication to send data to
another cluster, then you also need to manage getting the bulk loaded data
to that cluster yourself.
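
One way to do that (just a rough sketch; the ZooKeeper quorum, staging path
and table name below are placeholders, not anything from your setup): after
the bulk load completes on the source cluster, copy the generated HFiles to
the peer cluster's HDFS (e.g. with distcp) and run LoadIncrementalHFiles
against the peer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class PeerBulkLoad {
  public static void main(String[] args) throws Exception {
    // Point the configuration at the *peer* cluster.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "peer-zk-quorum");  // placeholder

    // Load the HFiles that were already copied into the peer's HDFS
    // (e.g. via distcp).  Path and table name are placeholders too.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    try (HTable table = new HTable(conf, "my_table")) {
      loader.doBulkLoad(new Path("/staging/hfiles"), table);
    }
  }
}

This is essentially what the completebulkload tool does, just pointed at the
other cluster instead of the source one.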

Dave

On Tue, Jul 21, 2015 at 7:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #1, with HDFS replication set to 3, HFile replication is handled by
> HDFS. There shouldn't be any HFile loss once the bulk load completes.
>
> For #3, multiple HFiles may be generated per region.
>
> bq. If multiple, does LoadIncrementalHFiles merge these HFiles into one
>
> There is no merging of HFiles in bulk load.
>
> For #4, frequent compactions are likely given the small size of bulk loaded
> data.
>
> Cheers
>
> On Tue, Jul 21, 2015 at 7:20 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
> > 1. Are bulk loaded HFiles not replicated? Does that mean that if a
> > RegionServer goes down, all HFiles which were bulk loaded to that server
> > are lost, irrespective of HDFS replication being set to 3? If yes, why
> > are bulk loaded HFiles not replicated?
> >
> > 2. Is there any issue with using a timestamp prefix as the table's row
> > key when bulk load is used for writing?
> >
> > 3. Does a bulk load MR job using HFileOutputFormat2 as the output format
> > create a single HFile per region, or can it create multiple HFiles per
> > region? If multiple, does LoadIncrementalHFiles merge these HFiles into
> > one while loading them into the same region, or does it just do a simple
> > copy?
> >
> > 4. Is there any performance issue if I run a bulk load every 5 seconds,
> > each containing ~20 MB of data? Does it trigger frequent compactions, and
> > could that lead to a performance issue?
> >
>
