Several files are generated. I suppose there are 20 files, because HBase is
configured for 10GB files.
On 03.10.2014 at 1:01, "Jean-Marc Spaggiari" <[email protected]> wrote:

> If it's a single 200GB file, when HBase splits this region, this file will
> have to be split and rewritten into 2 x 100GB files.
>
> How is the file generated? You should really think about splitting it
> first...
>
> 2014-10-02 15:49 GMT-04:00 Jerry He <[email protected]>:
>
> > The reference files will be rewritten during compaction, which normally
> > happens right after splits.
> >
> > You did not mention whether your 200GB of data is one file or many HFiles.
> >
> > Jerry
> > On Oct 2, 2014 12:26 PM, "Serega Sheypak" <[email protected]>
> > wrote:
> >
> > > Sorry, massive IO.
> > > This table is read-only, so HBase should just place reference files.
> > > Why would HBase rewrite the files?
> > >
> > > 2014-10-02 23:24 GMT+04:00 Serega Sheypak <[email protected]>:
> > >
> > > > Hi!
> > > >
> > > > http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
> > > > says that splitting just places a 'reference' file.
> > > > Why should there be massive splitting?
> > > >
> > > > 2014-10-02 23:08 GMT+04:00 Jean-Marc Spaggiari <[email protected]>:
> > > >
> > > >> Hi Serega,
> > > >>
> > > >> Bulk load just "pushes" the files into an HBase region, so there should
> > > >> not be any issue. The split, however, might take some time because HBase
> > > >> will have to split the region again and again until it becomes small
> > > >> enough. So if your max file size is 10GB, it will split to 100GB, then
> > > >> 50GB, then 25GB, then 12GB, then 6GB... Each time, everything will be
> > > >> rewritten. A LOT of wasted IO.
> > > >>
> > > >> So the answer is: yes, HBase can handle it, BUT it's not good practice.
> > > >> Better to split the table beforehand and generate the bulk load files
> > > >> based on the split regions. Also, it might affect other tables, because
> > > >> HBase will have to do massive IO, which in the end might impact
> > > >> performance.
> > > >>
> > > >> JM
> > > >>
> > > >> 2014-10-02 15:03 GMT-04:00 Serega Sheypak <[email protected]>:
> > > >>
> > > >> > Hi, I'm doing an HBase bulk load to an empty table.
> > > >> > Input data size is 200GB.
> > > >> > Is it OK to load the data into one default region and then wait while
> > > >> > HBase splits the 200GB region?
> > > >> >
> > > >> > I don't have any SLA for the initial load. I can wait until HBase splits
> > > >> > the initial load files.
> > > >> > This table is READ only.
> > > >> >
> > > >> > The only consideration is to not affect other tables and not cause
> > > >> > HBase cluster degradation.
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
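
To put rough numbers on the split cascade described in the thread (200GB in one region, 10GB max file size), here is a minimal Python sketch. It assumes a deliberately simplified model that is not taken from HBase itself: each split halves the region exactly, and each round of post-split compaction rewrites all the data once. The function names split_cascade and presplit_keys, and the sample row keys, are hypothetical and only for illustration. The second function shows one way the pre-split points could be picked from a sorted sample of row keys, so that the bulk load lands in about 20 regions from the start.

```python
import math


def split_cascade(total_gb, max_region_gb):
    """Simplified model: halve the region until it fits under the limit.

    Returns (rounds, rewritten_gb) where rounds is a list of
    (region_count, region_size_gb) after each round of splitting, and
    rewritten_gb assumes every round rewrites all the data once.
    """
    rounds = []
    size = float(total_gb)
    regions = 1
    while size > max_region_gb:
        size /= 2
        regions *= 2
        rounds.append((regions, size))
    rewritten_gb = len(rounds) * total_gb
    return rounds, rewritten_gb


def presplit_keys(sorted_sample_keys, num_regions):
    """Pick num_regions - 1 boundary keys at even quantiles of a sorted sample."""
    n = len(sorted_sample_keys)
    return [sorted_sample_keys[(i * n) // num_regions] for i in range(1, num_regions)]


if __name__ == "__main__":
    rounds, rewritten = split_cascade(200, 10)
    for regions, size in rounds:
        print(f"{regions} regions of {size} GB")
    print(f"data rewritten under this model: {rewritten} GB")

    # Hypothetical sample of row keys; in practice this would come from the input data.
    sample = ["row%04d" % i for i in range(1000)]
    num_regions = math.ceil(200 / 10)
    print(presplit_keys(sample, num_regions))
```

The resulting boundary keys could then be supplied as split points when creating the table, before generating the HFiles for the bulk load.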
