Re: best approach for write and immediate read use case

Gautam Borah Fri, 23 Aug 2013 15:41:33 -0700

Thanks, Ted, for your response and for clarifying the behavior when using the
HTable interface.


What would the behavior be when inserting data using a MapReduce job? Would
the recently added records be in the memstore, or would I need to load them
for read queries after the insert is done?

Thanks,
Gautam


On Fri, Aug 23, 2013 at 2:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Assuming you are using 0.94, the default value
> for hbase.regionserver.global.memstore.lowerLimit is 0.35
>
> That means the memstore on each region server would be able to hold roughly
> 3000M * 0.35 / 60 = 17.5 million records.
>
> bq. If I use HTable interface, would the inserted data be in the HBase
> cache, before flushing to the files, for immediate read queries?
>
> Yes.
>
> Cheers
>
>
> On Fri, Aug 23, 2013 at 12:01 PM, Gautam Borah <gautam.bo...@gmail.com> wrote:
>
> > Hi,
> >
> > The average size of my records is 60 bytes (a 20-byte key and a 40-byte
> > value), and the table has one column family.
> >
> > I have set up a cluster for testing: 1 master and 3 region servers. Each
> > has a heap size of 3 GB and a single CPU.
> >
> > I have pre-split the table into 30 regions. I do not have to keep data
> > forever; I could purge older records periodically.
> >
> > Thanks,
> >
> > Gautam
> >
> >
> >
> > On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Can you tell us the average size of your records and how much heap is
> > > given to the region servers?
> > >
> > > Thanks
> > >
> > > On Aug 23, 2013, at 12:11 AM, Gautam Borah <gautam.bo...@gmail.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I have a use case where I need to write 1 million to 10 million
> > > > records periodically (at intervals of 1 to 10 minutes) into an HBase
> > > > table.
> > > >
> > > > Once the insert is completed, these records are queried immediately
> > > > from another program - multiple reads.
> > > >
> > > > So, this is one massive write followed by many reads.
> > > >
> > > > I have two approaches to insert these records into the HBase table -
> > > >
> > > > Use HTable or HTableMultiplexer to stream the data to HBase table.
> > > >
> > > > or
> > > >
> > > > Write the data to HDFS as a sequence file (Avro in my case), run a
> > > > MapReduce job using HFileOutputFormat, and then load the output files
> > > > into the HBase cluster.
> > > > Something like,
> > > >
> > > >  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> > > >  loader.doBulkLoad(new Path(outputDir), hTable);
> > > >
> > > >
> > > > In my use case which approach would be better?
> > > >
> > > > If I use HTable interface, would the inserted data be in the HBase
> > > > cache, before flushing to the files, for immediate read queries?
> > > >
> > > > If I use a MapReduce job to insert, would the data be loaded into the
> > > > HBase cache immediately, or would only the output files be copied to
> > > > the respective table-specific HBase directories?
> > > >
> > > > So, which approach is better for write and then immediate multiple
> > > > read operations?
> > > >
> > > > Thanks,
> > > > Gautam
> > >
> >
>
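
The memstore estimate quoted above (3000M * 0.35 / 60, roughly 17.5 million
records per region server) can be checked with a small sketch. The class and
method names here are made up for illustration and are not part of HBase. The
calculation makes the same simplification as the estimate: every record takes
exactly 60 bytes in the memstore, with no per-cell overhead, so the real
capacity is probably lower.

```java
// Hypothetical helper, not an HBase API: back-of-the-envelope memstore capacity.
class MemstoreEstimate {

    /**
     * Estimates how many records fit in the global memstore of one region server.
     *
     * @param heapBytes       region server heap size in bytes (3000M in the thread)
     * @param memstorePercent hbase.regionserver.global.memstore.lowerLimit as a
     *                        whole percentage (0.35 becomes 35); integer math
     *                        avoids floating-point rounding surprises
     * @param recordBytes     assumed size of one record (20-byte key + 40-byte value)
     */
    static long estimateRecords(long heapBytes, int memstorePercent, int recordBytes) {
        long memstoreBytes = heapBytes * memstorePercent / 100;
        return memstoreBytes / recordBytes;
    }

    public static void main(String[] args) {
        long perServer = estimateRecords(3_000_000_000L, 35, 60);
        System.out.println(perServer); // prints 17500000
    }
}
```

With these numbers, a single batch at the upper end (10 million records) is
below the per-server estimate, though repeated batches would still be flushed
to files over time.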
