Hi Vitaliy

> What would be a NOT conservative upper bound for such a case?
Good question. There are a few reasons I can think of:

1.  The BucketCache won't cache big blocks (IIRC, 2MB?) by default. Say you
have a 10MB cell: it won't be cached in the BucketCache, so it has to be read
from disk every time, which is quite time consuming and causes high IO
pressure.
2.  On the other hand, before HBase 2.3.0 (still unreleased) we read the
data block from HDFS into heap first, and only after the RPC has shipped the
cells to the client can that heap memory be unreferenced and garbage
collected by the JVM. If you have many reads of huge cells at the same time,
you will get heavy GC pressure, which may lead to full GCs or OOM. Our
production cluster encountered this problem before; it was really bad and
easily affected availability. With HBASE-21879 [1], we've made the HDFS block
reading offheap, which means blocks are read into pooled ByteBuffers, and
that helps a lot. You can read the documentation in the Apache book [2], or
the design doc [3].
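
BTW, for the "read the row's content cell by cell" part of the original
question: here is a rough sketch (just an illustration; the table name and
row key are placeholders) of a single-row Scan that uses the partial-results
API Stack pointed at below, and also skips the block cache since you said
you don't need these rows cached:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] row = Bytes.toBytes("ROW_KEY");  // placeholder row key

    try (Connection conn = ConnectionFactory.createConnection(conf);
         // "TABLE_NAME" is a placeholder for your own table
         Table table = conn.getTable(TableName.valueOf("TABLE_NAME"))) {

      Scan scan = new Scan()
          .withStartRow(row)
          .withStopRow(row, true)        // scan just this one row (inclusive stop)
          .setAllowPartialResults(true)  // let the server return the row in chunks
          .setCacheBlocks(false);        // don't cache blocks for this one-off read

      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result partial : scanner) {
          for (Cell cell : partial.rawCells()) {
            // process one ~2KB cell at a time instead of the whole row
          }
        }
      }
    }
  }
}

If you want even finer-grained chunks, you can also cap the number of cells
returned per Result with Scan.setBatch(int).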


[1]. https://issues.apache.org/jira/browse/HBASE-21879
[2]. https://hbase.apache.org/book.html#offheap_read_write
[3]. https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit

Thanks.

On Mon, Jul 1, 2019 at 8:15 PM Vitaliy Semochkin <vitaliy...@gmail.com>
wrote:

> Thank you very much for the fast reply!
>
> What would be a NOT conservative upper bound for such a case?
> Is it possible to use 500GB rows with such an approach?
>
> Regards,
> Vitaliy
>
>
> On Fri, Jun 28, 2019 at 7:55 AM Stack <st...@duboce.net> wrote:
> >
> > On Wed, Jun 26, 2019 at 1:08 PM Vitaliy Semochkin <vitaliy...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have an analytical report that would be very easy to build
> > > if I could store thousands of cells in one row, each cell storing about
> > > 2kb of information.
> > > I don't need those rows to be stored in any cache, because they will
> > > be used only occasionally for analytical reports in Flink.
> > >
> > > The question is, what is the biggest size of a row hbase can handle?
> > > Should I store 2kb rows as MOBs, or is the regular format ok?
> > >
> > > There are old articles that say that large rows, i.e. rows whose total
> > > size is larger than 10MB, can affect hbase performance,
> > > is this statement still valid for the modern hbase versions?
> > > What is the largest row size hbase can handle these days without having
> > > issues with performance?
> > > Is it possible to read a row so that its whole content is not read
> > > into memory (e.g. I would like to read the row's content cell by cell)?
> > >
> > >
> > See
> > https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAllowPartialResults-boolean-
> > It speaks to your question. See the 'See Also:' on this method too.
> >
> > Only works for Scan. Doesn't work if you Get a row (You could Scan one row
> > only if you need the above partial result).
> >
> > HBase has no 'streaming' API that would allow you to return a Cell at a
> > time, so big rows are a problem if you don't do the above partial. The big
> > row is materialized serverside in memory and then again client-side. 10MB
> > is a conservative upper bound.
> >
> > 2kb Cells should work nicely -- even if a few thousand... especially if you
> > can use partial.
> >
> > S
> >
> >
> >
> > > Best Regards
> > > Vitaliy
> > >
>
