One more thing is compaction: if a cell is very big, then compaction needs a
lot of IO to read and write the value of that big cell. Actually, during
compaction only the key of the cell really matters, yet the whole huge value
still has to be read and rewritten.

On Mon, Jul 1, 2019 at 11:02 PM OpenInx <open...@gmail.com> wrote:

> Hi Vitaliy
>
> > What would be a NOT conservative upper bound for such a case?
> Good question, there are some reasons I can think of:
>
> 1.  The BucketCache won't cache big blocks (IIRC, 2MB?) by default. Say you
> have a 10MB cell: it won't be cached in the BucketCache, so every read has
> to go to disk. That is quite time consuming and creates high IO pressure.
> 2.  On the other hand, before HBase 2.3.0 (still unreleased) we read the
> data block from HDFS into the heap first, and only after the RPC has shipped
> the cells to the client can that heap memory be unreferenced and GC'd by the
> JVM.  If you have many reads that want to fetch huge cells at the same time,
> you will get huge GC pressure, which may lead to full GCs or OOM. Our
> production cluster encountered this problem before; it is a really bad thing
> and can easily hurt availability.  Since HBASE-21879 [1], HDFS block reading
> is done offheap, meaning blocks are read into pooled ByteBuffers, which
> helps a lot. You can read the section in the Apache book [2], or the design
> doc [3] (a small write-side sketch of the keep-cells-small alternative
> follows the links below).
>
>
> [1]. https://issues.apache.org/jira/browse/HBASE-21879
> [2]. https://hbase.apache.org/book.html#offheap_read_write
> [3].
> https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit
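>
> To avoid huge single cells in the first place, one option is to split a
> large logical value across many small (~2KB) cells in the same row. A
> minimal sketch; the table name "t", family "d", row key "row-1" and the
> chunking scheme are made-up placeholders, not something from this thread:
>
>   import java.util.Arrays;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.Connection;
>   import org.apache.hadoop.hbase.client.ConnectionFactory;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Table;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class ChunkedWrite {
>     public static void main(String[] args) throws Exception {
>       byte[] bigValue = new byte[10 * 1024 * 1024];   // e.g. a 10MB payload
>       int chunkSize = 2 * 1024;                       // ~2KB per cell
>       try (Connection conn =
>                ConnectionFactory.createConnection(HBaseConfiguration.create());
>            Table table = conn.getTable(TableName.valueOf("t"))) {
>         Put put = new Put(Bytes.toBytes("row-1"));
>         for (int off = 0, i = 0; off < bigValue.length; off += chunkSize, i++) {
>           byte[] chunk = Arrays.copyOfRange(
>               bigValue, off, Math.min(off + chunkSize, bigValue.length));
>           // zero-padded qualifier keeps the chunks ordered when scanning
>           put.addColumn(Bytes.toBytes("d"),
>               Bytes.toBytes(String.format("c%06d", i)), chunk);
>         }
>         table.put(put);   // one row, ~5120 small cells instead of one huge one
>       }
>     }
>   }
>
> Reading such a row back cell by cell is what the setAllowPartialResults scan
> discussed further down the thread is for.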
>
> Thanks.
>
> On Mon, Jul 1, 2019 at 8:15 PM Vitaliy Semochkin <vitaliy...@gmail.com>
> wrote:
>
>> Thank you very much for the fast reply!
>>
>> What would be a NOT conservative upper bound for such a case?
>> Is it possible to use 500GB rows with such an approach?
>>
>> Regards,
>> Vitaliy
>>
>>
>> On Fri, Jun 28, 2019 at 7:55 AM Stack <st...@duboce.net> wrote:
>> >
>> > On Wed, Jun 26, 2019 at 1:08 PM Vitaliy Semochkin <vitaliy...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I have an analytical report that would be very easy to build
>> > > if I could store thousands of cells in one row, each cell storing about
>> > > 2kb of information.
>> > > I don't need those rows to be stored in any cache, because they will
>> > > be used only occasionally for analytical reports in Flink.
>> > >
>> > > The question is, what is the biggest size of a row HBase can handle?
>> > > Should I store the 2kb cells as MOBs, or is the regular format OK?
>> > >
>> > > There are old articles saying that large rows, i.e. rows whose total
>> > > size is larger than 10MB, can affect HBase performance.
>> > > Is this statement still valid for modern HBase versions?
>> > > What is the largest row size HBase can handle these days without
>> > > performance issues?
>> > > Is it possible to read a row so that its whole content is not read
>> > > into memory (e.g. I would like to read the row's content cell by cell)?
>> > >
>> > >
>> > See
>> > https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAllowPartialResults-boolean-
>> > It speaks to your question. See the 'See Also:' on this method too.
>> >
>> > Only works for Scan. Doesn't work if you Get a row (You could Scan one
>> > row only if you need the above partial result).
>> >
>> > HBase has no 'streaming' API that would allow you to return a
>> > Cell-at-a-time, so big rows are a problem if you don't do the above
>> > partial. The big row is materialized server-side in memory and then again
>> > client-side. 10MB is a conservative upper bound.
>> >
>> > 2kb Cells should work nicely -- even if a few thousand... especially if
>> > you can use partial.
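>> >
>> > A minimal sketch of that one-row, partial-result Scan; the table name
>> > "t", the row key and the 2MB max result size are made-up placeholders:
>> >
>> >   import org.apache.hadoop.hbase.Cell;
>> >   import org.apache.hadoop.hbase.HBaseConfiguration;
>> >   import org.apache.hadoop.hbase.TableName;
>> >   import org.apache.hadoop.hbase.client.*;
>> >   import org.apache.hadoop.hbase.util.Bytes;
>> >
>> >   public class OneRowPartialScan {
>> >     public static void main(String[] args) throws Exception {
>> >       byte[] row = Bytes.toBytes("row-1");
>> >       Scan scan = new Scan()
>> >           .withStartRow(row)
>> >           .withStopRow(row, true)              // scan exactly this one row
>> >           .setAllowPartialResults(true)        // a Result may hold only part of the row
>> >           .setMaxResultSize(2L * 1024 * 1024)  // ship ~2MB per RPC instead of the whole row
>> >           .setCaching(1);
>> >       try (Connection conn =
>> >                ConnectionFactory.createConnection(HBaseConfiguration.create());
>> >            Table table = conn.getTable(TableName.valueOf("t"));
>> >            ResultScanner scanner = table.getScanner(scan)) {
>> >         for (Result partial : scanner) {
>> >           for (Cell cell : partial.rawCells()) {
>> >             // process one small cell at a time; the whole row is never
>> >             // materialized on the client
>> >           }
>> >           // partial.mayHaveMoreCellsInRow() says whether more cells of
>> >           // the same row are still coming
>> >         }
>> >       }
>> >     }
>> >   }
>> >
>> > With setAllowPartialResults plus a modest setMaxResultSize, each RPC only
>> > carries a slice of the row, which is what keeps the big row from being
>> > held in memory in one piece.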
>> >
>> > S
>> >
>> >
>> >
>> > > Best Regards
>> > > Vitaliy
>> > >
>>
>
