One more thing is compaction: if a cell is very big, then we need a lot of I/O to read and write the value of that big cell during compactions, even though compaction really only needs the cell's key for its merge sort.
On Mon, Jul 1, 2019 at 11:02 PM OpenInx <open...@gmail.com> wrote:
> Hi Vitaliy
>
> > What would be NOT conservative upper bound for such case?
> Good question. There are some reasons I can think of:
>
> 1. The BucketCache won't cache big blocks (IIRC, larger than 2 MB?) by default. Say you
> have a 10 MB cell: it won't be cached in the BucketCache, so it needs a disk read every
> time. Quite time-consuming, and high I/O pressure.
> 2. On the other hand, before HBase 2.3.0 (still unreleased) we read the data block
> from HDFS into heap first; only after the RPC has shipped the cells to the client can
> the heap be unreferenced and GC'd by the JVM. If you have many reads that want
> to read huge cells at the same time, you will get huge GC pressure, which
> may lead to full GC or OOM. Our production cluster encountered this problem
> before; a really bad thing, and it can easily affect availability. After
> HBASE-21879 [1], we've made the HDFS block reading offheap, which means reading
> it into pooled ByteBuffers; that will help a lot. You can read the documentation in
> the Apache book [2], or the design doc [3].
>
> [1]. https://issues.apache.org/jira/browse/HBASE-21879
> [2]. https://hbase.apache.org/book.html#offheap_read_write
> [3]. https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit
>
> Thanks.
>
> On Mon, Jul 1, 2019 at 8:15 PM Vitaliy Semochkin <vitaliy...@gmail.com> wrote:
>
>> Thank you very much for the fast reply!
>>
>> What would be a NOT conservative upper bound for such a case?
>> Is it possible to use 500 GB rows with such an approach?
>>
>> Regards,
>> Vitaliy
>>
>> On Fri, Jun 28, 2019 at 7:55 AM Stack <st...@duboce.net> wrote:
>> >
>> > On Wed, Jun 26, 2019 at 1:08 PM Vitaliy Semochkin <vitaliy...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have an analytical report that would be very easy to build
>> > > if I could store thousands of cells in one row, each cell storing about
>> > > 2 KB of information.
>> > > I don't need those rows to be stored in any cache, because they will
>> > > be used only occasionally for analytical reports in Flink.
>> > >
>> > > The question is, what is the biggest row size HBase can handle?
>> > > Should I store the 2 KB cells as MOBs, or is the regular format ok?
>> > >
>> > > There are old articles saying that large rows, i.e. rows whose total
>> > > size is larger than 10 MB, can affect HBase performance;
>> > > is this statement still valid for modern HBase versions?
>> > > What is the largest row size HBase can handle these days without
>> > > performance issues?
>> > > Is it possible to read a row so that its whole content is not read
>> > > into memory (e.g. I would like to read the row's content cell by cell)?
>> > >
>> > See
>> > https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAllowPartialResults-boolean-
>> > It speaks to your question. See the 'See Also:' on this method too.
>> >
>> > This only works for Scan; it doesn't work if you Get a row (you could
>> > Scan just the one row if you need the partial results above).
>> >
>> > HBase has no 'streaming' API that would return a Cell at a time,
>> > so big rows are a problem if you don't do the partial read above. The big
>> > row is materialized server-side in memory and then again client-side.
>> > 10 MB is a conservative upper bound.
>> >
>> > 2 KB cells should work nicely -- even a few thousand of them, especially
>> > if you can use partial results.
>> >
>> > S
>> >
>> > > Best Regards
>> > > Vitaliy
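For the archives, a minimal sketch of the single-row partial-results Scan that Stack describes. This assumes a running HBase 2.x cluster and the standard hbase-client API; the table name ("report") and row key ("my-wide-row") are hypothetical, and it is a sketch rather than a tested program:

```java
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowScan {
    public static void main(String[] args) throws Exception {
        byte[] row = Bytes.toBytes("my-wide-row");   // hypothetical row key
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("report"))) {  // hypothetical table
            Scan scan = new Scan()
                .withStartRow(row)
                .withStopRow(row, true)            // scan exactly this one row
                .setAllowPartialResults(true)      // server may chunk the row across Results
                .setCacheBlocks(false)             // don't pollute the block cache
                .setCaching(1);                    // one (partial) Result per RPC
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result partial : scanner) {
                    // Each partial Result holds only a slice of the row's cells, so
                    // the whole row is never materialized at once on the client.
                    partial.listCells().forEach(cell ->
                        System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell))));
                }
            }
        }
    }
}
```

With a plain Get (or with setAllowPartialResults left false), the entire row would instead be assembled server-side and again client-side, which is why the ~10 MB conservative bound above applies.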