Interesting!! Can't wait to see this in action. I am already imagining huge performance gains.

Regards,
Dhaval
________________________________
From: Ted Yu <yuzhih...@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>; Dhaval Shah <prince_mithi...@yahoo.co.in>
Sent: Thursday, 24 October 2013 1:06 PM
Subject: Re: Add Columnsize Filter for Scan Operation

For streaming responses, there is this JIRA:

HBASE-8691 High-Throughput Streaming Scan API

On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah <prince_mithi...@yahoo.co.in> wrote:

> Jean, if we don't add setBatch to the scan, the MR job does cause HBase to
> crash due to an OOME. We have run into this in the past as well. Basically
> the problem is: say I have a region server with 12GB of RAM and a row of
> size 20GB (an extreme example; in practice, HBase runs out of memory well
> before 20GB). If I query the entire row, HBase does not have enough memory
> to hold/process it for the response.
>
> In practice, if your setCaching > 1, the aggregate of all rows growing too
> big can also cause the same issue.
>
> I think one way we can solve this issue is to make the HBase server serve
> responses in a streaming fashion somehow (I am not exactly sure about the
> details of how this could work, but if the server has to hold the entire
> row in memory, it is going to be bound by the HBase heap size).
>
> Regards,
> Dhaval
>
>
> ________________________________
> From: Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> To: user <user@hbase.apache.org>
> Sent: Thursday, 24 October 2013 12:37 PM
> Subject: Re: Add Columnsize Filter for Scan Operation
>
>
> If the MR job crashes because of the number of columns, then we have an
> issue that we need to fix ;) Please open a JIRA and provide details if you
> are facing that.
>
> Thanks,
>
> JM
>
>
> 2013/10/24 John <johnnyenglish...@gmail.com>
>
> > @Jean-Marc: Sure, I can do that, but that's a little bit complicated
> > because the rows sometimes have millions of columns and I have to handle
> > them in different batches because otherwise HBase crashes.
> > Maybe I will try it later, but first I want to try the API version. It
> > works okay so far, but I want to improve it a little bit.
> >
> > @Ted: I tried to modify it, but I have no idea how exactly to do this. I
> > have to count the number of columns in that filter (which works,
> > obviously, with the count field). But there is no method that is called
> > after iterating over all elements, so I cannot return the drop ReturnCode
> > in the filterKeyValue method, because I don't know when the last element
> > has been reached. Any ideas?
> >
> > regards
> >
> >
> > 2013/10/24 Ted Yu <yuzhih...@gmail.com>
> >
> > > Please take a look at
> > > src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
> > >
> > > * Simple filter that returns first N columns on row only.
> > >
> > > You can modify the filter to suit your needs.
> > >
> > > Cheers
> > >
> > >
> > > On Thu, Oct 24, 2013 at 7:52 AM, John <johnnyenglish...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently writing an HBase Java program which iterates over every
> > > > row in a table. I have to modify some rows if the column size (the
> > > > number of columns in that row) is bigger than 25000.
> > > >
> > > > Here is my source code: http://pastebin.com/njqG6ry6
> > > >
> > > > Is there any way to add a filter to the scan operation and load only
> > > > rows where the size is bigger than 25k?
> > > >
> > > > Currently I check the size at the client, but for that I have to load
> > > > every row to the client side. It would be better if the wrong rows
> > > > were already filtered out at the server side.
> > > >
> > > > thanks
> > > >
> > > > John
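
[Editor's note] Dhaval's setBatch point above can be sketched roughly as follows. This is a minimal illustration against the 0.94-era client API discussed in the thread; the table name, batch/caching values, and the process() helper are all hypothetical:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BatchedScanExample {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table name

        Scan scan = new Scan();
        scan.setCaching(100); // rows (or row chunks) fetched per RPC
        scan.setBatch(1000);  // at most 1000 cells per Result, so a row with
                              // millions of columns arrives in many small
                              // chunks instead of one giant response

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result chunk : scanner) {
                // With setBatch set, one logical row can span several
                // consecutive Results sharing the same row key;
                // reassemble them as needed.
                process(chunk);
            }
        } finally {
            scanner.close();
            table.close();
        }
    }

    private static void process(Result chunk) {
        // application logic goes here (hypothetical handler)
    }
}
```

The key point from the thread: without setBatch, the region server must materialize an entire row per Result, so a 20GB row against a 12GB heap fails; with setBatch, memory use is bounded by cells-per-batch rather than row width.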
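
[Editor's note] On John's question about a callback "after iterating over all elements": the Filter API does provide one. filterRow() is invoked once every cell of the row has passed filterKeyValue(), and returning true there drops the row; reset() clears per-row state. A minimal sketch (class and field names are made up, following Ted's ColumnCountGetFilter suggestion, against the 0.94-era KeyValue-based API) that keeps only rows with more than a given number of columns:

```java
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;

// Hypothetical filter: drop any row with <= minColumns cells,
// so only "wide" rows (e.g. > 25000 columns) reach the client.
public class MinColumnCountRowFilter extends FilterBase {

    private final int minColumns; // threshold, e.g. 25000
    private int count = 0;        // cells seen in the current row

    public MinColumnCountRowFilter(int minColumns) {
        this.minColumns = minColumns;
    }

    @Override
    public void reset() {
        // Called by the region server before each new row.
        count = 0;
    }

    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
        // Include every cell while counting; the keep/drop decision
        // is deferred to filterRow() below.
        count++;
        return ReturnCode.INCLUDE;
    }

    @Override
    public boolean filterRow() {
        // Invoked after all cells of the row have been seen.
        // Returning true drops the entire row.
        return count <= minColumns;
    }

    @Override
    public boolean hasFilterRow() {
        // Signals that this filter makes a whole-row decision.
        return true;
    }
}
```

Two caveats: custom filters must be deployed on the region servers' classpath, and a filter that relies on filterRow() cannot be combined with Scan.setBatch (HBase rejects that combination), which is in tension with the OOME workaround discussed above for very wide rows.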