Hi

In the version that you were using, the default caching was 1000 (I
believe - I would need to see the old code). So in that case the client
was trying to fetch 1000 rows per RPC, each row with 20K columns. Now,
when you say that the client was missing rows, did you check the server
logs?

Did you get any OutOfOrderScannerException?  There is a property called
'hbase.rpc.timeout' which can be increased in your case - but only once
your caching and batching are adjusted.

In the current trunk code there is no default caching value (unless one
is specified); instead the server tries to fetch up to 2MB of data and
sends that back to the client.

In any case I would suggest checking your server logs for any
exceptions. Increase the timeout property and adjust your caching and
batching to fetch the data. If the client is still missing rows after
that, we will need the logs to analyse things. Ted's mail referring to
https://issues.apache.org/jira/browse/HBASE-11544 gives an idea of the
general behaviour of scans and how it affects scanning bigger and wider
rows.
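
As a rough sketch of that tuning (the property names are the 1.x-era
ones, and all values here are only illustrative - they need adjusting
to your actual row widths):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;

    Configuration conf = HBaseConfiguration.create();
    // Give scanner RPCs more headroom before the client gives up.
    conf.setInt("hbase.rpc.timeout", 120000);                    // ms
    conf.setInt("hbase.client.scanner.timeout.period", 120000);  // ms

    Scan scan = new Scan();
    scan.setCaching(1);   // rows fetched per RPC - keep small for wide rows
    scan.setBatch(5000);  // max cells per Result - chops wide rows into chunks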

Regards
Ram


On Thu, Sep 24, 2015 at 2:32 PM, Gaurav Agarwal <gau...@arkin.net> wrote:

> Hi,
>
> The problem that I am actually facing is that when doing a scan over
> rows where each row has a very large number of cells (a large number
> of columns), the scan API seems to be transparently dropping data - in
> my case I noticed that entire rows of data were missing in a few cases.
>
> On Ram's suggestion (above), I tried *scan.setCaching(1)* and,
> optionally, *scan.setBatch(5000)*, and the problem got resolved (at
> least for now). So this indicates that the client (it cannot be the
> server, I hope) was dropping cells once the number (or maybe bytes) of
> cells became quite large across the cached rows. Note that in my case
> each cell is close to 30 bytes (including qualifier, value and
> timestamp) and each row key is close to 20 bytes.
>
> I am not clear on which setting controls the maximum number/bytes of
> cells that the client can receive before this problem surfaces. Can
> someone please point me to these settings/code?
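>
> For reference, the read loop after this change looks roughly like the
> sketch below - with setBatch a single logical row can come back split
> across several consecutive Result objects, so they have to be stitched
> together by row key (a sketch, not my final code):
>
>     final Scan scan = new Scan();
>     scan.setCaching(1);    // fetch one Result per RPC (conservative)
>     scan.setBatch(5000);   // at most 5000 cells per Result
>     final ResultScanner rs = htable.getScanner(scan);
>     byte[] currentRow = null;
>     for (Result r = rs.next(); r != null; r = rs.next()) {
>         if (currentRow == null || !Bytes.equals(currentRow, r.getRow())) {
>             currentRow = r.getRow();  // a new logical row starts here
>         }
>         for (final Cell cell : r.rawCells()) {
>             // accumulate this chunk's cells against currentRow
>         }
>     }
>     rs.close();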
>
> On Thu, Sep 24, 2015 at 12:05 PM, Gaurav Agarwal <gau...@arkin.net> wrote:
>
> > After spending more time on this I realised that my understanding
> > (and hence my question) was invalid. I am still trying to get more
> > information regarding the problem and will update the thread once I
> > have a better handle on it.
> >
> > Apologies for the confusion.
> >
> > On Thu, Sep 24, 2015 at 10:32 AM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> >
> >> I am not sure whether you have tried it, but the Scan API has an
> >> option called 'batching'. Did you try it? With batching, even when a
> >> row has many columns you can still limit the amount of data sent to
> >> the client per call. I think the main issue you are facing is that
> >> so many qualifiers are being returned that the client is not able to
> >> accept them all.
> >>
> >> 'Short.MAX_VALUE which is 32,767 bytes.'
> >> This comment applies to the length of a single qualifier, i.e. the
> >> name that you specify for the qualifier, not to the number of
> >> qualifiers.
> >>
> >> Regards
> >> Ram
> >>
> >> On Thu, Sep 24, 2015 at 8:52 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> >>
> >> > >> I have Column Family with very large number of column
> >> > >> qualifiers (> 50,000). Each column qualifier is 8 bytes long.
> >> >
> >> > When you say you have 50,000 qualifiers in a CF, it means you will
> >> > have that many cells under that CF per row. So I do not see where
> >> > a qualifier length limit comes into play here: each qualifier
> >> > belongs to a different cell, which carries its own qualifier bytes.
> >> >
> >> > -Anoop-
> >> >
> >> >
> >> > On Thu, Sep 24, 2015 at 1:13 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >> >
> >> > > Yes, the comment is incorrect.
> >> > >
> >> > > hbase.client.keyvalue.maxsize controls the max key-value size,
> >> > > but it is unlimited in master (I was wrong about 1MB; that is
> >> > > probably from older versions of HBase).
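> >> > >
> >> > > For reference, that property is set on the client configuration,
> >> > > e.g. (the value is in bytes, and as far as I can tell a value
> >> > > <= 0 disables the check):
> >> > >
> >> > >     conf.setInt("hbase.client.keyvalue.maxsize", 10 * 1024 * 1024);  // 10 MB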
> >> > >
> >> > >
> >> > > -Vlad
> >> > >
> >> > > On Wed, Sep 23, 2015 at 11:45 AM, Gaurav Agarwal <gau...@arkin.net> wrote:
> >> > >
> >> > > > Thanks Vlad. Could you please point me to the KV size setting
> >> > > > (default 1MB)? Just to make sure that I understand correctly,
> >> > > > are you suggesting that the following comment in Cell.java is
> >> > > > incorrect?
> >> > > >
> >> > > >  /**
> >> > > >    * Contiguous raw bytes that may start at any index in the
> >> > > >    * containing array. Max length is Short.MAX_VALUE which is
> >> > > >    * 32,767 bytes.
> >> > > >    * @return The array containing the qualifier bytes.
> >> > > >    */
> >> > > >   byte[] getQualifierArray();
> >> > > >
> >> > > > On Thu, Sep 24, 2015 at 12:10 AM, Gaurav Agarwal <gau...@arkin.net> wrote:
> >> > > >
> >> > > > > Thanks Vlad. Could you please point me to the KV size
> >> > > > > setting (default 1MB)? Just to make sure that I understand
> >> > > > > correctly - the following comment in Cell.java is incorrect:
> >> > > > >
> >> > > > >  /**
> >> > > > >    * Contiguous raw bytes that may start at any index in the
> >> > > > >    * containing array. Max length is Short.MAX_VALUE which is
> >> > > > >    * 32,767 bytes.
> >> > > > >    * @return The array containing the qualifier bytes.
> >> > > > >    */
> >> > > > >   byte[] getQualifierArray();
> >> > > > >
> >> > > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >> > > > >
> >> > > > >> Check the KeyValue class (Cell's implementation).
> >> > > > >> getQualifierArray() returns the kv's backing array. There is
> >> > > > >> no Short limit on the size of this array, but there are
> >> > > > >> other limits in HBase - the maximum KV size, for example,
> >> > > > >> which is configurable but is 1MB by default. Having 50K
> >> > > > >> qualifiers is a bad idea; consider redesigning your data
> >> > > > >> model and using the rowkey instead.
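> >> > > > >>
> >> > > > >> A rough sketch of that tall-table alternative (the family,
> >> > > > >> qualifier and variable names here are purely illustrative):
> >> > > > >>
> >> > > > >>     // Wide: one row per series with 50K qualifiers.
> >> > > > >>     // Tall: fold the 8-byte qualifier into the rowkey, so
> >> > > > >>     // each row holds a single fixed qualifier instead.
> >> > > > >>     String seriesId = "series1";      // illustrative
> >> > > > >>     long metricId = 42L, value = 7L;  // illustrative
> >> > > > >>     byte[] rowkey = Bytes.add(Bytes.toBytes(seriesId), Bytes.toBytes(metricId));
> >> > > > >>     Put put = new Put(rowkey);
> >> > > > >>     put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"), Bytes.toBytes(value));
> >> > > > >>     htable.put(put);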
> >> > > > >>
> >> > > > >> -Vlad
> >> > > > >>
> >> > > > >> On Wed, Sep 23, 2015 at 10:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> > > > >>
> >> > > > >> > Please take a look at HBASE-11544 which is in hbase 1.1
> >> > > > >> >
> >> > > > >> > Cheers
> >> > > > >> >
> >> > > > >> > On Wed, Sep 23, 2015 at 10:18 AM, Gaurav Agarwal <gau...@arkin.net> wrote:
> >> > > > >> >
> >> > > > >> > > Hi All,
> >> > > > >> > >
> >> > > > >> > > I have a Column Family with a very large number of
> >> > > > >> > > column qualifiers (> 50,000). Each column qualifier is 8
> >> > > > >> > > bytes long. The problem is that when I do a scan
> >> > > > >> > > operation to fetch some rows, the client-side Cell object
> >> > > > >> > > does not seem to have enough space allocated to hold all
> >> > > > >> > > the column qualifiers for a given row, and hence I cannot
> >> > > > >> > > read all the columns back for a given row.
> >> > > > >> > >
> >> > > > >> > > Please see the code snippet that I am using:
> >> > > > >> > >
> >> > > > >> > >  final ResultScanner rs = htable.getScanner(scan);
> >> > > > >> > >  for (Result row = rs.next(); row != null; row = rs.next()) {
> >> > > > >> > >     final Cell[] cells = row.rawCells();
> >> > > > >> > >     if (cells != null) {
> >> > > > >> > >         for (final Cell cell : cells) {
> >> > > > >> > >             final long c = Bytes.toLong(*cell.getQualifierArray()*,
> >> > > > >> > >                     cell.getQualifierOffset(), cell.getQualifierLength());
> >> > > > >> > >             final long v = Bytes.toLong(cell.getValueArray(),
> >> > > > >> > >                     cell.getValueOffset());
> >> > > > >> > >             points.put(c, v);
> >> > > > >> > >         }
> >> > > > >> > >     }
> >> > > > >> > >  }
> >> > > > >> > >
> >> > > > >> > > The cell.getQualifierArray() javadoc says that its 'Max
> >> > > > >> > > length is Short.MAX_VALUE which is 32,767 bytes.' Hence
> >> > > > >> > > it can only hold around 4,000 column qualifiers.
> >> > > > >> > >
> >> > > > >> > > Is there an alternate API that I should be using, or am
> >> > > > >> > > I missing some setting here? Note that in the worst case
> >> > > > >> > > I need to read all the column qualifiers in a row, and I
> >> > > > >> > > may or may not know a subset to fetch in advance.
> >> > > > >> > >
> >> > > > >> > > Even if this is not possible in a single call, is there
> >> > > > >> > > a way to cursor through the column qualifiers?
> >> > > > >> > >
> >> > > > >> > > I am presently using the HBase 0.96 client but can
> >> > > > >> > > switch to HBase 1.x if there is an API in the newer
> >> > > > >> > > version.
> >> > > > >> > >
> >> > > > >> > > --cheers, gaurav
> >> > > > >> > >
> >> > > > >> > > --
> >> > > > >> > > --cheers, gaurav
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > --cheers, gaurav
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > --cheers, gaurav
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > --cheers, gaurav
> >
>
>
>
> --
> --cheers, gaurav
>
