Ok. I found a clean way to improve that a lot without going with the
filter. I will open a JIRA and push a fix.
The idea is to set the scanner caching to at most LIMIT, so we don't read
the entire table before returning to the shell. Also, we have to change
where we do the LIMIT check.
Anyway: JIRA 13721
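Jean-Marc's diagnosis further down is that the scanner never finds a second matching row, so the shell only stops at the end of the table. That diagnosis, and the two parts of the fix above, can be illustrated with a toy model. This is a plain-Ruby sketch under simplifying assumptions, not the actual table.rb or HBase client code. FakeScanner, scan_check_first and scan_check_after are hypothetical names. A simulated RPC simply reads rows until it has collected "caching" matches or hits the end of the table; region boundaries, result-size limits and the like are ignored.

```ruby
# Toy model of a client-side LIMIT loop over a batching scanner.
# NOT the real table.rb or HBase client code; all names are hypothetical.

class FakeScanner
  attr_reader :rows_read

  # rows: array of { key:, columns: } hashes in row-key order
  # caching: how many matching rows one simulated RPC tries to collect
  def initialize(rows, column, caching)
    @rows = rows
    @column = column
    @caching = caching
    @pos = 0
    @rows_read = 0
    @buffer = []
  end

  # Like Iterator#hasNext: refills the client buffer when it is empty.
  def has_next?
    fetch_batch if @buffer.empty?
    !@buffer.empty?
  end

  def next_row
    raise StopIteration unless has_next?
    @buffer.shift
  end

  private

  # One simulated RPC: the "server" keeps reading rows until it has
  # `caching` matches or runs out of table.
  def fetch_batch
    while @buffer.size < @caching && @pos < @rows.size
      row = @rows[@pos]
      @pos += 1
      @rows_read += 1
      @buffer << row if row[:columns].include?(@column)
    end
  end
end

# LIMIT checked at the top of the loop: after the last wanted row,
# has_next? is still called and may trigger another full read-ahead.
def scan_check_first(scanner, limit)
  results = []
  while scanner.has_next?
    break if limit > 0 && results.size >= limit
    results << scanner.next_row
  end
  results
end

# LIMIT checked right after a row is consumed: the loop exits without
# asking the scanner for anything more.
def scan_check_after(scanner, limit)
  results = []
  while scanner.has_next?
    results << scanner.next_row
    break if limit > 0 && results.size >= limit
  end
  results
end

# 65,000 rows; only rows 10 and 40,000 contain the requested column.
rows = (0...65_000).map do |i|
  cols = [10, 40_000].include?(i) ? ["v:q"] : ["v:other"]
  { key: format("%05d", i), columns: cols }
end

[[100, :scan_check_first], [100, :scan_check_after],
 [1, :scan_check_first], [1, :scan_check_after]].each do |caching, loop_name|
  scanner = FakeScanner.new(rows, "v:q", caching)
  send(loop_name, scanner, 1)
  puts "caching=#{caching} #{loop_name}: #{scanner.rows_read} rows read"
end
```

In this model, the scan stops after 11 rows only when caching is no larger than LIMIT and the check comes after the row is consumed. The other three combinations read 65,000 or 40,001 rows.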
For PageFilter:
* Implementation of Filter interface that limits results to a specific page
* size. It terminates scanning once the number of filter-passed rows is >
* the given page size.
In your case, what should the page size be, 0?
Cheers
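Ted's page-size question can be made concrete with a toy model of the behaviour in the javadoc quoted above: once a page's worth of rows has passed the filter, scanning stops. The model is plain Ruby, not HBase's PageFilter. page_filtered_scan and the region/row hashes are hypothetical. The model also reflects a caveat from the HBase documentation: the filter is evaluated separately in each region. A client can therefore receive up to a page of rows per region and still has to enforce its own limit.

```ruby
# Toy model (not HBase's PageFilter) of page-size filtering that is applied
# separately in each region, with a fresh counter per region, so at most
# page_size rows come back from each region.
# regions: array of regions, each an array of { columns: [...] } row hashes.
# Returns [rows returned to the client, rows read across all regions].
def page_filtered_scan(regions, column, page_size)
  returned = []
  rows_read = 0
  regions.each do |region|
    accepted = 0
    region.each do |row|
      # Stop reading this region once a full page has passed the filter.
      break if accepted >= page_size
      rows_read += 1
      next unless row[:columns].include?(column)
      accepted += 1
      returned << row
    end
  end
  [returned, rows_read]
end

# Three regions of ten rows each; every row has the column.
regions = Array.new(3) { Array.new(10) { { columns: ["v:q"] } } }
returned, rows_read = page_filtered_scan(regions, "v:q", 1)
puts "#{returned.size} rows returned, #{rows_read} rows read"
# prints "3 rows returned, 3 rows read"
```

With a page size of 1 and three regions, the model hands the client three rows, so a LIMIT => 1 scan would still need the client-side check. With a page size of 0, it reads and returns nothing.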
On Tue, May 19, 2015 at 8:30 PM, Jean-Marc S
Oh, I see! So basically we do a full table scan: it never returns a
2nd row, so we never reach that break, and we exit only when we reach the
end of the table. Hence the same performance as without the LIMIT
parameter...
Should we then try to add a filter like PageFilter to the scan if we
Take a look at _scan_internal() in table.rb:
LIMIT is not passed to the server, so you fetch more rows than needed.
https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495
Matteo
On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
I ran scan/get/scan/get many times, always with the same pattern.
You can remove the "LIMIT => 1" parameter and you will get the same
performance.
Without the QC, scan and get return in very similar times: 191 ms for one,
194 ms for the other.
2015-05-19 23:02 GMT-04:00 Ted Yu :
J-M:
How many times did you try the pair of queries ?
Since scan was run first, this would give the get query some advantage,
right ?
Cheers
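Ted's concern is an ordering bias: whichever query runs second may benefit from the first one, presumably through warmed caches. One way to reduce that bias is to alternate the order on every round and compare medians rather than single runs. This is a minimal plain-Ruby sketch. alternating_timings and median are hypothetical helpers, and the two lambdas are placeholders for whatever actually issues the scan and the get.

```ruby
# Hypothetical timing harness: runs two operations in alternating order so
# neither one always benefits from running second. Returns median seconds.
def alternating_timings(rounds, op_a, op_b)
  times = { a: [], b: [] }
  rounds.times do |i|
    # Even rounds run a then b; odd rounds run b then a.
    order = i.even? ? [[:a, op_a], [:b, op_b]] : [[:b, op_b], [:a, op_a]]
    order.each do |name, op|
      t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      op.call
      times[name] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
    end
  end
  times.transform_values { |ts| median(ts) }
end

def median(xs)
  s = xs.sort
  mid = s.size / 2
  s.size.odd? ? s[mid] : (s[mid - 1] + s[mid]) / 2.0
end
```

With an even number of rounds, each query runs first exactly half of the time. A cache-warming advantage then shows up in both medians instead of only one.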
On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
> Are not Scan and Gets supposed to be almost as fast?
>
> I have a
C’mon, really?
Do they really return the same results?
Let me put it this way… are you walking through the same code path?
> On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari
> wrote:
>
> Are not Scan and Gets supposed to be almost as fast?
>
> I have a pretty small table with 65K lines,
Aren't Scans and Gets supposed to be almost as fast?
I have a pretty small table with 65K rows and a few columns (a hundred?), and I'm trying
to do a get and a scan.
hbase(main):009:0> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
ROW                 COLUMN+CELL
000
Thanks Ian! Very helpful breakdown.
For this use case, I think the multi-version row structure is ruled out.
We will investigate the onekey-manycolumn approach. Also, the more I study
the mechanics behind a SCAN vs GET, the more I believe the informal test I
did is inaccurate. What does
is straight up rude; please don't do that.
From: Neil Yalowitz <neilyalow...@gmail.com>
Date: Tue, Oct 16, 2012 at 2:53 PM
Subject: crafting your key - scan vs. get
To: user@hbase.apache.org
Hopefully this is a fun question. :)
Assume you could
>> In the first case, you can use get(), which, while still a scan, is a very
>> efficient fetch.
>> In the second, you will always need to do a scan.
>
> This is the core of my original question. My anecdotal tests in hbase
> shell showed a Get executing about 3x faster than a Scan
-
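Suppose the values live in separate rows that share a row-key prefix (one possible reading of the layout that "always needs a scan"). Fetching them all is then a range scan: start at the prefix and stop at the first key that no longer begins with it. This is a minimal Ruby sketch of computing that exclusive stop row. It assumes row keys compare as unsigned byte strings, which is how HBase orders them. prefix_stop_row and prefix_scan are hypothetical helpers, and the in-memory key list stands in for a table.

```ruby
# Exclusive stop row for a prefix scan: the smallest key greater than every
# key that starts with `prefix`. Increments the last byte below 0xFF and
# drops everything after it. Returns nil if the prefix is all 0xFF bytes,
# meaning "scan to the end of the table".
def prefix_stop_row(prefix)
  bytes = prefix.b.bytes
  while (last = bytes.pop)
    return (bytes << (last + 1)).pack("C*") if last < 0xFF
  end
  nil
end

# Filters a sorted in-memory key list the way a STARTROW/STOPROW scan would,
# comparing keys as binary strings.
def prefix_scan(sorted_keys, prefix)
  start = prefix.b
  stop = prefix_stop_row(prefix)
  sorted_keys.select { |k| k.b >= start && (stop.nil? || k.b < stop) }
end

keys = ["A", "AA", "AAmyval1", "AAmyval2", "AB", "B"]
p prefix_scan(keys, "AA")  # => ["AA", "AAmyval1", "AAmyval2"]
```

In the HBase shell, the same range would use the STARTROW and STOPROW scan options, e.g. scan 'mytable', {STARTROW => 'AA', STOPROW => 'AB'}. Later HBase client versions also provide Scan#setRowPrefixFilter for this.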
> > AAmyval1 1350345600
> > AAmyval2 1350259200
> > AAmyval3 1350172800
> >
> > Retrieving these values will use a Get with VERSIONS = somebignumber. In
> > hbase shell, it would look like:
> >
> > $ get 'mytable','AA',{COLUMN=>'cf:mycf', VERSIONS=>999}
> >
> > ...although this probably violates a comment in the HBase documentation:
> >
> > "It is not recommended setting the number of max versions to an exceedingly
> > high level (e.g., hundreds or more) unless those old values are very dear
> > to you because this will greatly increase StoreFile size."
> >
> > ...found here: http://hbase.apache.org/book/schema.versions.html
> >
> >
> > So, are there any performance considerations between Scan vs. Get in this
> > use case? Which choice would you go for?
> >
> >
> >
> > Neil Yalowitz
> > neilyalow...@gmail.com
>
>