Scan vs Get

2015-05-19 Jean-Marc Spaggiari
Aren't Scans and Gets supposed to be almost as fast? I have a pretty small table with 65K rows and a few columns (a hundred?), and I'm trying to do a get and a scan.
hbase(main):009:0> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
ROW COLUMN+CELL
000

Re: Scan vs Get

2015-05-19 Michael Segel
C’mon, really? Do they really return the same results? Let me put it this way… are you walking through the same code path?
> On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari wrote:
> Are not Scan and Gets supposed to be almost as fast?
> I have a pretty small table with 65K lines,

Re: Scan vs Get

2015-05-19 Ted Yu
J-M: How many times did you try the pair of queries? Since the scan was run first, this would give the get query some advantage, right?
Cheers
On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> Are not Scan and Gets supposed to be almost as fast?
> I have a

Re: Scan vs Get

2015-05-19 Jean-Marc Spaggiari
I tried running scan/get/scan/get many times, and it's always the same pattern. You can remove the "LIMIT => 1" parameter and you will get the same performance. Scan and get without the QC (column qualifier) return in very similar times: 191 ms for one, 194 ms for the other.
2015-05-19 23:02 GMT-04:00 Ted Yu:
> J-M:

Re: Scan vs Get

2015-05-19 Matteo Bertozzi
Take a look at _scan_internal() in table.rb: LIMIT is not passed to the server, so you fetch more rows.
https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495
Matteo
On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> I tried to

Re: Scan vs Get

2015-05-19 Jean-Marc Spaggiari
Oh, I see! So basically we do a full table scan: the scanner never returns a second row, so we never reach that break, and we only exit when we reach the end of the table. That is why performance is the same without the LIMIT parameter... Should we then try to add a filter like PageFilter to the scan if we
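Jean-Marc's explanation can be reproduced with a small pure-Ruby model. It is a sketch under stated assumptions, not the real `table.rb` source: `ColumnScanner`, `scan_check_first` and `scan_check_after` are hypothetical names, and the scanner reads one row at a time and only yields rows that contain the requested column. Testing LIMIT at the top of the loop triggers one more `has_next?` after the last wanted row, and that call walks the rest of the table looking for a second match that never comes. Moving the test after the row is collected is one of the changes Jean-Marc describes later in the thread.

```ruby
# Hypothetical model of a scanner restricted to one column. has_next? walks
# forward until it finds a row containing the column, counting every row read.
class ColumnScanner
  attr_reader :rows_examined

  def initialize(rows, column)
    @rows = rows
    @column = column
    @pos = 0
    @pending = nil
    @rows_examined = 0
  end

  def has_next?
    return true unless @pending.nil?
    while @pos < @rows.length
      row = @rows[@pos]
      @pos += 1
      @rows_examined += 1
      if row.key?(@column)
        @pending = row
        return true
      end
    end
    false
  end

  def next_row
    raise StopIteration, "no more rows" unless has_next?
    row = @pending
    @pending = nil
    row
  end
end

# LIMIT tested at the top of the loop: after the LIMIT-th row, has_next? runs
# again and keeps reading until another match or the end of the table.
def scan_check_first(scanner, limit)
  results = []
  while scanner.has_next?
    break if limit > 0 && results.length >= limit
    results << scanner.next_row
  end
  results
end

# LIMIT tested right after a row is collected: no extra has_next? call.
def scan_check_after(scanner, limit)
  results = []
  while scanner.has_next?
    results << scanner.next_row
    break if limit > 0 && results.length >= limit
  end
  results
end

# 65,000 rows; only the first one holds the requested qualifier.
col = "v:f92acb5b-079a-42bc-913a-657f270a3dc1"
table = Array.new(65_000) do |i|
  if i.zero?
    { col => "x" }
  else
    { "v:other" => "y" }
  end
end

a = ColumnScanner.new(table, col)
scan_check_first(a, 1)
puts a.rows_examined # prints 65000

b = ColumnScanner.new(table, col)
scan_check_after(b, 1)
puts b.rows_examined # prints 1
```

Both variants return the same single row; only the amount of data read differs, which matches the identical timings with and without LIMIT.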

Re: Scan vs Get

2015-05-19 Ted Yu
For PageFilter: "Implementation of Filter interface that limits results to a specific page size. It terminates scanning once the number of filter-passed rows is > the given page size." In your case, what should the page size be? 0?
Cheers
On Tue, May 19, 2015 at 8:30 PM, Jean-Marc S
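The PageFilter behavior Ted quotes can be modeled in a few lines. This is a hypothetical sketch of the documented behavior, not the HBase source: `PageFilterModel` and `scan_region` are illustrative names. It answers the page-size question directly (a page size of 0 returns nothing) and also shows a caveat stated in the PageFilter javadoc: each region server applies the filter independently, so with a page size of 1 a client scanning two regions can still receive two rows and needs its own limit on top.

```ruby
# Hypothetical model of PageFilter: drop rows once the accepted count exceeds
# the page size, and stop the region scan once the count has reached it.
class PageFilterModel
  def initialize(page_size)
    @page_size = page_size
    @rows_accepted = 0
  end

  # true when the region scan can stop entirely
  def filter_all_remaining?
    @rows_accepted >= @page_size
  end

  # called once per candidate row; true means "drop this row"
  def filter_row?
    @rows_accepted += 1
    @rows_accepted > @page_size
  end
end

# Each region applies its own, independent filter instance.
def scan_region(rows, page_size)
  filter = PageFilterModel.new(page_size)
  out = []
  rows.each do |row|
    break if filter.filter_all_remaining?
    out << row unless filter.filter_row?
  end
  out
end

regions = [%w[r1 r2 r3], %w[r4 r5 r6]]
per_region = regions.map { |rows| scan_region(rows, 1) }
p per_region                   # prints [["r1"], ["r4"]]
p per_region.flatten.size      # prints 2
p scan_region(%w[r1 r2 r3], 0) # prints []
```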

Re: Scan vs Get

2015-05-20 Jean-Marc Spaggiari
OK. I found a clean way to improve that a lot without using the filter. I will open a JIRA and push a fix. The idea is to cap the scanner caching at LIMIT, so we don't read the entire table before returning to the shell. We also have to change where we do the test. Anyway: JIRA 13721
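The caching half of the fix can be sketched the same way. Assumptions: `first_batch` and `effective_caching` are hypothetical helpers, the caching value of 100 is arbitrary (the thread does not say what the shell used), and the server is modeled as filling each RPC with up to `caching` matching rows. Real region servers also end a batch on other limits such as maximum result size, which this sketch ignores. With only one matching row near the start, a batch of 100 keeps reading to the end of the table, while capping caching at LIMIT stops after the first row.

```ruby
# Hypothetical model of one scanner RPC: the server keeps reading rows until it
# has `caching` matching rows or runs out of rows.
def first_batch(rows, column, caching)
  batch = []
  examined = 0
  rows.each do |row|
    examined += 1
    batch << row if row.key?(column)
    break if batch.length >= caching
  end
  [batch, examined]
end

# The idea from the thread: never ask for more rows per RPC than LIMIT.
# (limit <= 0 means "no LIMIT" in this sketch.)
def effective_caching(limit, caching)
  limit > 0 ? [limit, caching].min : caching
end

col = "v:f92acb5b-079a-42bc-913a-657f270a3dc1"
table = Array.new(65_000) do |i|
  if i.zero?
    { col => "x" }
  else
    { "v:other" => "y" }
  end
end

# 100 is an arbitrary caching value chosen for the example.
_, examined_default = first_batch(table, col, 100)
_, examined_capped = first_batch(table, col, effective_caching(1, 100))
puts examined_default # prints 65000
puts examined_capped  # prints 1
```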

crafting your key - scan vs. get

2012-10-16 Neil Yalowitz
ill greatly increase StoreFile size." ...found here: http://hbase.apache.org/book/schema.versions.html So, are there any performance considerations between Scan and Get in this use case? Which choice would you go for? Neil Yalowitz neilyalow...@gmail.com

Re: crafting your key - scan vs. get

2012-10-17 Michael Segel
> although this probably violates a comment in the HBase documentation:
> "It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you because this will greatly increase StoreFile size."
> ...found here: http://hbase.apache.org/book/schema.versions.html
> So, are there any performance considerations between Scan vs. Get in this use case? Which choice would you go for?
> Neil Yalowitz
> neilyalow...@gmail.com

Re: crafting your key - scan vs. get

2012-10-17 Neil Yalowitz
> > AA myval1 1350345600
> > AA myval2 1350259200
> > AA myval3 1350172800
> > Retrieving these values will use a Get with VERSIONS = somebignumber. In hbase shell, it would look like:
> > $ get 'mytable','AA',{COLUMN=>'cf:mycf', VERSIONS=>999}
> > ...although this probably violates a comment in the HBase documentation:
> > "It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you because this will greatly increase StoreFile size."
> > ...found here: http://hbase.apache.org/book/schema.versions.html
> > So, are there any performance considerations between Scan vs. Get in this use case? Which choice would you go for?
> > Neil Yalowitz
> > neilyalow...@gmail.com
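Neil's multi-version design stores every value as a version of one column in row `AA`. A minimal sketch of that, assuming a hypothetical `VersionedColumn` class: trimming excess versions at write time is a simplification (HBase hides versions beyond the family's VERSIONS setting on read and removes them during compaction). It shows the two caps involved: the family's max versions limits what is kept, and the Get's VERSIONS limits what is returned, so `VERSIONS=>999` only helps if the family was created to keep that many.

```ruby
# Hypothetical model of one column holding multiple timestamped versions.
class VersionedColumn
  def initialize(max_versions)
    @max_versions = max_versions
    @cells = [] # [timestamp, value] pairs, kept newest first
  end

  # Same timestamp overwrites; versions beyond max_versions are dropped here
  # immediately (a simplification of read-time masking plus compaction).
  def put(timestamp, value)
    @cells.reject! { |ts, _| ts == timestamp }
    @cells << [timestamp, value]
    @cells.sort_by! { |ts, _| -ts }
    @cells = @cells.first(@max_versions)
  end

  # Like a Get with VERSIONS => versions: newest cells first.
  def get(versions)
    @cells.first(versions)
  end
end

# Neil's example values; the family is assumed to keep up to 999 versions.
col = VersionedColumn.new(999)
col.put(1350172800, "myval3")
col.put(1350345600, "myval1")
col.put(1350259200, "myval2")
p col.get(999)
# prints [[1350345600, "myval1"], [1350259200, "myval2"], [1350172800, "myval3"]]

# A family keeping a single version returns only the newest value,
# however large the requested VERSIONS is.
one = VersionedColumn.new(1)
one.put(1350172800, "myval3")
one.put(1350345600, "myval1")
p one.get(999)
# prints [[1350345600, "myval1"]]
```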

Re: crafting your key - scan vs. get

2012-10-18 Michael Segel
n the first case, you can use get() while still a scan, it's a very efficient fetch.
>> In the second, you will always need to do a scan.
> This is the core of my original question. My anecdotal tests in hbase shell showed a Get executing about 3x faster than a Scan
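Michael's distinction can be made concrete with a small model of HBase's sorted row keys. Assumptions: the row keys (`AA`, `AA#1350172800`, ...) and the helpers `scan_range`, `get_as_scan` and `prefix_stop_row` are hypothetical, since the thread does not show Neil's alternative key layout. A Get is modeled as a scan over the one-row range `[row, row + "\x00")` (HBase does execute a Get internally as a single-row scan), and a prefix query as a scan up to the smallest key greater than every key with that prefix.

```ruby
# Row keys are kept sorted byte-wise; a scan reads the half-open range
# start_row <= key < stop_row (an empty stop_row means no upper bound).
def scan_range(sorted_keys, start_row, stop_row)
  sorted_keys.select { |k| k >= start_row && (stop_row.empty? || k < stop_row) }
end

# Model of a Get: the one-row range [row, row + "\x00").
def get_as_scan(sorted_keys, row)
  scan_range(sorted_keys, row, row + "\x00")
end

# Smallest key greater than every key that starts with prefix: drop trailing
# 0xFF bytes, then increment the last remaining byte. "" means "scan to the end".
def prefix_stop_row(prefix)
  bytes = prefix.bytes
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  return "".b if bytes.empty?
  bytes[-1] += 1
  bytes.pack("C*")
end

# Hypothetical keys: one "whole entity in one row" key plus per-value keys.
keys = %w[AA AA#1350172800 AA#1350259200 AB AB#1350172800].sort

p get_as_scan(keys, "AA")                         # prints ["AA"]
p scan_range(keys, "AA#", prefix_stop_row("AA#")) # prints ["AA#1350172800", "AA#1350259200"]
```

In both cases the work is bounded by the key range, which is why a narrow Scan and a Get should not differ by much once the scan stops where it should.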

Re: crafting your key - scan vs. get

2012-10-18 Ian Varley
is straight up rude; please don't do that.
From: Neil Yalowitz <neilyalow...@gmail.com>
Date: Tue, Oct 16, 2012 at 2:53 PM
Subject: crafting your key - scan vs. get
To: user@hbase.apache.org
Hopefully this is a fun question. :) Assume you could

Re: crafting your key - scan vs. get

2012-10-19 Neil Yalowitz
Thanks Ian! Very helpful breakdown. For this use case, I think the multi-version row structure is ruled out. We will investigate the onekey-manycolumn approach. Also, the more I study the mechanics behind a SCAN vs GET, the more I believe the informal test I did is inaccurate. What does
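In the onekey-manycolumn layout Neil mentions, each value gets its own column qualifier in one row instead of its own version. The qualifier scheme below is an assumption for illustration, not something stated in the thread: an 8-byte big-endian `LONG_MAX - timestamp`, so that HBase's byte-wise ordering of qualifiers within a row returns the newest value first. `reverse_ts_qualifier` and `decode_qualifier` are hypothetical helpers.

```ruby
# Hypothetical "one key, many columns" layout. LONG_MAX and the qualifier
# encoding are illustrative assumptions, not taken from the thread.
LONG_MAX = 0x7FFF_FFFF_FFFF_FFFF

# 8-byte big-endian (LONG_MAX - ts): byte-wise ascending order of qualifiers
# is newest-timestamp-first, mirroring how qualifiers sort within a row.
def reverse_ts_qualifier(ts)
  [LONG_MAX - ts].pack("Q>")
end

def decode_qualifier(qualifier)
  LONG_MAX - qualifier.unpack("Q>").first
end

# One row ("AA") modeled as a Hash of qualifier => value.
row = {}
row[reverse_ts_qualifier(1350172800)] = "myval3"
row[reverse_ts_qualifier(1350345600)] = "myval1"
row[reverse_ts_qualifier(1350259200)] = "myval2"

# Columns come back in qualifier order, i.e. newest first.
row.sort.each do |qualifier, value|
  puts "#{decode_qualifier(qualifier)} #{value}"
end
```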