No, I was not recommending locality groups as a solution to the problem,
but using them to illustrate why the query was taking a long time.
do() and observe slow
change config
do() and observe fast
I was not completely clear that I was not recommending use of locality
groups as a solution to slow scans. The solution is to not do an
unbounded `scan -c` and expect it to be fast.
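
To make the cost concrete, here is a minimal sketch in plain Python (not the Accumulo API) of a column fetch with no row range compared with a row-bounded scan over sorted keys. The row IDs, the generated family values, and the function names are all made up for illustration.

```python
from bisect import bisect_left

# Toy stand-in for a table: entries kept sorted by (row, column family).
# The row IDs and family values are invented for this sketch.
table = sorted(
    (f"guid{i:05d}", f"vesselmmsitext={2706750000 + i}")
    for i in range(10_000)
)

def fetch_column(entries, family):
    """Unbounded column fetch: every entry is read, then filtered."""
    read = 0
    hits = []
    for row, cf in entries:
        read += 1
        if cf == family:
            hits.append(row)
    return hits, read

def scan_row(entries, row):
    """Row-bounded scan: binary search to the row, read only its entries."""
    lo = bisect_left(entries, (row,))
    hi = bisect_left(entries, (row + "\x00",))
    return entries[lo:hi], hi - lo

hits, read = fetch_column(table, "vesselmmsitext=2706758566")
bounded, bounded_read = scan_row(table, "guid08566")
```

In this model the column fetch touches all 10,000 entries to return one match, while the row-bounded scan touches a single entry. The real numbers depend on table size and layout, but the shape of the problem is the same.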
Christopher wrote:
The schema shown above doesn't quite look like it's well-suited for
locality groups, though. The CF field looks like it's a composition of
an attribute name and that attribute's value. To take advantage of
locality groups with that schema, you'd have to have a locality group
for every attribute name/value combination, which would probably not
work well.
If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
So, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality containing the particular
attribute name you are searching for.
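
A rough sketch of that schema change, in plain Python rather than the Accumulo API: it splits a "name=value" family into a family and a qualifier, then counts how many distinct families (and so how many locality groups) each layout would need. The sample rows and the vesselname attribute are assumptions, since the real data is not shown here.

```python
from collections import defaultdict

# Hypothetical entries in the current layout: (row, "name=value" family).
current = [
    ("guid1", "vesselmmsitext=2706758566"),
    ("guid2", "vesselmmsitext=2706758567"),
    ("guid3", "vesselname=AURORA"),
    ("guid4", "vesselname=BOREAS"),
]

def split_family(entries):
    """Move the attribute value out of the family and into the qualifier."""
    out = []
    for row, cf in entries:
        name, _, value = cf.partition("=")
        out.append((row, name, value))
    return out

def group_by_family(entries):
    """Bucket entries by family, as one locality group per family would."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[1]].append(entry)
    return dict(groups)

before = group_by_family(current)                # one family per name/value pair
after = group_by_family(split_family(current))   # one family per attribute name

# A search for one value now only has to look inside its attribute's group.
matches = [row for row, _, value in after["vesselmmsitext"] if value == "2706758566"]
```

With the four sample entries, the original layout needs four groups and the split layout needs two. The number of groups stops growing with the number of distinct values.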
If you really want efficient searches based on attribute name/value
combinations, you're going to want to put this in the row (at the
beginning of your row), so your data is ordered (indexed) by that. You
could do this in a secondary index (which could be in a different
table, a different segment of this table, or in a separate locality
group in this table).
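
One way such an index could look, sketched in plain Python (not the Accumulo API). The "name=value", separator, GUID row layout and the null-byte separator are assumptions for illustration, not something taken from the table above.

```python
from bisect import bisect_left

# Hypothetical primary entries: (row GUID, attribute name, attribute value).
primary = [
    ("guid1", "vesselmmsitext", "2706758566"),
    ("guid2", "vesselname", "AURORA"),
    ("guid3", "vesselmmsitext", "2706758566"),
    ("guid4", "vesselmmsitext", "2706758567"),
]

SEP = "\x00"  # assumed separator that never appears in names, values, or GUIDs

def build_index(entries):
    """Index rows are 'name=value<SEP>guid', kept sorted like a table would be."""
    return sorted(f"{name}={value}{SEP}{guid}" for guid, name, value in entries)

def lookup(index, name, value):
    """Range scan over the index rows that share the 'name=value' prefix."""
    prefix = f"{name}={value}{SEP}"
    start = bisect_left(index, prefix)
    guids = []
    for row in index[start:]:
        if not row.startswith(prefix):
            break  # past the end of the range; nothing further can match
        guids.append(row[len(prefix):])
    return guids

index = build_index(primary)
found = lookup(index, "vesselmmsitext", "2706758566")
```

Because the attribute name and value lead the row, the lookup seeks straight to the matching range and stops as soon as the prefix changes, instead of filtering the whole table.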
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser<[email protected]> wrote:
Yup, that would be expected.
Remember that doing `scan -c ...` is an unbounded search over your entire
table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
Because you have a single locality group, all of the columns in your table
are grouped together.
One exercise that may be interesting for yourself is to create a locality
group that has your specific column family in it, compact your
GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
to your exact scan. Removing the locality group and re-compacting the table
should return the query time back to the slow 3 minutes.
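
The exercise can be modeled with a small Python sketch (a toy, not the Accumulo API): a "file" keeps each locality group as its own section, rewriting the file stands in for compaction, and a column fetch reads only the section that can hold the requested family. The group name "mmsi" and the filler families are made up.

```python
def family_to_group(groups):
    """Invert {group name: set of families} into {family: group name}."""
    return {cf: name for name, cfs in groups.items() for cf in cfs}

def write_file(entries, groups):
    """Stand-in for compaction: rewrite entries into one section per group."""
    lookup = family_to_group(groups)
    sections = {}
    for row, cf in entries:
        sections.setdefault(lookup.get(cf, "default"), []).append((row, cf))
    return sections

def fetch(sections, groups, family):
    """Column fetch: read only the section that can hold the family."""
    section = sections.get(family_to_group(groups).get(family, "default"), [])
    rows = [row for row, cf in section if cf == family]
    return rows, len(section)

FAMILY = "vesselmmsitext=2706758566"
entries = [(f"guid{i:04d}", FAMILY if i == 42 else f"other{i % 5}") for i in range(1000)]

no_groups = {}
one_group = {"mmsi": {FAMILY}}  # "mmsi" is a made-up group name

slow = fetch(write_file(entries, no_groups), no_groups, FAMILY)
fast = fetch(write_file(entries, one_group), one_group, FAMILY)
```

Both calls return the same single row, but the first reads all 1,000 entries and the second reads one. The rewrite step is also why the table has to be compacted before a change in locality groups affects scan time.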
Does that make sense?
Daniel Ruiz wrote:
Hi All,
I am having an issue where column fetches are taking over a minute on
1.6.3. I don’t believe this should be the case, and my experience in the
past supports the idea that fetches should be very fast.
For example, doing a scan on the table gives results instantly, but
doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
(plus or minus 1 second).
Figure 1.1. Generated Test Data on GUIDIndexTable
Here is the table config
-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
SCOPE      | NAME                                                          | VALUE
-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
default    | table.balancer .............................................. | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default    | table.bloom.enabled ......................................... | false
default    | table.bloom.error.rate ...................................... | 0.5%
default    | table.bloom.hash.type ....................................... | murmur
default    | table.bloom.key.functor ..................................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default    | table.bloom.load.threshold .................................. | 1
default    | table.bloom.size ............................................ | 1048576
default    | table.cache.block.enable .................................... | false
default    | table.cache.index.enable .................................... | true
default    | table.classpath.context ..................................... |
default    | table.compaction.major.everything.idle ...................... | 1h
default    | table.compaction.major.ratio ................................ | 3
default    | table.compaction.minor.idle ................................. | 5m
default    | table.compaction.minor.logs.threshold ....................... | 3
table      | table.constraint.1 .......................................... | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default    | table.failures.ignore ....................................... | false
default    | table.file.blocksize ........................................ | 0B
default    | table.file.compress.blocksize ............................... | 100K
default    | table.file.compress.blocksize.index ......................... | 128K
default    | table.file.compress.type .................................... | gz
default    | table.file.max .............................................. | 15
default    | table.file.replication ...................................... | 0
default    | table.file.type ............................................. | rf
default    | table.formatter ............................................. | org.apache.accumulo.core.util.format.DefaultFormatter
default    | table.groups.enabled ........................................ |
default    | table.interepreter .......................................... | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table      | table.iterator.majc.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.majc.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.majc.vers.opt.maxVersions .................... | 1
table      | table.iterator.minc.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.minc.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.minc.vers.opt.maxVersions .................... | 1
table      | table.iterator.scan.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.scan.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.scan.vers.opt.maxVersions .................... | 1
default    | table.majc.compaction.strategy .............................. | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default    | table.scan.max.memory ....................................... | 512K
table      |    @override ................................................ | 1M
default    | table.security.scan.visibility.default ...................... |
default    | table.split.threshold ....................................... | 1G
default    | table.walog.enabled ......................................... | true
-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
More Table Info:
GUIDIndexTable<http://107.23.12.24:50095/tables?t=f>
ONLINE
2
0
82.56M
810.00K
159
Please let me know if I am doing something wrong or if there is more
information you need.
V/r,
-Daniel