Okay, thanks for the information and your time; it has been very helpful.

V/r,
-Daniel
-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Friday, August 14, 2015 10:04 AM
To: [email protected]
Subject: Re: Fetch Taking Longer Than Expected

"Small" might also be misleading. A locality group can be a good way to separate a large collection of data from an actually small number of other records. Discrete, yes, but the data itself does not need to be small to put it into a locality group.

Christopher wrote:
> I would be surprised if anybody has tested more than a dozen or two
> locality groups or placed more than a dozen or two column families in
> any one locality group.
>
>
> On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <[email protected]> wrote:
>
> > Thanks... We ended up doing just that. Correct: having a bunch of
> > random data does not fit well with locality groups. I did have
> > another question, though. You mentioned a "small discrete set". What
> > would you consider small? Would you recommend, for example, against
> > having several thousand locality groups in a table?
> >
> > V/r,
> > -Daniel
>
> -----Original Message-----
> From: Christopher [mailto:[email protected]]
> Sent: Wednesday, August 12, 2015 3:08 PM
> To: Accumulo User List <[email protected]>
> Subject: Re: Fetch Taking Longer Than Expected
>
> The schema shown above doesn't quite look like it's well-suited for
> locality groups, though. The CF field looks like it's a composition of
> an attribute name and that attribute's value. To take advantage of
> locality groups with that schema, you'd have to have a locality group
> for every attribute name/value combination, which would probably not
> work well.
>
> If you want to take advantage of locality groups, you'll probably want
> to make your CFs a small, discrete set (like just attribute names).
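To make Christopher's point about CF cardinality concrete, here is a minimal plain-Java sketch. It is not Accumulo client code. It counts the distinct column families produced by the `name=value` CF layout and by his suggested layout, where the CF holds only the attribute name and the value moves into the CQ. The `Cell` record, the helper names, the second attribute `vesselname`, the row IDs, and the empty CQ in the original layout are all hypothetical, because the real schema (Figure 1.1) is not reproduced in this thread. A locality group is configured as a set of column families, so the distinct-CF count is the number of groups you would need to give every family its own group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model comparing two key layouts for the same attribute data.
// Plain Java, not the Accumulo API; Cell and the sample data are illustrative.
public class CfLayouts {

    // A simplified key: row, column family (CF), column qualifier (CQ).
    record Cell(String row, String cf, String cq) {}

    // Layout from the thread: CF = "name=value" (CQ left empty in this sketch).
    static Cell nameValueInCf(String row, String name, String value) {
        return new Cell(row, name + "=" + value, "");
    }

    // Suggested layout: CF = attribute name only, CQ = attribute value.
    static Cell nameInCfValueInCq(String row, String name, String value) {
        return new Cell(row, name, value);
    }

    // Distinct CFs = locality groups needed to give every CF its own group.
    static int distinctFamilies(List<Cell> cells) {
        Set<String> families = new TreeSet<>();
        for (Cell c : cells) {
            families.add(c.cf());
        }
        return families.size();
    }

    public static void main(String[] args) {
        String[] names = {"vesselmmsitext", "vesselname"}; // hypothetical attributes
        List<Cell> oldLayout = new ArrayList<>();
        List<Cell> newLayout = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String row = "guid" + i;              // hypothetical row IDs
            String value = Integer.toString(i);   // a distinct value per record
            for (String name : names) {
                oldLayout.add(nameValueInCf(row, name, value));
                newLayout.add(nameInCfValueInCq(row, name, value));
            }
        }
        System.out.println("CFs, name=value layout: " + distinctFamilies(oldLayout)); // 2000
        System.out.println("CFs, name-only layout:  " + distinctFamilies(newLayout)); // 2
    }
}
```

With 1,000 distinct values for each of two attributes, the `name=value` layout yields 2,000 column families and the name-only layout yields 2. Two families fit easily within the "dozen or two" groups Christopher mentions.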
> So, if you push the attribute value into the CQ, you could at the very
> least limit your search to the locality group containing the particular
> attribute name you are searching for.
>
> If you really want efficient searches based on attribute name/value
> combinations, you're going to want to put this in the row (at the
> beginning of your row), so your data is ordered (indexed) by that. You
> could do this in a secondary index (which could be in a different
> table, a different segment of this table, or in a separate locality
> group in this table).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <[email protected]> wrote:
> > Yup, that would be expected.
> >
> > Remember that doing `scan -c ...` is an unbounded search over your
> > entire table. So, it takes approximately 3 minutes to read your
> > GUIDIndexTable. Because you have a single locality group, all of the
> > columns in your table are grouped together.
> >
> > One exercise that may be interesting for yourself is to create a
> > locality group that has your specific column family in it, compact
> > your GUIDIndexTable, and rerun your `scan -c` query. The speed should
> > be similar to your exact scan. Removing the locality group and
> > re-compacting the table should return the query time to the slow
> > 3 minutes.
> >
> > Does that make sense?
> >
> > Daniel Ruiz wrote:
> >>
> >> Hi All,
> >>
> >> I am having an issue where column fetches are taking over a minute on
> >> 1.6.3. I don’t believe this should be the case, and my experience in
> >> the past supports the idea that fetches should be very fast.
> >>
> >> For example, doing a scan on the table gives results instantly, but
> >> doing `scan -c vesselmmsitext=2706758566` takes 2 minutes and 44
> >> seconds (plus or minus 1 second).
> >>
> >> Figure 1.1.
> >> Generated Test Data on GUIDIndexTable
> >>
> >> Here is the table config:
> >>
> >> --------+------------------------------------------------------------+------------------------------------------------------------------
> >> SCOPE   | NAME                                                       | VALUE
> >> --------+------------------------------------------------------------+------------------------------------------------------------------
> >> default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
> >> default | table.bloom.enabled                                        | false
> >> default | table.bloom.error.rate                                     | 0.5%
> >> default | table.bloom.hash.type                                      | murmur
> >> default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
> >> default | table.bloom.load.threshold                                 | 1
> >> default | table.bloom.size                                           | 1048576
> >> default | table.cache.block.enable                                   | false
> >> default | table.cache.index.enable                                   | true
> >> default | table.classpath.context                                    |
> >> default | table.compaction.major.everything.idle                     | 1h
> >> default | table.compaction.major.ratio                               | 3
> >> default | table.compaction.minor.idle                                | 5m
> >> default | table.compaction.minor.logs.threshold                      | 3
> >> table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
> >> default | table.failures.ignore                                      | false
> >> default | table.file.blocksize                                       | 0B
> >> default | table.file.compress.blocksize                              | 100K
> >> default | table.file.compress.blocksize.index                        | 128K
> >> default | table.file.compress.type                                   | gz
> >> default | table.file.max                                             | 15
> >> default | table.file.replication                                     | 0
> >> default | table.file.type                                            | rf
> >> default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
> >> default | table.groups.enabled                                       |
> >> default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.majc.vers.opt.maxVersions                   | 1
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.minc.vers.opt.maxVersions                   | 1
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.scan.vers.opt.maxVersions                   | 1
> >> default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
> >> default | table.scan.max.memory                                      | 512K
> >> table   | @override                                                  | 1M
> >> default | table.security.scan.visibility.default                     |
> >> default | table.split.threshold                                      | 1G
> >> default | table.walog.enabled                                        | true
> >> --------+------------------------------------------------------------+------------------------------------------------------------------
> >>
> >> More Table Info:
> >>
> >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f>
> >> ONLINE | 2 | 0 | 82.56M | 810.00K | 159
> >>
> >> Please let me know if I am doing something wrong or if there is more
> >> information you need.
> >>
> >> V/r,
> >> -Daniel
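Josh's exercise can be illustrated with a toy plain-Java model. The exercise is: put the queried column family in its own locality group, compact, rerun `scan -c`, then remove the group and recompact. This is a sketch under simplifying assumptions, not how Accumulo or RFile actually works. `LocalityGroupModel`, its methods, and the cost measure (every entry in each group bucket that holds the requested family counts as read) are hypothetical. The model keeps the two points from the thread: with only the default group, a column fetch walks the whole table, and a new group helps only after a compaction has rewritten the existing data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (an assumption, not Accumulo's implementation) of Josh's exercise.
// Stored entries are bucketed by locality group; fetching a column family
// counts every entry in each bucket that holds that family as "read".
public class LocalityGroupModel {

    static final String DEFAULT_GROUP = "<default>";

    record Cell(String row, String cf, String cq) {}

    private Map<String, Set<String>> groups = Map.of();                  // group -> configured CFs
    private final Map<String, List<Cell>> buckets = new HashMap<>();     // group -> stored entries
    private final Map<String, Set<String>> familiesIn = new HashMap<>(); // group -> CFs present

    // Changes the configuration only; already-written data keeps its layout.
    void setGroups(Map<String, Set<String>> newGroups) {
        groups = Map.copyOf(newGroups);
    }

    private String groupFor(String cf) {
        for (Map.Entry<String, Set<String>> g : groups.entrySet()) {
            if (g.getValue().contains(cf)) {
                return g.getKey();
            }
        }
        return DEFAULT_GROUP; // CFs not named in any group land here
    }

    void write(Cell c) {
        String g = groupFor(c.cf());
        buckets.computeIfAbsent(g, k -> new ArrayList<>()).add(c);
        familiesIn.computeIfAbsent(g, k -> new HashSet<>()).add(c.cf());
    }

    // Like a full compaction: rewrite all data under the current configuration.
    void compact() {
        List<Cell> all = new ArrayList<>();
        buckets.values().forEach(all::addAll);
        buckets.clear();
        familiesIn.clear();
        all.forEach(this::write);
    }

    // Simplified cost of "scan -c cf": the full size of every bucket holding cf.
    int entriesReadForFetch(String cf) {
        int read = 0;
        for (Map.Entry<String, List<Cell>> b : buckets.entrySet()) {
            if (familiesIn.getOrDefault(b.getKey(), Set.of()).contains(cf)) {
                read += b.getValue().size();
            }
        }
        return read;
    }

    public static void main(String[] args) {
        LocalityGroupModel table = new LocalityGroupModel();
        for (int i = 0; i < 100_000; i++) {      // hypothetical bulk data
            table.write(new Cell("row" + i, "bulk", "q"));
        }
        for (int i = 0; i < 10; i++) {           // a few hypothetical target entries
            table.write(new Cell("row" + i, "target", "q"));
        }
        System.out.println("single group:   " + table.entriesReadForFetch("target"));
        table.setGroups(Map.of("tg", Set.of("target")));
        System.out.println("before compact: " + table.entriesReadForFetch("target"));
        table.compact();
        System.out.println("after compact:  " + table.entriesReadForFetch("target"));
    }
}
```

Running `main` reports 100010 entries read with only the default group and again after the group is configured but before compaction. After the compaction it reports 10. Calling `setGroups(Map.of())` followed by `compact()` returns the fetch to the full-table cost, mirroring Josh's "removing the locality group and re-compacting" step.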
