Yup, that's the intended use case. You have the flexibility to decide
which column families make sense to group together. Your only "cost" in
changing your mind is the time it takes to re-compact your data, since a
new grouping only applies to files written after the change.
There is one concern that comes to mind. Although making many locality
groups does speed up reads of specific columns, it slows down reads of
_all_ columns. So, you can use this trick to make Accumulo act more like
a columnar database, but beware that you'll take a performance hit if
you still have a use case where you read more than just one or two
columns at a time.
Does that make sense?
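
P.S. In case it helps, here is a rough sketch of the one-family-per-column
layout, written against plain Java collections so it runs without a
cluster. The class name, the "lg_" naming scheme, and the example column
names are made up for illustration. The property keys (table.group.<name>
and table.groups.enabled) are the per-table settings Accumulo uses for
locality groups. Treat it as a starting point, not a tested recipe.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: plans one locality group per column family.
class LocalityGroupPlan {

    // Builds group name -> column families, one family per group.
    // The "lg_" prefix is an arbitrary naming choice for this sketch.
    static Map<String, Set<String>> onePerFamily(List<String> families) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String family : families) {
            Set<String> members = new LinkedHashSet<>();
            members.add(family);
            groups.put("lg_" + family, members);
        }
        return groups;
    }

    // Renders the plan as per-table properties.
    // Assumes family names contain no commas or other characters that need encoding.
    static Map<String, String> toTableProperties(Map<String, Set<String>> groups) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            props.put("table.group." + e.getKey(), String.join(",", e.getValue()));
        }
        props.put("table.groups.enabled", String.join(",", groups.keySet()));
        return props;
    }

    public static void main(String[] args) {
        // Example column names are invented.
        Map<String, Set<String>> groups =
                onePerFamily(Arrays.asList("age", "city", "salary"));
        toTableProperties(groups)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With the Accumulo client on the classpath, the same grouping (with the
family names as Text) could instead be passed to
TableOperations.setLocalityGroups(), followed by a compaction of the table
so that existing files are rewritten under the new groups.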
On 10/19/17 5:50 PM, Mohammad Kargar wrote:
AFAIK in Accumulo we can use "locality groups" to group sets of columns
together on disk which would make it more like a column-oriented
database. Considering that "locality groups" are defined per column
family, I was wondering: what if we treat column families like column
qualifiers (creating one column family per qualifier) and assign each to
a different locality group? This way, all the data in a given column will
be next to each other on disk, which makes it easier for analytical
applications to query the data.
Any thoughts?
Thanks,
Mohammad