Re: Storage structure question

Aaron D. Mihalik Tue, 22 Nov 2016 06:07:41 -0800

Good questions.  Putting all of the information in the Row ID seems like a
common pattern for composite indices in Accumulo, but I went back to the
original Rya Paper [1] to pull out the reasoning:

"""
All the data for the triple resides in the Accumulo Row ID. This offers
several benefits: 1) by using a direct string representation, we can do
direct range scans on the literals; 2) the format is very easy to serialize
and deserialize, which provides for faster query and ingest; 3) since no
information needs to be stored in the Column Family, Qualifier, or Value
fields of the Accumulo tables, the storage requirements for the triples are
significantly reduced.
"""
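
To make the first benefit concrete, here is a minimal sketch in Python. It is
not Rya's code: the "\x00" delimiter, the helper names, and the sample URIs
are all assumptions for illustration, and a sorted list stands in for an
Accumulo table. It only shows why packing the whole triple into one sortable
string lets a lookup become a single range scan over row IDs.

```python
from bisect import bisect_left

# Assumed separator (hypothetical): a byte that sorts before printable characters.
DELIM = "\x00"

def row_id(first, second, third):
    """Pack all three triple components into one sortable row ID string."""
    return DELIM.join((first, second, third))

def parse_row_id(row):
    """Recover the three components from a row ID."""
    return tuple(row.split(DELIM))

def prefix_scan(sorted_rows, *prefix_parts):
    """Return every row whose leading components equal prefix_parts.

    Mimics a range scan: seek to the first row >= the prefix, then read
    forward while rows still start with it.
    """
    start = DELIM.join(prefix_parts) + DELIM
    i = bisect_left(sorted_rows, start)
    matches = []
    while i < len(sorted_rows) and sorted_rows[i].startswith(start):
        matches.append(sorted_rows[i])
        i += 1
    return matches

triples = [
    ("urn:alice", "urn:knows", "urn:bob"),
    ("urn:alice", "urn:age", "42"),
    ("urn:bob", "urn:knows", "urn:carol"),
]

# A sorted list stands in for a table keyed by subject, predicate, object.
spo_table = sorted(row_id(s, p, o) for s, p, o in triples)

# All triples about one subject: one contiguous slice of the sorted rows.
alice_rows = [parse_row_id(r) for r in prefix_scan(spo_table, "urn:alice")]
```

With the column family, qualifier, and value left empty, every lookup of this
kind reduces to choosing a start and end row, which is the property the paper
is describing.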

As for overriding the storage mechanism/pattern, you'd probably have to
write your own TripleRowResolver [2].  There is a slide deck here [3] that
shows the different layers of Rya, and that might be a good place to start.

What sort of storage scheme are you considering?

--Aaron

[1] https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
[2] https://github.com/apache/incubator-rya/blob/master/common/rya.api/src/main/java/org/apache/rya/api/resolver/triple/TripleRowResolver.java
[3] https://cwiki.apache.org/confluence/display/RYA/Rya+Office+Hours (see
"Running Through Rya Examples")

On Mon, Nov 21, 2016 at 11:28 PM Greg Clark <grs...@gmail.com> wrote:

> As I understand, Rya stores triples entirely in the row id in Accumulo.
> Why?
>
> Are entries stored this way to avoid overloading rows?
>
> Is the storage configurable; that is, could I set a flag to allow triples
> to be stored with one part of the triple in the row id, one part in the
> column family, and the remaining one in the column qualifier?  Or perhaps
> set a configuration to use a different storage approach?
>
