You should be able to use a conditional writer to support 'put if absent': https://accumulo.apache.org/docs/2.x/getting-started/clients#conditionalwriter

Generally you would not want to repeatedly write the same key/value, as you will have to scan every single versioned entry when you want to read it back, which can make it much slower than you might expect to read a single row.
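
To make the trade-off concrete, here is a minimal sketch in plain Java. It is a toy model, not the Accumulo API: the class PutIfAbsentModel, its Outcome enum, and the "row1:meta" key are all made up for illustration. It assumes a table that keeps every version, and compares blind repeated writes against a write that is only accepted when the column is absent, which is the kind of condition a conditional writer lets you express.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (NOT the Accumulo API) of a table that keeps every version. */
final class PutIfAbsentModel {
    /** Hypothetical result type; only loosely mirrors an accepted/rejected status. */
    enum Outcome { ACCEPTED, REJECTED }

    // column key -> every value ever written, newest first
    private final Map<String, List<String>> versions = new HashMap<>();

    /** Unconditional write: always stores one more version. */
    void put(String key, String value) {
        versions.computeIfAbsent(key, k -> new ArrayList<>()).add(0, value);
    }

    /** Conditional write: accepted only if the column has no entry yet. */
    Outcome putIfAbsent(String key, String value) {
        if (versions.containsKey(key)) {
            return Outcome.REJECTED;
        }
        put(key, value);
        return Outcome.ACCEPTED;
    }

    /** Number of stored entries a read of this column has to pass over. */
    int entriesScanned(String key) {
        List<String> stored = versions.get(key);
        return stored == null ? 0 : stored.size();
    }

    public static void main(String[] args) {
        PutIfAbsentModel blind = new PutIfAbsentModel();
        PutIfAbsentModel guarded = new PutIfAbsentModel();
        for (int i = 0; i < 1000; i++) {
            blind.put("row1:meta", "static");
            guarded.putIfAbsent("row1:meta", "static");
        }
        System.out.println("blind writes:   " + blind.entriesScanned("row1:meta"));
        System.out.println("guarded writes: " + guarded.entriesScanned("row1:meta"));
    }
}
```

In this model the blind writer leaves 1000 entries behind for a single column while the guarded writer leaves one, which is the read-side cost described above.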

Thanks,

Emilio

On 4/14/20 9:48 AM, Adam J. Shook wrote:
limitVersion = false would *not* set the default VersioningIterator, effectively keeping every entry you write to Accumulo.  Sounds like it hits your requirement of "versions never to be removed", though keep in mind that your static "metadata" qualifier would also never be versioned/deleted.
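
As a rough illustration of the difference, here is a small plain-Java model. It is not Accumulo code: VersioningModel, Entry, and visible() are hypothetical names, and a single "column" string stands in for row ID + family + qualifier + visibility. It assumes entries sort newest-first within a column, and treats maxVersions <= 0 as "no versioning iterator configured", i.e. the limitVersion = false case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model (NOT Accumulo code) of version limiting per column. */
final class VersioningModel {
    /** One stored entry; column stands in for row + family + qualifier + visibility. */
    static final class Entry {
        final String column;
        final long timestamp;
        final String value;

        Entry(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Column ascending, then timestamp descending (newest first). */
    static final Comparator<Entry> ORDER = (a, b) -> {
        int c = a.column.compareTo(b.column);
        return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
    };

    /** Keeps the newest maxVersions entries per column; maxVersions <= 0 keeps all. */
    static List<Entry> visible(List<Entry> stored, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(stored);
        sorted.sort(ORDER);
        List<Entry> out = new ArrayList<>();
        String current = null;
        int seen = 0;
        for (Entry e : sorted) {
            if (!e.column.equals(current)) {
                current = e.column;
                seen = 0;
            }
            seen++;
            if (maxVersions <= 0 || seen <= maxVersions) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> stored = Arrays.asList(
            new Entry("row1:data", 1, "v1"),
            new Entry("row1:data", 2, "v2"),
            new Entry("row1:data", 3, "v3"),
            new Entry("row1:meta", 1, "static"),
            new Entry("row1:meta", 2, "static"));
        System.out.println("maxVersions=1: " + visible(stored, 1).size() + " entries");
        System.out.println("no limit:      " + visible(stored, 0).size() + " entries");
    }
}
```

With the default of one version only the newest "data" and "meta" entries survive; with no limit all five remain, including the duplicate static "metadata" value.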

On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman wrote:

    Ah! I had some misunderstandings implanted in me, and good to get
    corrected.

    For

    connector.tableOperations().create(String tableName, boolean
    limitVersion);


    Will limitVersion=false disable versioning completely and I will
    always only have one version, or will it have a "no limit" and "no
    removal" policy of versions?

    Well, to be clear, I am looking for "versions never to be
    removed", a requirement that made me smile and remember that
    "Accumulo can do that automatically", rather than implementing
    it at a higher level.

    Thanks

    On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook wrote:

        Hi Niclas,

        1. Accumulo uses a VersioningIterator for all tables which
        ensures that you see the latest version of a particular entry,
        defined as the entry that has the highest value for the
        timestamp.  Older versions of the same key (row ID + family +
        qualifier + visibility) are compacted away by Accumulo and
        will eventually be deleted.  You can set the number of
        versions you want to keep to something other than the default
        of 1 (see
        https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps).

        2. Related to #1, Accumulo will update the value to the latest
        version of the entry.  I believe if you keep writing the same
        entry with the same data over and over again, you'll see all of
        them if you are keeping more than one version of the same entry.
        AFAIK there is no "put if absent" behavior without reading for
        every write.  You can, of course, configure an existing
        iterator or write your own to achieve whatever logic you want
        as far as what versions to keep of what columns of your data
        model.

        3. The "Scanner" will return entries in order. Related to #1,
        it will only return the latest version of an entry (by
        default).  If you are keeping more versions of the same entry,
        then you would see the newest entry first.  The "BatchScanner"
        is multi-threaded and communicates to several tablets at once,
        returning entries out of order.  One common pattern is to use
        the WholeRowIterator when scanning.  This iterator serializes
        all entries with the same row into one entry on the server
        side, then you can deserialize the row on the client side to
        view the entire contents of a row at once.  The order of the
        rows themselves is still undefined when using a BatchScanner
        due to the multi-threaded nature of the scanner.
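
The ordering described in point 3 can be sketched in plain Java. This is a toy model, not the Accumulo API: ScanOrderModel, its Key class, and runsByRow() are hypothetical, visibility is left out, and String.compareTo stands in for the byte-wise comparison Accumulo performs. It assumes a sorted stream is what a Scanner yields and an arbitrary arrival order is what a BatchScanner may yield.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model (NOT the Accumulo API) of scan order and row boundaries. */
final class ScanOrderModel {
    /** Simplified key: visibility is omitted for brevity. */
    static final class Key {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        Key(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Row, family, qualifier ascending; timestamp descending (newest first). */
    static final Comparator<Key> SCAN_ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    };

    /** Splits a stream into runs of consecutive entries that share a row. */
    static List<List<Key>> runsByRow(List<Key> stream) {
        List<List<Key>> runs = new ArrayList<>();
        String current = null;
        for (Key k : stream) {
            if (!k.row.equals(current)) {
                runs.add(new ArrayList<>());
                current = k.row;
            }
            runs.get(runs.size() - 1).add(k);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Arbitrary arrival order, standing in for an unordered scan.
        List<Key> arrival = Arrays.asList(
            new Key("b", "cf", "q1", 1),
            new Key("a", "cf", "q2", 1),
            new Key("a", "cf", "q1", 1),
            new Key("a", "cf", "q1", 2),
            new Key("b", "cf", "q1", 2));
        List<Key> sorted = new ArrayList<>(arrival);
        sorted.sort(SCAN_ORDER);
        System.out.println("runs in sorted order:  " + runsByRow(sorted).size());
        System.out.println("runs in arrival order: " + runsByRow(arrival).size());
    }
}
```

In sorted order each row forms exactly one run, so a change of row means the previous row is finished. In the unordered case row "b" shows up in two separate runs, which is why serializing a whole row on the server side is attractive there.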

        Hope this helps!
        --Adam

        On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman wrote:

            Hi,
            I am brand new to Accumulo, but tasked with putting it into
            what used to be Apache Polygene (now in the Attic) as an
            entity store, one that keeps history.

            I have a couple of questions:
            1. Assuming that I can guarantee that no one executes any
            explicit deletes, can I rely on the mutation sequences not
            disappearing over time?

            2. As part of storing a row, I have a "metadata" qualifier
            that contains static information. But since I don't know
            whether the row exists without reading it first,
            IIUC I will fill the "metadata" with the same information
            over and over again... Or does Accumulo realize that
            this is the same byte[] as before and not update the
            value, or alternatively create a new Key that points to
            the same Value?  I effectively want a "putIfAbsent()".

            3. The Scanner can fetch multiple rows, constrained by
            CF and qualifier. I think that is quite clear. But what
            does the iterator() actually return? I presume that it is
            many key/value pairs, of ALL timestamped values. But what
            are the ordering guarantees here? I get the impression that
            within a row->cf->qualifier, the returned values are in
            timestamp order, newest first. And I think that within a
            row, I am guaranteed that the order is maintained, i.e. row
            -> cf -> qualifier (all ascending). But am I also
            guaranteed that the iterator is "done" with a row when the
            row has changed? Or can rows be interleaved in the iterator?

            Thanks in advance
            Niclas

