You should be able to use a conditional writer to support 'put if
absent':
https://accumulo.apache.org/docs/2.x/getting-started/clients#conditionalwriter
Generally you would not want to repeatedly write the same key/value, as
you will have to scan every single versioned entry when you want to read
it back, which can make it much slower than you might expect to read a
single row.
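As a rough illustration of the two points above, here is a small
in-memory model in plain Java. It is only a sketch: it does not use the
Accumulo client API, and the class and method names
(PutIfAbsentSketch, put, putIfAbsent, versionCount) are hypothetical.
It assumes a table that keeps every version, and contrasts blind
repeated writes with a conditional "write only if absent".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration only. It does not call or
// mirror the Accumulo client API; "cell" stands in for row + column.
class PutIfAbsentSketch {
    // Every version ever written per cell, as on a table that never drops versions.
    private final Map<String, List<String>> versions = new HashMap<>();

    // Unconditional write: always appends one more version.
    synchronized void put(String cell, String value) {
        versions.computeIfAbsent(cell, k -> new ArrayList<>()).add(value);
    }

    // Conditional write: the existence check and the write form one atomic
    // step, and the write is rejected when the cell already has a value.
    synchronized boolean putIfAbsent(String cell, String value) {
        if (versions.containsKey(cell)) {
            return false;
        }
        put(cell, value);
        return true;
    }

    // Number of stored versions a reader has to step over for this cell.
    synchronized int versionCount(String cell) {
        List<String> stored = versions.get(cell);
        return stored == null ? 0 : stored.size();
    }

    public static void main(String[] args) {
        PutIfAbsentSketch table = new PutIfAbsentSketch();
        for (int i = 0; i < 1000; i++) {
            table.put("row1 meta:blind", "static info");
            table.putIfAbsent("row1 meta:guarded", "static info");
        }
        System.out.println("blind writes:   " + table.versionCount("row1 meta:blind"));
        System.out.println("guarded writes: " + table.versionCount("row1 meta:guarded"));
    }
}
```

In this model the blind column ends up with 1000 stored versions and
the guarded one with a single version, which is the read-cost
difference described above.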
Thanks,
Emilio
On 4/14/20 9:48 AM, Adam J. Shook wrote:
limitVersion = false would *not* set the default VersioningIterator,
effectively keeping every entry you write to Accumulo. Sounds like it
hits your requirement of "versions never to be removed", though keep
in mind that your static "metadata" qualifier would also never be
versioned/deleted.
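To make the difference concrete, here is a minimal stand-alone sketch
in plain Java of what "keep the newest N versions" versus "no
versioning iterator at all" means for one column. It is a model under
stated assumptions, not Accumulo code: VersionRetentionSketch,
retained and NO_LIMIT are hypothetical names, and versions are
represented only by their timestamps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of version retention for a single column.
// It is not the VersioningIterator implementation.
class VersionRetentionSketch {
    // Stands in for a table created without the default versioning iterator.
    static final int NO_LIMIT = Integer.MAX_VALUE;

    // Returns the timestamps that would remain visible, newest first.
    static List<Long> retained(List<Long> timestamps, int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be at least 1");
        }
        List<Long> newestFirst = new ArrayList<>(timestamps);
        newestFirst.sort(Collections.reverseOrder());
        int keep = Math.min(maxVersions, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> written = new ArrayList<>();
        Collections.addAll(written, 10L, 30L, 20L);
        System.out.println("limit 1:  " + retained(written, 1));
        System.out.println("no limit: " + retained(written, NO_LIMIT));
    }
}
```

With no limit, every write to the static "metadata" column stays around
as well, which is the caveat mentioned above.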
On Mon, Apr 13, 2020 at 8:47 PM Niclas Hedhman <[email protected]> wrote:
Ah! I had some misunderstandings implanted in me, and good to get
corrected.
For
|connector.tableOperations().create(String tableName, boolean
limitVersion);|
will limitVersion=false disable versioning completely, so that I
always have only one version, or will it apply a "no limit" and "no
removal" policy to versions?
Well, to be clear, I am looking for "versions never to be
removed", a requirement that made me smile and remember "Accumulo
can do that automatically", rather than implement that at a higher
level.
Thanks
On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook <[email protected]> wrote:
Hi Niclas,
1. Accumulo uses a VersioningIterator for all tables which
ensures that you see the latest version of a particular entry,
defined as the entry that has the highest value for the
timestamp. Older versions of the same key (row ID + family +
qualifier + visibility) are compacted away by Accumulo and
will eventually be deleted. You can set the number of
versions you want to keep to something other than the default
of 1 (see
https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps).
2. Related to #1, Accumulo will update the value to the latest
version of the entry. I believe if you keep writing the same
entry with the same data over and over again, you'll see each
copy as a separate version if you are keeping more than one
version of the same entry.
AFAIK there is no "put if absent" behavior without reading for
every write. You can, of course, configure an existing
iterator or write your own to achieve whatever logic you want
as far as what versions to keep of what columns of your data
model.
3. The "Scanner" will return entries in order. Related to #1,
it will only return the latest version of an entry (by
default). If you are keeping more versions of the same entry,
then you would see the newest entry first. The "BatchScanner"
is multi-threaded and communicates to several tablets at once,
returning entries out of order. One common pattern is to use
the WholeRowIterator when scanning. This iterator serializes
all entries with the same row into one entry on the server
side, then you can deserialize the row on the client side to
view the entire contents of a row at once. The order of the
rows themselves is still undefined when using a BatchScanner
due to the multi-threaded nature of the scanner.
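The ordering a single-threaded Scanner follows can be sketched in
plain Java as below. This is a simplified model, not Accumulo code:
KeyOrderSketch and its members are hypothetical names, the visibility
part of the key is left out, and String.compareTo stands in for
Accumulo's byte-wise comparison (equivalent only for plain ASCII
values). It sorts by row, family and qualifier ascending, then by
timestamp descending, so the newest version of a column comes first
and all entries of a row stay together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of key ordering (visibility omitted).
class KeyOrderSketch {
    static final class Key {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        Key(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " @" + timestamp;
        }
    }

    // Row, family, qualifier ascending; timestamp descending (newest first).
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) {
            return c;
        }
        c = a.family.compareTo(b.family);
        if (c != 0) {
            return c;
        }
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp);
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<Key> scanOrder(List<Key> unordered) {
        List<Key> sorted = new ArrayList<>(unordered);
        sorted.sort(KeyOrderSketch::compare);
        return sorted;
    }

    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key("r2", "meta", "a", 1L));
        keys.add(new Key("r1", "data", "x", 5L));
        keys.add(new Key("r1", "data", "x", 9L));
        keys.add(new Key("r1", "meta", "a", 3L));
        for (Key k : scanOrder(keys)) {
            System.out.println(k);
        }
    }
}
```

Because rows sort first, a change of row in this order means the
previous row is finished. That property does not carry over to the
BatchScanner case described above.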
Hope this helps!
--Adam
On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman <[email protected]> wrote:
Hi,
I am steaming new on Accumulo, but tasked to put it into
what used to be Apache Polygene (now in Attic) as an entity
store, one that keeps history.
I have a couple of questions:
1. Assuming that I can guarantee that no one executes any
explicit deletes, can I rely on the mutation sequences not
disappearing over time?
2. As part of storing a row, I have a "metadata" qualifier
that contains static information. But since I don't know
whether the row exists without reading it first, then
IIUIC I will fill the "metadata" with the same information
over and over again.... OR, does Accumulo realize that
this is the same byte[] as before and not update the
value, or alternatively create a new Key that points to
the same Value? I effectively want a "putIfAbsent()".
3. The Scanner can fetch multiple rows, and can be constrained by
CF and qualifier. I think that is quite clear. But what
does the iterator() actually return? I presume that it is
many key/value pairs, of ALL timestamped values. But what
are the ordering guarantees here? I get the impression that
within a row->cf->qualifier, the returned values are in
timestamp order, newest first. And I think that within a
row, I am guaranteed that the order is maintained, i.e. row
-> cf -> qualifier (all ascending). But am I also
guaranteed that the iterator is "done" with a row when the
row has changed? Or can rows be interleaved in the iterator?
Thanks in advance
Niclas