Ah! I had some misunderstandings implanted in me, and it's good to have them corrected.
For connector.tableOperations().create(String tableName, boolean limitVersion): will limitVersion=false disable versioning completely, so that I always have only one version, or will it give a "no limit" and "no removal" policy for versions? To be clear, I am looking for "versions are never removed", a requirement that made me smile and remember that "Accumulo can do that automatically", rather than implementing it at a higher level.

Thanks

On Tue, Apr 14, 2020 at 12:55 AM Adam J. Shook <[email protected]> wrote:
> Hi Niclas,
>
> 1. Accumulo uses a VersioningIterator for all tables, which ensures that
> you see the latest version of a particular entry, defined as the entry that
> has the highest value for the timestamp. Older versions of the same key
> (row ID + family + qualifier + visibility) are compacted away by Accumulo
> and will eventually be deleted. You can set the number of versions you
> want to keep to something other than the default of 1 (see
> https://accumulo.apache.org/1.9/accumulo_user_manual.html#_versioning_iterators_and_timestamps
> ).
>
> 2. Related to #1, Accumulo will update the value to the latest version of
> the entry. I believe if you keep writing the same entry with the same data
> over and over again, you'll see them if you are keeping more than one
> version of the same entry. AFAIK there is no "put if absent" behavior
> without reading for every write. You can, of course, configure an existing
> iterator or write your own to achieve whatever logic you want as far as
> what versions to keep of what columns of your data model.
>
> 3. The "Scanner" will return entries in order. Related to #1, it will
> only return the latest version of an entry (by default). If you are
> keeping more versions of the same entry, then you would see the newest
> entry first. The "BatchScanner" is multi-threaded and communicates with
> several tablets at once, returning entries out of order. One common
> pattern is to use the WholeRowIterator when scanning. This iterator
> serializes all entries with the same row into one entry on the server side,
> then you can deserialize the row on the client side to view the entire
> contents of a row at once. The order of the rows themselves is still
> undefined when using a BatchScanner due to the multi-threaded nature of the
> scanner.
>
> Hope this helps!
> --Adam
>
> On Mon, Apr 13, 2020 at 12:57 AM Niclas Hedhman <[email protected]> wrote:
>
>> Hi,
>> I am brand new to Accumulo, but tasked with putting it into what used to be
>> Apache Polygene (now in the Attic) as an entity store, one that keeps history.
>>
>> I have a couple of questions:
>> 1. Assuming that I can guarantee that no one executes any explicit
>> deletes, can I rely on the mutation sequences not disappearing over time?
>>
>> 2. As part of storing a row, I have a "metadata" qualifier that contains
>> static information. But since I don't know whether the row exists without
>> reading it first, IIUC I will fill the "metadata" with the same
>> information over and over again.... OR, does Accumulo realize that this is
>> the same byte[] as before and not update the value, or alternatively
>> create a new Key pointing to the same Value? I effectively want a
>> "putIfAbsent()".
>>
>> 3. The Scanner can fetch multiple rows, constrained by CF and
>> qualifier. I think that is quite clear. But what does the iterator()
>> actually return? I presume that it is many key/value pairs, of ALL
>> timestamped values. But what are the order guarantees here? I get the
>> impression that within a row->cf->qualifier, the returned values are in
>> timestamp order, newest first. And I think that within a row, I am
>> guaranteed that the order is maintained, i.e. row -> cf -> qualifier (all
>> ascending). But am I also guaranteed that the iterator is "done" with a row
>> when the row has changed? Or can rows be interleaved in the iterator?
>>
>> Thanks in advance
>> Niclas
>>
>
