Dylan,
If I recall correctly (which I give about 30% odds), the original purpose
of the side channel was to split up things like delete tombstone entries
from regular entries so that other iterators sitting on top of a
bifurcating iterator wouldn't have to handle the special tombstone
why you want to use a side channel instead of implementing the merge in
your own iterator
Here is a picture showing the difference--
Fig. A: Using a side channel to add a top-level iterator.
RfileIter1 RfileIter2 InjectIterator ...
| / /
|_/ /
o__*(3-way
top-level with respect to the side channel description is inverted with
respect to your diagram. Fig. A should be more like this:
RfileIter1 RfileIter2
| /
|_/
Merge
|
VersioningIterator
|
OtherIterators InjectIterator
| /
|__/
Merge
|
v
Thus,
Hello,
This is more of a use-case report and a request for comment.
I am using Accumulo as a source for Spark RDDs through AccumuloInputFormat. My
index is based on a z-order space-filling curve. When I decompose a bounding
box into index ranges I can end up with a large number of Ranges, 3k+
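
To illustrate why a bounding box can break into so many ranges on a z-order curve, here is a minimal sketch. It is not the poster's code: the names `interleave` and `bbox_to_ranges` are hypothetical, it assumes a 2-D integer grid, and it enumerates every cell naively, which is only practical for small boxes.

```python
def interleave(x, y, bits=16):
    """Return the z-order (Morton) index of cell (x, y) by interleaving bits."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def bbox_to_ranges(xmin, ymin, xmax, ymax, bits=16):
    """Naively decompose an inclusive bounding box into contiguous z-index ranges."""
    zs = sorted(
        interleave(x, y, bits)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
    )
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            # Extend the current range when the index is consecutive.
            ranges[-1] = (ranges[-1][0], z)
        else:
            ranges.append((z, z))
    return ranges


aligned = bbox_to_ranges(0, 0, 1, 1)      # box aligned to the curve
unaligned = bbox_to_ranges(1, 1, 2, 2)    # same size, shifted by one cell
```

In this sketch the aligned 2x2 box collapses into a single range, while the shifted 2x2 box yields one range per cell, which is the kind of fragmentation that produces thousands of ranges for larger boxes.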
Eugene,
First off, thanks so much for writing this up. This is definitely a hot
topic that comes up for users and appears to have a lot of relevance to
people right now.
I think the first thing that needs to happen is that we lift
TabletLocator (or some class which serves the purpose that
Couldn't we do this in the 1.6 line as an optimization when we meet the
constraints on scanners?
That would let us avoid exposing TabletLocator and get something out sooner.
--
Sean
On Feb 16, 2015 2:48 PM, Josh Elser josh.el...@gmail.com wrote:
Eugene,
First off, thanks so much for writing
Unless I misread things earlier, we wouldn't have a way to provide users
the means to control this in 1.6 and we'd be altering how the
implementation works drastically (BatchScanner instead of Scanner).
Adding anything new to make this work with a BatchScanner would be
disallowed for a 1.6.x
Hello,
I'm using Accumulo to store raw and value-added data and expose this
data to a small number of end users. During ingestion, the system will
connect to Accumulo as a single Accumulo user called, say, ingestor.
This user will first store data, and then later in the ingestion
pipeline read
I think part of your question pertains to the differences between ABAC
(attribute-based access controls) and RBAC (role-based access controls).
In both A1 and A2, you're thinking in terms of RBAC. The only real
difference is whether you want to have one additional role, or repurpose
the existing
Chris, thank you for expressing the problem in such succinct terms. My
problem does appear to be one of RBAC versus ABAC.
Josh, thanks for your observations.
I will try to summarize my updated understanding of the issue based on
your replies:
At its core, Accumulo appears to encourage ABAC by
I think A1 is ultimately the right thing, as well.
The problem is not that you don't know how to accurately label your data
(which is the biggest problem in Accumulo as updating the visibility is
very costly), it's that it's hard to be able to add your enrichment data
after the fact.
The
On Mon, Feb 16, 2015 at 7:26 PM, Srikanth Viswanathan srikant...@gmail.com
wrote:
[snip]
I will try to summarize my updated understanding of the issue based on
your replies:
[snip]
Do let me know if my observation above has any inaccuracies.
Seems reasonable.
As a side note, it did help
If you can do a merge sort insertion, then you can guarantee order and
it's fine.
Yep, I guarantee the iterator we add as a side channel will emit tuples in
sorted order.
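
The ordering guarantee discussed above can be sketched as follows. This is an illustration only, not Accumulo's iterator API: it assumes each source is a plain Python iterable of (key, value) pairs already sorted by key, and `merge_sorted_sources` is a hypothetical name.

```python
import heapq


def merge_sorted_sources(*sources):
    """Merge key-sorted (key, value) streams into one key-sorted stream.

    The output is only guaranteed to be sorted if every input is sorted.
    """
    return heapq.merge(*sources, key=lambda kv: kv[0])


# Stand-ins for the main iterator stack and the side-channel iterator.
main_stack = [("row1", "a"), ("row3", "c")]
side_channel = [("row2", "b"), ("row4", "d")]

merged = list(merge_sorted_sources(main_stack, side_channel))
```

Because the merge only ever compares the head of each stream, a side-channel source that emits keys out of order would silently break the sort order of the combined output.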
On a suggestion from David Medinets, I modified my testing code to use a
MiniAccumuloCluster set to 2 tablet servers. I