Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
Dylan, If I recall correctly (which I give about 30% odds), the original purpose of the side channel was to split up things like delete tombstone entries from regular entries so that other iterators sitting on top of a bifurcating iterator wouldn't have to handle the special tombstone

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Dylan Hutchison
why you want to use a side channel instead of implementing the merge in your own iterator Here is a picture showing the difference-- Fig. A: Using a side channel to add a top-level iterator. RfileIter1 RfileIter2 InjectIterator ... | / / |_/ / o__*(3-way

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
top-level with respect to the side channel description is inverted with respect to your diagram. Fig. A should be more like this:

RfileIter1   RfileIter2
    |        /
    |_______/
  Merge
    |
VersioningIterator
    |
OtherIterators   InjectIterator
    |            /
    |___________/
  Merge
    |
    v

Thus,

AccumuloInputFormat and spark

2015-02-16 Thread Eugene Cheipesh
Hello, This is more of a use-case report and a request for comment. I am using Accumulo as a source for Spark RDDs through AccumuloInputFormat. My index is based on a z-order space-filling curve. When I decompose a bounding box into index ranges I can end up with a large number of Ranges, 3k+
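
For readers unfamiliar with why a bounding box turns into many index ranges, here is a minimal sketch, not the poster's code. It assumes a 2-D grid of small non-negative integer cells and a plain bit-interleaved (Morton) z-order; the names interleave and bbox_to_ranges are hypothetical. It enumerates every cell by brute force, which is only reasonable for tiny boxes, and then coalesces consecutive z-values into inclusive ranges.

```python
def interleave(x, y, bits=16):
    """Return the z-order (Morton) index of cell (x, y) by interleaving bits.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def bbox_to_ranges(xmin, ymin, xmax, ymax, bits=16):
    """Decompose an inclusive bounding box into inclusive z-index ranges.

    Brute force: compute the z-index of every cell in the box, sort them,
    and merge runs of consecutive indexes into (start, end) pairs.
    """
    zs = sorted(
        interleave(x, y, bits)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
    )
    ranges = []
    start = prev = zs[0]
    for z in zs[1:]:
        if z != prev + 1:  # gap in the curve: close the current range
            ranges.append((start, prev))
            start = z
        prev = z
    ranges.append((start, prev))
    return ranges


# A box aligned to the curve collapses to a single range ...
aligned = bbox_to_ranges(0, 0, 1, 1)
# ... while the same-sized box shifted by one cell fragments completely.
shifted = bbox_to_ranges(1, 1, 2, 2)
```

In this toy case the aligned 2x2 box is one range and the shifted 2x2 box is four single-cell ranges, which is the same effect that produces thousands of Ranges for a real query window.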

Re: AccumuloInputFormat and spark

2015-02-16 Thread Josh Elser
Eugene, First off, thanks so much for writing this up. This is definitely a hot topic that comes up for users and appears to have a lot of relevance to people right now. I think the first thing that needs to happen is that we lift TabletLocator (or some class which serves the purpose that

Re: AccumuloInputFormat and spark

2015-02-16 Thread Sean Busbey
Couldn't we do this in the 1.6 line as an optimization when we meet the constraints on scanners? That would let us avoid exposing TabletLocator and get something out sooner. -- Sean

Re: AccumuloInputFormat and spark

2015-02-16 Thread Josh Elser
Unless I misread things earlier, we wouldn't have a way to provide users the means to control this in 1.6 and we'd be altering how the implementation works drastically (BatchScanner instead of Scanner). Adding anything new to make this work with a BatchScanner would be disallowed for a 1.6.x

User authorizations in accumulo

2015-02-16 Thread Srikanth Viswanathan
Hello, I'm using Accumulo to store raw and value-added data and expose this data to a small number of end users. During ingestion, the system will connect to Accumulo as a single Accumulo user called, say, ingestor. This user will first store data, and then later in the ingestion pipeline read

Re: User authorizations in accumulo

2015-02-16 Thread Christopher
I think part of your question pertains to the differences between ABAC (attribute-based access controls) and RBAC (role-based access controls). In both A1 and A2, you're thinking in terms of RBAC. The only real difference is whether you want to have one additional role, or repurpose the existing

Re: User authorizations in accumulo

2015-02-16 Thread Srikanth Viswanathan
Chris, thank you for expressing the problem in such succinct terms. My problem does appear to be one of RBAC versus ABAC. Josh, thanks for your observations. I will try to summarize my updated understanding of the issue based on your replies: At its core, Accumulo appears to encourage ABAC by

Re: User authorizations in accumulo

2015-02-16 Thread Josh Elser
I think A1 is ultimately the right thing, as well. The problem is not that you don't know how to accurately label your data (which is the biggest problem in Accumulo as updating the visibility is very costly), it's that it's hard to be able to add your enrichment data after the fact. The

Re: User authorizations in accumulo

2015-02-16 Thread Christopher
On Mon, Feb 16, 2015 at 7:26 PM, Srikanth Viswanathan srikant...@gmail.com wrote: [snip] I will try to summarize my updated understanding of the issue based on your replies: [snip] Do let me know if my observation above has any inaccuracies. Seems reasonable. As a side note, it did help

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Dylan Hutchison
If you can do a merge sort insertion, then you can guarantee order and it's fine. Yep, I guarantee the iterator we add as a side channel will emit tuples in sorted order. On a suggestion from David Medinets, I modified my testing code to use a MiniAccumuloCluster set to 2 tablet servers. I
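
The ordering point above can be illustrated with a small sketch. This is not Accumulo code and does not use the iterator API; it assumes each source is an already-sorted list of (key, value) tuples, and the names rfile_iter1, rfile_iter2, inject_iter, and merge_sorted are hypothetical stand-ins. It shows that if every source, including the injected one, emits in sorted order, a k-way merge yields a single sorted stream.

```python
import heapq

# Hypothetical stand-ins for iterator sources; each is already sorted by key.
rfile_iter1 = [("row1", "a"), ("row4", "d")]
rfile_iter2 = [("row2", "b"), ("row5", "e")]
inject_iter = [("row3", "injected"), ("row6", "injected")]


def merge_sorted(*sources):
    """Lazily merge already-sorted sources into one sorted stream.

    heapq.merge only preserves order if every input is itself sorted,
    which mirrors the guarantee the injected source has to provide.
    """
    return heapq.merge(*sources)


merged = list(merge_sorted(rfile_iter1, rfile_iter2, inject_iter))
for key, value in merged:
    print(key, value)
```

If one source were to emit out of order, the merged stream would no longer be sorted, which is why the sorted-emission guarantee matters for whatever sits above the merge.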