Re: Combiner behaviour

2014-03-19 Thread Russ Weeks
Hi, Josh, Thanks for walking me through this. This is my first stab at it: public class RowSummingCombiner extends WrappingIterator { Key lastKey; long sum; public Key getTopKey() { if (lastKey == null) return super.getTopKey(); return lastKey; } public Value getTopValu

Re: Filters and ScannerBase.fetchColumn

2014-03-19 Thread Russ Weeks
Thanks for clearing that up for me. -Russ On Wed, Mar 19, 2014 at 3:46 PM, Mike Drob wrote: > > Yes, you are running into the same issue described in > https://issues.apache.org/jira/browse/ACCUMULO-1801 > > > On Wed, Mar 19, 2014 at 6:41 PM, John Vines wrote: > >> Yes, column level filtering

Re: Filters and ScannerBase.fetchColumn

2014-03-19 Thread Billie Rinaldi
On Wed, Mar 19, 2014 at 3:36 PM, Russ Weeks wrote: > Sorry for the flood of e-mails... I'm not trying to spam the list, I'm > just getting deeper into accumulo, and loving it, and I'm kind of stumped > by it at the same time. > > Is it true that if a scanner restricts the column families/qualifier

Re: Combiner behaviour

2014-03-19 Thread Josh Elser
Ummm, you got the gist of it (I may have misspoken in what I initially said). My first thought was to make an iterator that will filter down to the columns that you want. It doesn't look like we have an iterator that will efficiently do this for you included in the core (although, I know I

Re: Filters and ScannerBase.fetchColumn

2014-03-19 Thread Mike Drob
Yes, you are running into the same issue described in https://issues.apache.org/jira/browse/ACCUMULO-1801 On Wed, Mar 19, 2014 at 6:41 PM, John Vines wrote: > Yes, column level filtering happens before any client iterators get a > chance to touch the results. > > > On Wed, Mar 19, 2014 at 6:36

Re: Combiner behaviour

2014-03-19 Thread John Vines
Be careful when changing row values, especially outside of the tablet range, as I believe it can cause the data to be dropped or rejected. On Wed, Mar 19, 2014 at 6:40 PM, Russ Weeks wrote: > Hi, Josh, > > Thanks very much for your response. I think I get what you're saying, but > it's kind of b

Re: Filters and ScannerBase.fetchColumn

2014-03-19 Thread John Vines
Yes, column level filtering happens before any client iterators get a chance to touch the results. On Wed, Mar 19, 2014 at 6:36 PM, Russ Weeks wrote: > Sorry for the flood of e-mails... I'm not trying to spam the list, I'm > just getting deeper into accumulo, and loving it, and I'm kind of stump
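The ordering John describes can be illustrated in plain Java. This is a minimal sketch, not Accumulo code: the class `FetchColumnOrderSketch`, the `scan` method, and the representation of an entry as a `String[]` of row, family, and qualifier are all hypothetical. It applies the fetched-column restriction first and only then hands the surviving entries to the "iterator", so an iterator that looks for an unfetched column never sees it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java illustration only; this is NOT the Accumulo scanner or iterator API.
class FetchColumnOrderSketch {

    // Each entry is {row, family, qualifier} (hypothetical representation).
    // Step 1 drops every entry outside the fetched family.
    // Step 2 runs the iterator-style filter on whatever is left.
    static List<String> scan(List<String[]> entries,
                             String fetchedFamily,
                             Predicate<String[]> iteratorFilter) {
        List<String> results = new ArrayList<>();
        for (String[] e : entries) {
            if (!e[1].equals(fetchedFamily)) {
                continue; // removed before the iterator ever sees it
            }
            if (iteratorFilter.test(e)) {
                results.add(e[0] + " " + e[1] + ":" + e[2]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] {"row1", "meta", "size"});
        entries.add(new String[] {"row1", "data", "body"});

        // The filter wants "meta" entries, but only "data" was fetched,
        // so the filter is never offered anything it would accept.
        System.out.println(scan(entries, "data", e -> e[1].equals("meta")));
    }
}
```

In this toy model the workaround is to fetch every column the filter needs and narrow the output afterwards; whether that is the right approach in Accumulo itself depends on the issue Mike links in this thread.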

Re: Combiner behaviour

2014-03-19 Thread Russ Weeks
Hi, Josh, Thanks very much for your response. I think I get what you're saying, but it's kind of blowing my mind. Are you saying that if I first set up an iterator that took my key/value pairs like, 00021ccaac30 meta:size []1807 00021ccaac30 meta:source []data2 00021cdaac30 m

Filters and ScannerBase.fetchColumn

2014-03-19 Thread Russ Weeks
Sorry for the flood of e-mails... I'm not trying to spam the list, I'm just getting deeper into accumulo, and loving it, and I'm kind of stumped by it at the same time. Is it true that if a scanner restricts the column families/qualifiers to be returned, that these columns are not visible to any i

Re: Combiner behaviour

2014-03-19 Thread Josh Elser
Russ, Remember that data is distributed across multiple nodes in your cluster by tablet. A tablet, at the very minimum, will contain one row. Another way to say the same thing is that a row will never be split across multiple tablets. The only guarantee you get from Accumulo here is that

Combiner behaviour

2014-03-19 Thread Russ Weeks
The accumulo manual states that combiners can be applied to values which share the same rowID, column family, and column qualifier. Is there any way to adjust this behaviour? I have rows that look like, 00021ccaac30 meta:size []1807 00021ccaac30 meta:source []data2 00021cdaac30
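The rule quoted from the manual can be illustrated without any Accumulo dependency. The following is a minimal sketch, not Accumulo's Combiner API: the class `CombinerSketch`, its `Entry` type, and every sample value other than the first `meta:size` entry are made up for illustration. It sums values only when row ID, column family, and column qualifier all match, so entries in different rows are never merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT Accumulo's Combiner API.
class CombinerSketch {

    // Hypothetical stand-in for an Accumulo key/value pair.
    static final class Entry {
        final String row;
        final String family;
        final String qualifier;
        final long value;

        Entry(String row, String family, String qualifier, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Sums values that share the same row, family, and qualifier.
    // Entries that differ in any of the three stay separate.
    static Map<String, Long> sumByFullColumn(List<Entry> entries) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (Entry e : entries) {
            String key = e.row + " " + e.family + ":" + e.qualifier;
            sums.merge(key, e.value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("00021ccaac30", "meta", "size", 1807));
        entries.add(new Entry("00021ccaac30", "meta", "size", 193));  // assumed second version
        entries.add(new Entry("00021cdaac30", "meta", "size", 1000)); // assumed value, different row
        System.out.println(sumByFullColumn(entries));
    }
}
```

Summing across rows, as the question asks, would mean grouping on family and qualifier alone, which is outside what the quoted rule covers.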

Re: "NOT" operator in visibility string

2014-03-19 Thread David Medinets
Did there seem to be any performance implications to the patch? On Wed, Mar 19, 2014 at 5:39 PM, Christopher wrote: > Yes, dsingley. There's a few reasons not to accept the patch as-is: > > 1. It completely changes the data security model without offering a > way to disable it. For instance, we

Re: "NOT" operator in visibility string

2014-03-19 Thread Christopher
Yes, dsingley. There are a few reasons not to accept the patch as-is: 1. It completely changes the data security model without offering a way to disable it. For instance, we could have previously assumed that an authorizations service, such as that implemented in ACCUMULO-259, could safely fail with

Re: "NOT" operator in visibility string

2014-03-19 Thread John Vines
If you have full control of access to Accumulo, then it's just as easy to remove that visibility from the incoming set. On Wed, Mar 19, 2014 at 11:22 AM, Jeff Kunkle wrote: > My particular use case meets both of those conditions. I'd like to use a > not operator to soft delete things for specif

Re: "NOT" operator in visibility string

2014-03-19 Thread dsingley
Assuming Jeff's use case is legitimate and other users can ignore the NOT feature, is there any other reason not to accept Joe's patch?

Re: "NOT" operator in visibility string

2014-03-19 Thread Sean Busbey
I don't see how NOT helps this use case. From what I've heard so far, we're still talking about a positive assertion (someone in the sandbox "group1" flagged the data as to be hidden) and then restricting who has access to data with that positive assertion (by default excluding everyone using the s

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Sean Busbey
On Wed, Mar 19, 2014 at 1:32 PM, Benjamin Parrish < benjamin.d.parr...@gmail.com> wrote: > Finally got it. Came down to a user and ownership issue with my Hadoop, > ZooKeeper, Accumulo. Does anyone have a Knowledgebase for this info that > lays out a standard of what users should be created, whe

Re: "NOT" operator in visibility string

2014-03-19 Thread Christopher
It sounds like you'd get some of your requirements to hide data by simply cloning a table to create a sandbox, in which one can issue actual deletes to remove it from that sandbox's view. Accumulo's clone feature will not duplicate data unnecessarily, so you could have many clones, each with differ

Re: Deep-copying RowFilter

2014-03-19 Thread Keith Turner
On Wed, Mar 19, 2014 at 12:35 PM, Russ Weeks wrote: > Hi, Keith, > > Thanks for your response. I opened ACCUMULO-2501 and included a patch > that works for me. > > The problem is that RowFilter calls deepCopy when it sets up its internal > decisionIterator, so if the source iterator is also a Row

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Josh Elser
That's definitely difficult to encapsulate. You can get a mixing of Apache conventions, your OS of choice conventions, and vendor conventions. The thing that I usually see is applications installed under /usr/lib (e.g. /usr/lib/accumulo), the relevant executables linked to /usr/bin to get them

Re: Deep-copying RowFilter

2014-03-19 Thread Russ Weeks
Hi, Keith, Thanks for your response. I opened ACCUMULO-2501 and included a patch that works for me. The problem is that RowFilter calls deepCopy when it sets up its internal decisionIterator, so if the source iterator is also a RowFilter I'm hooped. -Russ On Wed, Mar 19, 2014 at 9:24 AM, Keit

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Benjamin Parrish
Finally got it. It came down to a user and ownership issue with my Hadoop, ZooKeeper, and Accumulo. Does anyone have a knowledge base for this info that lays out a standard of what users should be created, where folders should be created, permissions, ownership, etc.? I feel like that would be invaluable

Re: "NOT" operator in visibility string

2014-03-19 Thread Sean Busbey
On Wed, Mar 19, 2014 at 10:22 AM, Jeff Kunkle wrote: > My particular use case meets both of those conditions. I’d like to use a > not operator to soft delete things for specific groups of users, which are > assigned a given authorization. For example, assume I have two groups of > users: group1 a

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
The sandboxes are really just sharing pointers to data. Users might only see a subset of that data depending on their authorizations. On Mar 19, 2014, at 2:09 PM, David Medinets wrote: > Is data shared between sandboxes? Could namespaces proxy for sandboxes? > > > On Wed, Mar 19, 2014 at 1:46

Re: "NOT" operator in visibility string

2014-03-19 Thread David Medinets
Is data shared between sandboxes? Could namespaces proxy for sandboxes? On Wed, Mar 19, 2014 at 1:46 PM, Mike Drob wrote: > Thanks, that's really helpful. Couple more questions. > > Is a sandbox the same thing as a workspace? Can the terms be used > interchangeably? Just want to make sure I'm n

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
> Is a sandbox the same thing as a workspace? Can the terms be used > interchangeably? Just want to make sure I'm not misinterpreting your answers. Yes. Sorry I wasn’t consistent with the terminology. > Is it fair to describe each sandbox as a separate index table for the global > data set? An

Re: "NOT" operator in visibility string

2014-03-19 Thread Josh Elser
It kind of sounds like you could manage this much easier by controlling the authorizations a user gets (notably the workspace name) and the grant/revoke above the Accumulo level. A sandbox has a unique label and the external system controls which users are granted that label. This way, each sa

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
My particular use case meets both of those conditions. I’d like to use a not operator to soft delete things for specific groups of users, which are assigned a given authorization. For example, assume I have two groups of users: group1 and group2. If I want to temporarily hide something from grou

Re: "NOT" operator in visibility string

2014-03-19 Thread Mike Drob
Thanks, that's really helpful. Couple more questions. Is a sandbox the same thing as a workspace? Can the terms be used interchangeably? Just want to make sure I'm not misinterpreting your answers. Is it fair to describe each sandbox as a separate index table for the global data set? And then whe

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
> You have a large amount of data, that is generally readable by all users. Not necessarily. All data has some visibility constraint that a user's authorizations may or may not satisfy. > Users create their own sandbox, from which they can later exclude portions of > the global data set. Yes,

Re: "NOT" operator in visibility string

2014-03-19 Thread Mike Drob
Wait, I'm really confused by what you are describing, Jeff. Sorry if these are obvious questions, but can you help me get a better grasp of your use case? You have a large amount of data, that is generally readable by all users. Users create their own sandbox, from which they can later exclude por

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
Hi John, Yes it’s accurate that the system controls the label and who is associated with it; there are no Accumulo-internal user accounts. But I don’t think it’s feasible to remove a sandbox label from something that should be hidden. Such a scenario would imply that all data is “tagged” with t

Re: Deep-copying RowFilter

2014-03-19 Thread Keith Turner
On Tue, Mar 18, 2014 at 6:38 PM, Russ Weeks wrote: > Hi, > > org.apache.accumulo.core.iterators.user.RowFilter doesn't have a deepCopy > method, which seems to mean that I can't chain multiple RowFilters together. > > Looking at some examples (GrepIterator, SortedKeyIterator) it seems pretty > ea

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
I attempted to simplify the scenario to facilitate discussion, which on second thought may have been a mistake. Here’s the whole scenario: Different users have access to different subsets of the data depending on their authorizations and the visibility of the data. Users “work with” the data in

Re: "NOT" operator in visibility string

2014-03-19 Thread Sean Busbey
On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle wrote: > New groups are created on the fly by our application when needed. Under > the scenario you describe we’d have to go through all the data in Accumulo > whenever a group is created so that users in the group can see the existing > data. > > >

Re: "NOT" operator in visibility string

2014-03-19 Thread Jeff Kunkle
New groups are created on the fly by our application when needed. Under the scenario you describe we’d have to go through all the data in Accumulo whenever a group is created so that users in the group can see the existing data. On Mar 19, 2014, at 11:34 AM, Sean Busbey wrote: > > On Wed, Ma

Re: "NOT" operator in visibility string

2014-03-19 Thread Christopher
I think you're looking at the design of visibility labels backwards. Visibility labels and corresponding authorizations are not user groups to which you assign data; they represent attributes of the data itself, which determine which groups can access it. If you have a new group, in Accumulo t

Re: "NOT" operator in visibility string

2014-03-19 Thread Christopher
What you are describing is "dotfile behavior": that is, ignoring files that begin with '.' from a directory listing, by default, but not actually protecting them from being visible if a user really wants them to be. It seems odd to me that this use case should be attempted to be satisfied by alteri

Is Anyone Using Symbolic Aggregate approXimation (SAX) With Accumulo?

2014-03-19 Thread David Medinets
http://www.cs.ucr.edu/~eamonn/SAX.htm - it seems like this approach to Time Series Analysis would be a natural fit for Accumulo's iterators.

Re: "NOT" operator in visibility string

2014-03-19 Thread Sean Busbey
On Wed, Mar 19, 2014 at 9:36 AM, kunklejr wrote: > So is there any consensus on whether this should be included? I would use > it > right away on a current project if it were. I understand the security risks > that have been discussed with having a NOT operator, but I see its use as a > decision

Re: "NOT" operator in visibility string

2014-03-19 Thread kunklejr
So is there any consensus on whether this should be included? I would use it right away on a current project if it were. I understand the security risks that have been discussed with having a NOT operator, but I see its use as a decision to be made by the development team. If the project deems use

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Sean Busbey
Also, on the off chance that some other part of your system is exporting an incorrect HADOOP_CONF_DIR, you should still run this confirmation step from earlier: > You can verify this by doing > > ssh ${HOST} "bash -c 'echo ${HADOOP_CONF_DIR:-no hadoop conf}'" > > as the accumulo user on the mast

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Sean Busbey
Josh is correct about the behavior of the ZooKeeper cli. As an aside, how big is this cluster? Five ZooKeeper servers shouldn't be needed until you get past ~100 nodes, unless you're just going for more fault tolerance. Could you update your gist with the changes to accumulo-env.sh? It's much easi

RE: Installing with Hadoop 2.2.0

2014-03-19 Thread Ott, Charlie H.
Benjamin, It may be better to step back for a second and make sure you have the Hadoop environment set up correctly. You are very close but it seems like there is just an issue with the Accumulo classpath or your environment variables. In regard to ensuring zookeeper is working, you can use th

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Benjamin Parrish
So, I am back to no clue now... On Wed, Mar 19, 2014 at 9:13 AM, Josh Elser wrote: > I think by default zkCli.sh will just try to connect to localhost. You can > change this by providing the quorum string to the script with the -server > option. > On Mar 19, 2014 8:29 AM, "Benjamin Parrish" >

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Josh Elser
I think by default zkCli.sh will just try to connect to localhost. You can change this by providing the quorum string to the script with the -server option. On Mar 19, 2014 8:29 AM, "Benjamin Parrish" wrote: > I adjusted accumulo-env.sh to have hard coded values as seen below. > > Are there any l

Re: Installing with Hadoop 2.2.0

2014-03-19 Thread Benjamin Parrish
I adjusted accumulo-env.sh to have hard-coded values as seen below. Are there any logs that could shed some light on this issue? If it also helps, I am using CentOS 6.5, Hadoop 2.2.0, ZooKeeper 3.4.6. I also ran across this, which didn't look right... Welcome to ZooKeeper! 2014-03-19 08:25:53,479