Hi, Josh,
Thanks for walking me through this. This is my first stab at it:
public class RowSummingCombiner extends WrappingIterator {
    Key lastKey;
    long sum;
    public Key getTopKey() {
        if (lastKey == null)
            return super.getTopKey();
        return lastKey;
    }
    public Value getTopValu
Thanks for clearing that up for me.
-Russ
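As a rough, Accumulo-free illustration of what a row-level sum means, the sketch below walks entries that are already sorted by row (as a scan returns them) and emits one total per row. `RowSumModel`, `Entry`, and `sumByRow` are hypothetical names, not Accumulo API, and skipping non-numeric values such as `data2` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a row-level sum; not the Accumulo iterator API.
class RowSumModel {
    // Simplified key/value pair: no timestamp or visibility.
    record Entry(String row, String family, String qualifier, String value) {}

    // One output per row: the row ID and the sum of its numeric values.
    record RowTotal(String row, long sum) {}

    // Assumes the input is sorted by row, so each row's entries are contiguous.
    static List<RowTotal> sumByRow(List<Entry> sorted) {
        List<RowTotal> totals = new ArrayList<>();
        String currentRow = null;
        long sum = 0;
        for (Entry e : sorted) {
            if (currentRow != null && !currentRow.equals(e.row())) {
                totals.add(new RowTotal(currentRow, sum)); // row boundary: emit total
                sum = 0;
            }
            currentRow = e.row();
            try {
                sum += Long.parseLong(e.value());
            } catch (NumberFormatException ignored) {
                // Assumption: non-numeric values such as "data2" are skipped.
            }
        }
        if (currentRow != null) {
            totals.add(new RowTotal(currentRow, sum)); // emit the final row
        }
        return totals;
    }
}
```

This relies on one iterator instance seeing a whole row, which is plausible because Accumulo never splits a row across tablets. A real iterator would likely also have to cope with seeks that begin partway through a row, which this model ignores.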
On Wed, Mar 19, 2014 at 3:46 PM, Mike Drob wrote:
>
> Yes, you are running into the same issue described in
> https://issues.apache.org/jira/browse/ACCUMULO-1801
>
>
> On Wed, Mar 19, 2014 at 6:41 PM, John Vines wrote:
>
>> Yes, column level filtering
On Wed, Mar 19, 2014 at 3:36 PM, Russ Weeks wrote:
> Sorry for the flood of e-mails... I'm not trying to spam the list, I'm
> just getting deeper into accumulo, and loving it, and I'm kind of stumped
> by it at the same time.
>
> Is it true that if a scanner restricts the column families/qualifier
Ummm, you got the gist of it (I may have misspoken in what I initially said).
My first thought was to make an iterator that filters down to
the columns that you want. It doesn't look like we have an iterator that
will efficiently do this for you included in the core (although, I know
I
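The suggestion is to do the column restriction inside an iterator instead of on the scanner, since scanner-level column filtering is applied before client iterators see the data. A naive, Accumulo-free sketch of that filtering step follows; `ColumnFilterModel` and `keepColumns` are hypothetical. It tests every entry, whereas an efficient server-side version would presumably seek past unwanted columns, which this model does not attempt.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of column filtering done by an iterator; not Accumulo API.
class ColumnFilterModel {
    // Simplified key/value pair: no timestamp or visibility.
    record Entry(String row, String family, String qualifier, String value) {}

    // Keeps entries whose "family:qualifier" is in the wanted set, preserving order.
    static List<Entry> keepColumns(List<Entry> entries, Set<String> wanted) {
        return entries.stream()
                .filter(e -> wanted.contains(e.family() + ":" + e.qualifier()))
                .collect(Collectors.toList());
    }
}
```

If a filter like this is configured to run after a row-level iterator, that earlier iterator can still read columns (for example `meta:source`) that the final output drops.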
Yes, you are running into the same issue described in
https://issues.apache.org/jira/browse/ACCUMULO-1801
On Wed, Mar 19, 2014 at 6:41 PM, John Vines wrote:
> Yes, column level filtering happens before any client iterators get a
> chance to touch the results.
>
>
> On Wed, Mar 19, 2014 at 6:36
Be careful when changing row values, especially outside of the tablet
range, as I believe it can cause the data to be dropped or rejected.
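One way to act on this caution is to check, before emitting a rewritten key, whether its new row still falls inside the tablet being read. A tablet covers the rows after its previous end row, up to and including its end row. The helper below is a hypothetical sketch of that check (not an Accumulo method); it treats `null` as an unbounded side and uses `String.compareTo`, which matches byte order for ASCII row IDs such as the hex IDs in this thread.

```java
// Hypothetical helper; not part of the Accumulo API.
class ExtentCheckModel {
    // A tablet holds rows in (prevEndRow, endRow]; null means unbounded on that side.
    // String.compareTo matches byte order for ASCII row IDs such as hex strings.
    static boolean inExtent(String prevEndRow, String endRow, String row) {
        boolean afterPrev = prevEndRow == null || row.compareTo(prevEndRow) > 0;
        boolean notPastEnd = endRow == null || row.compareTo(endRow) <= 0;
        return afterPrev && notPastEnd;
    }
}
```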
On Wed, Mar 19, 2014 at 6:40 PM, Russ Weeks wrote:
> Hi, Josh,
>
> Thanks very much for your response. I think I get what you're saying, but
> it's kind of b
Yes, column level filtering happens before any client iterators get a
chance to touch the results.
On Wed, Mar 19, 2014 at 6:36 PM, Russ Weeks wrote:
> Sorry for the flood of e-mails... I'm not trying to spam the list, I'm
> just getting deeper into accumulo, and loving it, and I'm kind of stump
Hi, Josh,
Thanks very much for your response. I think I get what you're saying, but
it's kind of blowing my mind.
Are you saying that if I first set up an iterator that took my key/value
pairs like,
00021ccaac30 meta:size []1807
00021ccaac30 meta:source []data2
00021cdaac30 m
Sorry for the flood of e-mails... I'm not trying to spam the list, I'm just
getting deeper into accumulo, and loving it, and I'm kind of stumped by it
at the same time.
Is it true that if a scanner restricts the column families/qualifiers to be
returned, that these columns are not visible to any i
Russ,
Remember how data is distributed across the nodes in your
cluster: by tablet.
A tablet, at the very minimum, will contain one row. Another way to say the
same thing is that a row will never be split across multiple tablets.
The only guarantee you get from Accumulo here is that
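The point about rows and tablets can be made concrete: a tablet is chosen from the row alone, so every column of a row lands in the same tablet. The sketch below is a hypothetical model (not Accumulo code) that treats split points as inclusive end rows, with one extra tablet after the last split.

```java
import java.util.TreeSet;

// Hypothetical model of mapping a row to a tablet; not Accumulo code.
class TabletLookupModel {
    // Split points act as inclusive end rows: tablet i holds rows in (split[i-1], split[i]],
    // and tablet splits.size() holds every row after the last split.
    // Only the row is consulted, so all columns of a row map to the same tablet.
    static int tabletFor(TreeSet<String> splits, String row) {
        String endRow = splits.ceiling(row); // smallest split >= row
        return endRow == null ? splits.size() : splits.headSet(endRow).size();
    }
}
```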
The accumulo manual states that combiners can be applied to values which
share the same rowID, column family, and column qualifier. Is there any way
to adjust this behaviour? I have rows that look like,
00021ccaac30 meta:size []1807
00021ccaac30 meta:source []data2
00021cdaac30
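To make the manual's rule concrete: a standard combiner merges only values whose row, column family, and column qualifier all match, so two different qualifiers in the same row are never combined with each other. The following Accumulo-free sketch models that grouping by summing numeric values per column; `CombinerScopeModel` is hypothetical and leaves column visibility out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a combiner's grouping rule; not Accumulo API.
class CombinerScopeModel {
    // Simplified key/value pair; visibility is left out of this model.
    record Entry(String row, String family, String qualifier, long timestamp, long value) {}

    // Values merge only when row, family and qualifier all match; entries that
    // differ only by timestamp (versions of one cell) are what gets collapsed.
    static Map<String, Long> sumPerColumn(List<Entry> entries) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (Entry e : entries) {
            String cell = e.row() + " " + e.family() + ":" + e.qualifier();
            sums.merge(cell, e.value(), Long::sum);
        }
        return sums;
    }
}
```

Grouping by `e.row()` alone would give the row-level behavior asked about here; in Accumulo that likely means a custom iterator rather than a stock `Combiner`.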
Did there seem to be any performance implications to the patch?
On Wed, Mar 19, 2014 at 5:39 PM, Christopher wrote:
> Yes, dsingley. There are a few reasons not to accept the patch as-is:
>
> 1. It completely changes the data security model without offering a
> way to disable it. For instance, we
Yes, dsingley. There are a few reasons not to accept the patch as-is:
1. It completely changes the data security model without offering a
way to disable it. For instance, we could have previously assumed that
an authorizations service, such as the one implemented in ACCUMULO-259,
could safely fail with
If you have full control of access to Accumulo, then it's just as easy to
remove that visibility from the incoming set.
On Wed, Mar 19, 2014 at 11:22 AM, Jeff Kunkle wrote:
> My particular use case meets both of those conditions. I'd like to use a
> not operator to soft delete things for specif
Assuming Jeff's use case is legitimate and other users can ignore the NOT
feature, is there any other reason not to accept Joe's patch?
--
View this message in context:
http://apache-accumulo.1065345.n5.nabble.com/NOT-operator-in-visibility-string-tp7949p8288.html
Sent from the Users mailing l
I don't see how NOT helps this use case. From what I've heard so far, we're
still talking about a positive assertion (someone in the sandbox "group1"
flagged the data as to be hidden) and then restricting who has access to
data with that positive assertion (by default excluding everyone using the
s
On Wed, Mar 19, 2014 at 1:32 PM, Benjamin Parrish <
benjamin.d.parr...@gmail.com> wrote:
> Finally got it. Came down to a user and ownership issue with my Hadoop,
> ZooKeeper, Accumulo. Does anyone have a Knowledgebase for this info that
> lays out a standard of what users should be created, whe
It sounds like you could meet some of your requirements to hide data by
simply cloning a table to create a sandbox, in which one can issue
actual deletes to remove it from that sandbox's view. Accumulo's clone
feature will not duplicate data unnecessarily, so you could have many
clones, each with differ
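The sharing described here can be pictured with a toy model: a clone copies references to the same immutable data files rather than the data, and deletes issued in a clone are recorded only in that clone. `CloneModel` and its methods are hypothetical; real Accumulo tracks files and delete markers in far more detail, so this only illustrates why many clones need not multiply storage.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of table cloning; not how Accumulo is implemented internally.
class CloneModel {
    static final class Table {
        private final List<Map<String, String>> files; // immutable files, shared by reference
        private final Set<String> deleted;             // this table's own delete markers

        Table(List<Map<String, String>> files, Set<String> deleted) {
            this.files = List.copyOf(files);  // copies references, not file contents
            this.deleted = new HashSet<>(deleted);
        }

        // A clone points at the same files and starts with a copy of the delete markers.
        Table cloneTable() {
            return new Table(files, deleted);
        }

        void delete(String key) {
            deleted.add(key); // affects only this table
        }

        String get(String key) {
            if (deleted.contains(key)) {
                return null;
            }
            for (Map<String, String> file : files) {
                String value = file.get(key);
                if (value != null) {
                    return value;
                }
            }
            return null;
        }

        // True if this table references exactly this file object (no copy made).
        boolean sharesFile(Map<String, String> file) {
            return files.stream().anyMatch(f -> f == file);
        }
    }
}
```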
On Wed, Mar 19, 2014 at 12:35 PM, Russ Weeks wrote:
> Hi, Keith,
>
> Thanks for your response. I opened ACCUMULO-2501 and included a patch
> that works for me.
>
> The problem is that RowFilter calls deepCopy when it sets up its internal
> decisionIterator, so if the source iterator is also a Row
That's definitely difficult to encapsulate. You can get a mix of
Apache conventions, your OS of choice conventions, and vendor conventions.
The thing that I usually see is applications installed under /usr/lib
(e.g. /usr/lib/accumulo), the relevant executables linked to /usr/bin to
get them
Hi, Keith,
Thanks for your response. I opened ACCUMULO-2501 and included a patch that
works for me.
The problem is that RowFilter calls deepCopy when it sets up its internal
decisionIterator, so if the source iterator is also a RowFilter I'm hooped.
-Russ
On Wed, Mar 19, 2014 at 9:24 AM, Keit
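The failure described here can be reproduced in a small model: a filter that deep-copies its source while setting itself up works over a plain source, but throws as soon as its source is another iterator that does not support copying. `Iter`, `PlainSource`, `CopyingFilter`, and `NonCopyableFilter` are hypothetical stand-ins, not Accumulo classes; the real `SortedKeyValueIterator` interface has more methods, and its `deepCopy` takes an `IteratorEnvironment`.

```java
// Hypothetical stand-ins for Accumulo iterators; only the deepCopy contract is modeled.
class DeepCopyModel {
    interface Iter {
        Iter deepCopy();
        String describe();
    }

    static class PlainSource implements Iter {
        public Iter deepCopy() { return new PlainSource(); }
        public String describe() { return "source"; }
    }

    // Like RowFilter, this filter deep-copies its source to build an internal helper iterator.
    static class CopyingFilter implements Iter {
        private final Iter source;
        private final Iter decisionCopy; // independent copy used for look-ahead decisions

        CopyingFilter(Iter source) {
            this.source = source;
            this.decisionCopy = source.deepCopy(); // throws if the source cannot be copied
        }

        public Iter deepCopy() { return new CopyingFilter(source.deepCopy()); }
        public String describe() { return "filter(" + source.describe() + ")"; }
    }

    // Models a filter without a deepCopy implementation: copying it is unsupported.
    static class NonCopyableFilter implements Iter {
        private final Iter source;

        NonCopyableFilter(Iter source) { this.source = source; }

        public Iter deepCopy() { throw new UnsupportedOperationException("deepCopy not implemented"); }
        public String describe() { return "noncopyable(" + source.describe() + ")"; }
    }
}
```

In this model, giving `NonCopyableFilter` a `deepCopy` that returns a new instance over `source.deepCopy()` makes chaining work, which is presumably the shape of the ACCUMULO-2501 patch.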
Finally got it. Came down to a user and ownership issue with my Hadoop,
ZooKeeper, Accumulo. Does anyone have a knowledge base for this info that
lays out a standard of what users should be created, where folders should
be created, permissions, ownership, etc. I feel like that would be
invaluable
On Wed, Mar 19, 2014 at 10:22 AM, Jeff Kunkle wrote:
> My particular use case meets both of those conditions. I’d like to use a
> not operator to soft delete things for specific groups of users, which are
> assigned a given authorization. For example, assume I have two groups of
> users: group1 a
The sandboxes are really just sharing pointers to data. Users might only see a
subset of that data depending on their authorizations.
On Mar 19, 2014, at 2:09 PM, David Medinets wrote:
> Is data shared between sandboxes? Could namespaces proxy for sandboxes?
>
>
> On Wed, Mar 19, 2014 at 1:46
Is data shared between sandboxes? Could namespaces proxy for sandboxes?
On Wed, Mar 19, 2014 at 1:46 PM, Mike Drob wrote:
> Thanks, that's really helpful. Couple more questions.
>
> Is a sandbox the same thing as a workspace? Can the terms be used
> interchangeably? Just want to make sure I'm n
> Is a sandbox the same thing as a workspace? Can the terms be used
> interchangeably? Just want to make sure I'm not misinterpreting your answers.
Yes. Sorry I wasn’t consistent with the terminology.
> Is it fair to describe each sandbox as a separate index table for the global
> data set? An
It kind of sounds like you could manage this much more easily by controlling
the authorizations a user gets (notably the workspace name) and the
grant/revoke above the Accumulo level.
A sandbox has a unique label and the external system controls which
users are granted that label. This way, each sa
My particular use case meets both of those conditions. I’d like to use a not
operator to soft delete things for specific groups of users, which are assigned
a given authorization. For example, assume I have two groups of users: group1
and group2. If I want to temporarily hide something from grou
Thanks, that's really helpful. Couple more questions.
Is a sandbox the same thing as a workspace? Can the terms be used
interchangeably? Just want to make sure I'm not misinterpreting your
answers.
Is it fair to describe each sandbox as a separate index table for the
global data set? And then whe
> You have a large amount of data, that is generally readable by all users.
Not necessarily. All data has some visibility constraint that a user's
authorizations may or may not satisfy.
> Users create their own sandbox, from which they can later exclude portions of
> the global data set.
Yes,
Wait, I'm really confused by what you are describing, Jeff. Sorry if these
are obvious questions, but can you help me get a better grasp of your use
case?
You have a large amount of data, that is generally readable by all users.
Users create their own sandbox, from which they can later exclude por
Hi John,
Yes it’s accurate that the system controls the label and who is associated with
it; there are no Accumulo-internal user accounts. But I don’t think it’s
feasible to remove a sandbox label from something that should be hidden. Such a
scenario would imply that all data is “tagged” with t
On Tue, Mar 18, 2014 at 6:38 PM, Russ Weeks wrote:
> Hi,
>
> org.apache.accumulo.core.iterators.user.RowFilter doesn't have a deepCopy
> method, which seems to mean that I can't chain multiple RowFilters together.
>
> Looking at some examples (GrepIterator, SortedKeyIterator) it seems pretty
> ea
I attempted to simplify the scenario to facilitate discussion, which on second
thought may have been a mistake. Here’s the whole scenario:
Different users have access to different subsets of the data depending on their
authorizations and the visibility of the data. Users “work with” the data in
On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle wrote:
> New groups are created on the fly by our application when needed. Under
> the scenario you describe we’d have to go through all the data in Accumulo
> whenever a group is created so that users in the group can see the existing
> data.
>
>
>
New groups are created on the fly by our application when needed. Under the
scenario you describe we’d have to go through all the data in Accumulo whenever
a group is created so that users in the group can see the existing data.
On Mar 19, 2014, at 11:34 AM, Sean Busbey wrote:
>
> On Wed, Ma
I think you're looking at the design of visibility labels backwards.
Visibility labels and corresponding authorizations are not user
groups to which you assign data; they represent attributes of the
data itself, which determine which groups can access it. If you have a
new group, in Accumulo t
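The view of labels as attributes of the data can be illustrated with a simplified check. Here a label is written as an OR of AND-groups, for example `admin|(analyst&sandbox1)`, and an entry is visible when the user's authorizations satisfy at least one group. `VisibilityModel` is hypothetical and skips expression parsing; Accumulo's own `ColumnVisibility` and `VisibilityEvaluator` classes handle the real syntax. Under this model, granting a new group an authorization that existing labels already mention makes data visible without rewriting it.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified visibility check; not Accumulo's ColumnVisibility parser.
class VisibilityModel {
    // A label is a list of AND-groups joined by OR, e.g. admin|(analyst&sandbox1)
    // is List.of(Set.of("admin"), Set.of("analyst", "sandbox1")).
    static boolean canSee(List<Set<String>> label, Set<String> authorizations) {
        for (Set<String> group : label) {
            if (authorizations.containsAll(group)) {
                return true; // one satisfied AND-group is enough
            }
        }
        return false; // no group satisfied: the entry stays hidden
    }
}
```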
What you are describing is "dotfile behavior": that is, ignoring files
that begin with '.' from a directory listing, by default, but not
actually protecting them from being visible if a user really wants
them to be. It seems odd to me to try to satisfy this use case
by alteri
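The distinction drawn here, hiding by default versus actually protecting, can be seen in a few lines that mimic a directory listing: names starting with '.' are skipped unless the caller asks for them, so nothing is enforced. `DotfileModel` is a hypothetical illustration, not a proposed Accumulo feature.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of "dotfile behavior"; not an Accumulo feature.
class DotfileModel {
    // Hidden entries are skipped by default but returned on request: a presentation
    // default, not an access control, since any caller can pass showHidden = true.
    static List<String> list(List<String> names, boolean showHidden) {
        return names.stream()
                .filter(n -> showHidden || !n.startsWith("."))
                .collect(Collectors.toList());
    }
}
```

By contrast, a visibility label is checked on the server against the user's authorizations, so a client cannot simply opt in to seeing data it is not authorized for.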
http://www.cs.ucr.edu/~eamonn/SAX.htm - it seems like this approach to Time
Series Analysis would be a natural fit for Accumulo's iterators.
On Wed, Mar 19, 2014 at 9:36 AM, kunklejr wrote:
> So is there any consensus on whether this should be included? I would use
> it
> right away on a current project if it were. I understand the security risks
> that have been discussed with having a NOT operator, but I see its use as a
> decision
So is there any consensus on whether this should be included? I would use it
right away on a current project if it were. I understand the security risks
that have been discussed with having a NOT operator, but I see its use as a
decision to be made by the development team. If the project deems use
Also, on the off chance that some other part of your system is exporting an
incorrect HADOOP_CONF_DIR, you should still run this confirmation step from
earlier:
> You can verify this by doing
>
> ssh ${HOST} "bash -c 'echo ${HADOOP_CONF_DIR:-no hadoop conf}'"
>
> as the accumulo user on the mast
Josh is correct about the behavior of the ZooKeeper CLI. As an aside, how
big is this cluster? Five ZooKeeper servers shouldn't be needed until you
get past ~100 nodes, unless you're just going for more fault tolerance.
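The trade-off behind "more fault tolerance" follows from ZooKeeper's majority rule: the ensemble stays available while a strict majority of its servers are up, so n servers tolerate floor((n - 1) / 2) failures. Three servers survive one failure, five survive two, and an even count adds no tolerance over the odd count below it. The helper name is hypothetical.

```java
// Hypothetical helper illustrating ZooKeeper's majority-quorum arithmetic.
class QuorumModel {
    // An ensemble of n servers needs floor(n / 2) + 1 up to keep a majority,
    // so it tolerates n - (floor(n / 2) + 1) = floor((n - 1) / 2) failures.
    static int toleratedFailures(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("an ensemble needs at least one server");
        }
        return (ensembleSize - 1) / 2;
    }
}
```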
Could you update your gist with the changes to accumulo-env.sh? It's much
easi
Benjamin,
It may be better to step back for a second and make sure you have the Hadoop
environment set up correctly. You are very close but it seems like there is
just an issue with the Accumulo classpath or your environment variables.
In regard to ensuring ZooKeeper is working, you can use th
So, I am back to no clue now...
On Wed, Mar 19, 2014 at 9:13 AM, Josh Elser wrote:
> I think by default zkCli.sh will just try to connect to localhost. You can
> change this by providing the quorum string to the script with the -server
> option.
> On Mar 19, 2014 8:29 AM, "Benjamin Parrish"
>
I think by default zkCli.sh will just try to connect to localhost. You can
change this by providing the quorum string to the script with the -server
option.
On Mar 19, 2014 8:29 AM, "Benjamin Parrish"
wrote:
> I adjusted accumulo-env.sh to have hard coded values as seen below.
>
> Are there any l
I adjusted accumulo-env.sh to have hard-coded values as seen below.
Are there any logs that could shed some light on this issue?
If it helps, I am using CentOS 6.5, Hadoop 2.2.0, ZooKeeper 3.4.6.
I also ran across this, which didn't look right...
Welcome to ZooKeeper!
2014-03-19 08:25:53,479