Re: D4M related projects

2018-09-29 Thread Dylan Hutchison
Hi folks, I opened a PR adding an entry to the related-projects page at https://github.com/apache/accumulo-website/pull/113 Cheers, Dylan On Sat, Sep 29, 2018 at 10:47 AM Kepner, Jeremy - LLSC - MITLL < kep...@ll.mit.edu> wrote: > Accumulo Developers, > We are hoping a link to our D4M software en

Re: [VOTE] Switch to GitHub issues

2018-03-13 Thread Dylan Hutchison
+1 On Tue, Mar 13, 2018 at 9:53 AM, Keith Turner wrote: > Accumulo PMC, > > Please vote on initiating the transition from JIRA to GitHub. The > purpose of this vote is to see if there is agreement on the following > three items the community discussed. > > * Using this workflow initially : >

Re: [DISCUSS] Moving away from Thrift

2017-11-16 Thread Dylan Hutchison
I don't know if it is mature yet, but Apache Arrow has an RPC format and bindings for several languages. See slide 37+ here: https://www.slideshare.net/julienledem/the-columnar-roadmap-apache-parquet-and-apache-arrow Arrow's RPC would be particularly advantageous if the Arrow format were used t

Re: New committer/PMC: Ivan Bella

2017-07-13 Thread Dylan Hutchison
Yes, Ivan, welcome to the family you already have :) On Thu, Jul 13, 2017 at 2:25 PM, Marc wrote: > Congrats! > > On Thu, Jul 13, 2017 at 5:23 PM, wrote: > > > Congrats Ivan! > > > > -Original Message- > > From: Christopher [mailto:ctubb...@apache.org] > > Sent: Thursday, July 13, 2017

Re: Sorted RowId suffix retrieval using Server Side Iterators

2017-07-10 Thread Dylan Hutchison
You might be able to take a batched approach, using server-side iterators to gather as many S's from POS rows as possible at each tablet server up to a memory budget, and then querying the SPO table from inside those iterators. (With some caution to be mindful of tablet server thread limits, you c

Re: Sorted RowId suffix retrieval using Server Side Iterators

2017-07-06 Thread Dylan Hutchison
Let's see if I understand your question. The queries are range queries on P over the POS table. Within each range, you would like to sort the S values (a suffix of the Key) retrieved. Are your range queries *small enough to fit in memory*? If so, you could gather all the entries in the range to
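The in-memory option raised above can be sketched in plain Python, assuming the entries of one range fit in memory. The row layout (P, O and S joined by "|") and the function name are hypothetical and chosen only for illustration; this models the idea and is not Accumulo API code.

```python
def sorted_subjects(pos_rows, p_start, p_end):
    """Return the S values of POS rows whose P lies in [p_start, p_end], sorted."""
    subjects = []
    for row in pos_rows:
        # Hypothetical row layout: "predicate|object|subject".
        p, _o, s = row.split("|")
        if p_start <= p <= p_end:
            subjects.append(s)
    # Everything for the range is held in memory, so a plain sort suffices.
    subjects.sort()
    return subjects


rows = ["knows|bob|zed", "knows|carol|amy", "likes|tea|bob", "owns|car|amy"]
print(sorted_subjects(rows, "knows", "likes"))  # ['amy', 'bob', 'zed']
```

If a range does not fit in memory, this approach no longer applies as written and some batching or external sort would be needed.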

Re: How to remove all tables of accumulo or format hadoop files for accumulo

2017-04-28 Thread Dylan Hutchison
For deleting test tables, you may find the flag "-p" for the deletetable shell command useful. For example, deletetable -f -p Test.* will delete all tables that begin with the prefix "Test". You could even call this from a script if you want to automate it into your build. Perhaps this would s
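Before running a pattern delete from a script, it may help to preview which tables a pattern would select. The sketch below assumes the -p pattern is a regular expression matched against the whole table name; it uses Python's re module, which can differ from the shell's regex engine in corner cases. The helper names are hypothetical.

```python
import re


def tables_matching(table_names, pattern):
    """Preview which table names a whole-name regex match would select."""
    return [t for t in table_names if re.fullmatch(pattern, t)]


def deletetable_command(pattern):
    """Build the shell command text quoted in the message above."""
    return "deletetable -f -p " + pattern


names = ["Test1", "TestA", "prod", "MyTest"]
print(tables_matching(names, "Test.*"))
print(deletetable_command("Test.*"))
```

A build script could pass the resulting command text to the Accumulo shell, for example through the shell's option for executing a single command, after checking the preview.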

Re: [VOTE] 1.8.1-rc1

2017-02-24 Thread Dylan Hutchison
+1 * Verified hashes and signatures on src * Installed from src on local machine * Graphulo tests pass On Thu, Feb 23, 2017 at 8:16 AM, Mike Walch wrote: > +1 > > * Verified hashes and signatures > * Ran local cluster using binary release > > On Tue, Feb 21, 2017 at 2:05 PM Michael Wall wrote

Re: Running Accumulo on a standard file system, without Hadoop

2017-01-16 Thread Dylan Hutchison
hat is >> HDFS client compatible (something like Quantcast QFS[1], setting up client >> [2]). If Accumulo is using something in Hadoop outside of the public client >> API, this may not work. >> >> [1] https://github.com/quantcast/qfs >> [2] https://g

Running Accumulo on a standard file system, without Hadoop

2017-01-16 Thread Dylan Hutchison
Hi folks, A friend of mine asked about running Accumulo on a normal file system in place of Hadoop, similar to the way MiniAccumulo runs. How possible is this, or how much work would it take to do so? I think my friend is just interested in running on a single node, but I am curious about both t

Re: [DISCUSS] Would a visibility histogram on a table be harmful?

2016-10-11 Thread Dylan Hutchison
Interesting idea. It begs the question: should we allow any custom index at the RFile level? If RFile indexes were user-extensible, then a visibility index would be something any developer could write. That said, we can still include such an index as an example, and if we did it could be used by

Re: Orphaned MAC processes

2016-08-31 Thread Dylan Hutchison
I have noticed it many times in the past. I think they popped up when I launched a test that crashed the tserver with an exception. When I was doing development this happened often enough that I made a script to do grep and kill them. Under normal circumstances they die as normal. On Wed, Aug 3

Re: New Committers/PMC members!

2016-08-31 Thread Dylan Hutchison
Welcome Marc! Warm reWelcome to Michael! On Wed, Aug 31, 2016 at 7:58 AM, Josh Elser wrote: > Hiya folks, > > I wanted to take a moment to publicly announce some recent additions to > the Apache Accumulo family (committers and PMC). > > We had Mike Wall join the ranks back in April (sorry for t

Re: [VOTE] Accumulo 1.8.0-rc3

2016-08-30 Thread Dylan Hutchison
(0) Sadly I am unable to help test since my linux machine is out for repair. The following checked out on rc2: * Unit tests pass. * Good checksums and sigs. Fingerprint matches Mike's key. * Graphulo tests pass. On Mon, Aug 29, 2016 at 11:45 AM, Michael Wall wrote: > Accumulo Developers, > > P

Re: [VOTE] Plan for next release

2016-08-22 Thread Dylan Hutchison
-1, primarily with the spirit of "Release Early, Release Often". Let's get our changes out there, rather than wait another month to add in Java 8. My vote is not religious, however, and it could change if someone argues for some incre

Re: Custom Java SecurityManager permissions

2016-08-15 Thread Dylan Hutchison
Maybe related to ACCUMULO-1188? On Mon, Aug 15, 2016 at 10:09 AM, Josh Elser wrote: > +1 from me. > > IIRC, they used to be something to try to guard against user JARs > (containing iterators) doing something malicious, but obviously they > h

Re: Dealing with FastBulkImportIT

2016-08-13 Thread Dylan Hutchison
sing changes. Developers are free to run them more often. On Sat, Aug 13, 2016 at 3:33 PM, Dylan Hutchison wrote: > ACCUMULO-3327 <https://issues.apache.org/jira/browse/ACCUMULO-3327> is a > perfect example of a performance bug. The tablet servers used to reload > the bulk imported fl

Re: Dealing with FastBulkImportIT

2016-08-13 Thread Dylan Hutchison
ate such tests and make > them runnable via Maven, removing them from normal execution. I think that > would be an acceptable way forward. > > However, that would leave us with no end-to-end test for ACCUMULO-3327 > which isn't great.. > > > Dylan Hutchison wrote: >

Re: Dealing with FastBulkImportIT

2016-08-13 Thread Dylan Hutchison
Hi Josh, Forgive me for the design question, but shouldn't we distinguish tests of correctness from tests of performance? The following is my understanding of test categories, which does not totally align with Accumulo's test suite: * Unit tests test individual components. * Integration tests tes

Re: [DISCUSS] Remove directory assembly

2016-07-22 Thread Dylan Hutchison
I'm happy with extracting the tarball. Thanks for cleaning the builds Chris. On Jul 22, 2016 5:34 PM, "Christopher" wrote: > Hi all, > > Awhile back, when I was working on moving all our build artifacts to the > target directory during a maven build, I created a directory assembly, > which looke

Re: Writing to a table via an iterator

2016-07-18 Thread Dylan Hutchison
(Or, you can follow the Graphulo library's approach. See the RemoteWriteIterator class.) On Mon, Jul 18, 2016 at 12:39 PM, Josh Elser wrote: > While it may be possible, please do not do this. This is not a designed > feature of the Iterator framework and

Re: LocalityGroupDeleter

2016-07-15 Thread Dylan Hutchison
Ah, sorry Russ. Now that I returned to my laptop and looked at your code, I see you did pass the locality group column families via IteratorOptions. Nice! On Fri, Jul 15, 2016 at 2:21 PM, Dylan Hutchison wrote: > I see. Pass table name via IteratorOptions? Or just pass locality group >

Re: LocalityGroupDeleter

2016-07-15 Thread Dylan Hutchison
way for an iterator to know what > table it's operating on (which seems weird but...) > > On Fri, Jul 15, 2016 at 2:01 PM Dylan Hutchison < > dhutc...@cs.washington.edu> > wrote: > > > For accessing locality group column family information inside an Accumulo

Re: LocalityGroupDeleter

2016-07-15 Thread Dylan Hutchison
For accessing locality group column family information inside an Accumulo iterator, what about querying the Accumulo table property table.group.* from IteratorEnvironment.getConfig() in the iterator's init()? I have
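As a rough illustration of the idea above, the sketch below extracts locality groups from a table's properties once they are available as a plain key/value mapping. It assumes each table.group.<name> property holds a comma-separated list of column families; the function name and sample properties are hypothetical, and a real iterator would read the values from its configuration rather than from a dict.

```python
def locality_groups(table_props):
    """Map locality group name -> set of column families.

    Assumes each 'table.group.<name>' value is a comma-separated list.
    """
    prefix = "table.group."
    groups = {}
    for key, value in table_props.items():
        if key.startswith(prefix):
            name = key[len(prefix):]
            groups[name] = {cf for cf in value.split(",") if cf}
    return groups


props = {
    "table.group.meta": "name,age",
    "table.groups.enabled": "meta",   # not matched: different property prefix
    "table.split.threshold": "1G",
}
print(locality_groups(props))
```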

Re: [DISCUSS] Jenkins notifications to dev list?

2016-07-14 Thread Dylan Hutchison
On the other hand, sending failed build notifications to the dev list motivates us to not break the tests and make the tests stable. I'll leave it to your decision Chris, unless others have an opinion. On Wed, Jul 13, 2016 at 8:54 PM, Christopher wrote: > On Wed, Jul 13, 2016 at 11:02

Re: [DISCUSS] Jenkins notifications to dev list?

2016-07-13 Thread Dylan Hutchison
On Wed, Jul 13, 2016 at 4:36 PM, Christopher wrote: > Okay, so after some investigation ( > https://issues.apache.org/jira/browse/INFRA-12252), it appears that > notifications@ is simply configured to block email from non apache > addresses. > > So, I have three possible solutions, if the Accumul

Re: [DISCUSS] Proposed binary packaging changes

2016-06-30 Thread Dylan Hutchison
Thanks for highlighting Chris. Some thoughts: There are steps defined in the README and INSTALL for setting up Accumulo from the binary distribution. It sounds reasonable to add additional steps for downloading dependencies into the lib/ folder to these documents, as long as the instructions are c

Re: Using Iterator-Test-Harness

2016-06-23 Thread Dylan Hutchison
o as a short-term > too. You could change the GAV for the iterator-test-harness module and push > it to Sonatype. Then, maven would be able to find it like normal via maven > central. > > A little bit more work though :) > > > Dylan Hutchison wrote: > >> Thanks Josh an

Re: Using Iterator-Test-Harness

2016-06-21 Thread Dylan Hutchison
h the harness but still letting you have the > ability to tweak it to your needs. This is a fine balance, but one that is > definitely achievable. > > Dylan Hutchison wrote: > >> Hi all, >> >> I like the structure of Accumulo's iterator-test-harness >> <htt

Using Iterator-Test-Harness

2016-06-21 Thread Dylan Hutchison
Hi all, I like the structure of Accumulo's iterator-test-harness module and would like to use it in my own code. I was wondering if there is a better alternative than (1) copying the code, (2) modifying it to suit my own needs

Re: Code Quality Improvements for accumulo

2016-06-21 Thread Dylan Hutchison
What a funny series of PRs to see this morning. As you probably figured out Josh, the DevFactory folks run scripts that download the code of open-source repos, run them with lots of lint, and create PRs with generated fixes. No human created these PRs; no human will respond to your comments. It is

Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-19 Thread Dylan Hutchison
Out of curiosity Josh, what tool do you use to verify no public API changes, if not japi? I should also read the pom on !thrift. On Jun 19, 2016 9:26 PM, "Josh Elser" wrote: -1 (binding) HTrace's NOTICE is missing in -bin's NOTICE file """ htrace-core Copyright 2015 The Apache Software Foundati

Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-19 Thread Dylan Hutchison
On Sun, Jun 19, 2016 at 1:07 PM, Josh Elser wrote: > > > Dylan Hutchison wrote: > >> +1 with notes below~ >> >> * NOTICE and LICENSE look good to my inexperienced eyes. >> * Source-compiled binary tar.gz matches the binary tar.gz artifact, except >>

Re: [VOTE] Accumulo 1.7.2-rc2

2016-06-18 Thread Dylan Hutchison
+1 with notes below~ * NOTICE and LICENSE look good to my inexperienced eyes. * Source-compiled binary tar.gz matches the binary tar.gz artifact, except for META-INF entries. * Unit tests pass. * Good checksums and sigs. Fingerprint matches Mike's key. * Graphulo tests pass. * Sunny integration te

Re: Apache Accumulo integrated with Presto

2016-06-13 Thread Dylan Hutchison
is typically best to bypass the > Presto layer and insert directly into the Accumulo tables using the > PrestoBatchWriter API > > Cheers, > --Adam > > On Mon, Jun 13, 2016 at 7:20 AM, Christopher wrote: > > > Thanks for that summary, Dylan! Very helpful. > > > &

Re: Apache Accumulo integrated with Presto

2016-06-12 Thread Dylan Hutchison
Thanks for sharing Sean. Here are some notes I wrote after reading the article on Presto-Accumulo design. I have a research interest in the relationship between relational (SQL) and non-relational (Accumulo) systems, so I couldn't resist reading the post in detail. - Places the primary key in

Re: Accumulo on s3

2016-04-25 Thread Dylan Hutchison
Hey Josh, Are there other platforms on AWS (or another cloud provider) that Accumulo/HDFS are friendly to run on? I thought I remembered you and others running the agitation tests on Amazon instances during release-testing time. If there are alternatives, what advantages would S3 have over the c

Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-23 Thread Dylan Hutchison
I'm moving comments to ACCUMULO-4229 <https://issues.apache.org/jira/browse/ACCUMULO-4229>. I created two test cases that establish a baseline and reproduce the bug. I will push a fix PR shortly. ~Dylan On Fri, Apr 22, 2016 at 1:16 PM, Dylan Hutchison wrote: > You caught me S

Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-22 Thread Dylan Hutchison
he client used > by the tserver. > > -- > Shawn Walker > > On Fri, Apr 22, 2016 at 12:19 AM, Dylan Hutchison < > dhutc...@cs.washington.edu> wrote: > > > Hi all, > > > > Check out ACCUMULO-4229 > > <https://issues.apache.org/jira/browse/ACCUMU

Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-21 Thread Dylan Hutchison
Elser wrote: > > > Nice findings. Sorry I haven't had any cycles to dig into this myself. > > > > I look forward to hearing what you find :) > > > > > > Dylan Hutchison wrote: > > > >> I investigated a bit more and I am pretty sure the pr

Re: Checking what a BatchWriter is stuck on; failure during split

2016-04-19 Thread Dylan Hutchison
436. I think there is a timing bug for this edge case, when a table split occurs during heavy writes. I will write this up if I can reproduce it. Maybe it is too rare to matter. Cheers, Dylan On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison wrote: > Hi devs, > > I'd like to ask

Checking what a BatchWriter is stuck on; failure during split

2016-04-18 Thread Dylan Hutchison
Hi devs, I'd like to ask your help in figuring out what is happening to a BatchWriter. The following gives my reasoning so far. In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status in its addMutation method. I saw that it is stuck by jstack'ing the Accumulo client. It's been

Re: Fwd: Data authorization/visibility limit in Accumulo

2016-04-10 Thread Dylan Hutchison
On Sun, Apr 10, 2016 at 8:32 PM, Josh Elser wrote: > Dylan Hutchison wrote: > >> > 2. What is the most effective way to ingest data, if we're receiving >>> data >>> >>>> >> with the size of>1 TB on a daily basis? >>>> >

Re: Fwd: Data authorization/visibility limit in Accumulo

2016-04-08 Thread Dylan Hutchison
On Fri, Apr 8, 2016 at 3:13 PM, Josh Elser wrote: > Hi Fikri, > > Welcome! You're the first Accumulo enthusiast I've heard from in Indonesia > :) > > Responses inline: > > Fikri Akbar wrote: > >> Hi Guys, >> >> We're a group of accumulo enthusiasts from Indonesia. We've been trying to >> implemen

Re: git-based site and jekyll

2016-03-11 Thread Dylan Hutchison
Sounds great Chris! On Fri, Mar 11, 2016 at 9:50 AM, Christopher wrote: > So, if everybody's happy doing this, I'll go ahead and perform the > following steps: > > 1. Push gh-pages branch to our repo > 2. Perform a jekyll build on the branch and put it in a branch called " > accumulo.apache.org"

Re: [DRAFT][ANNOUNCE] Apache Accumulo 1.7.1

2016-02-26 Thread Dylan Hutchison
Looks good Chris! On Fri, Feb 26, 2016 at 11:53 AM, Christopher wrote: > The Accumulo team is proud to announce the release of Accumulo version > 1.7.1! > > This release contains over 150 bugfixes and improvements over 1.7.0, and is > backwards-compatible with 1.7.0. Existing users of 1.7.0 are

Re: [VOTE] Accumulo 1.7.1-rc2

2016-02-25 Thread Dylan Hutchison
+ 1 * verified checksums and sigs * ran accumulo from the 1.7.1 bin download on my laptop * built from 1.7.1 src download - unit tests and package * ran -Psunny integration tests - a trace test on ShellServerIT was funky, but I think it is due to unfortunate timing, not an actual error On Wed,

Re: [VOTE] Accumulo 1.6.5-rc2

2016-02-15 Thread Dylan Hutchison
+1 Verified checksums and signatures Unit tests run smoothly Graphulo tests run smoothly Integration tests run smoothly on mini-accumulo but are erratic on my single-node laptop for standalone. I think it's an artifact of me not allocating enough memory, not a substantive problem. The tablet se

Re: Automating deploy+testing

2016-01-28 Thread Dylan Hutchison
> to just consider Hadoop/ZooKeeper as pre-requisites. Other products (e.g. > Ambari) likely will do this just fine for what I care. > > I'll take a look at the AccStack project too. Thanks for linking it. > > Dylan Hutchison wrote: > >> It might be useful to look

Re: Automating deploy+testing

2016-01-25 Thread Dylan Hutchison
It might be useful to look at the Accumulo quickinstall maven project. https://github.com/accumulobook/quickinstall Michael Wall wrote the project and I helped test and patch it. It runs from a release but could be modified to build from source instead. There's also a more messy script I used

Re: [DISCUSS] Trivial changes and git

2016-01-06 Thread Dylan Hutchison
I find creating JIRA issues to be a psychological barrier to making commits. Of course tickets are helpful for tracking changes to Accumulo, especially large ones. I would go for #1 with committer's discretion so that a committer could choose when a change is minor enough that it does not need a

Re: another question on summing combiner

2015-10-23 Thread Dylan Hutchison
Hi Z, The batch method you and Josh worked out at first will work. I was illustrating another method which uses conditional writes/deletes as an alternative. It's hard to say which performs better without knowing your workload specifics. Applying the conditional write method to your scenario, t
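The semantics of a conditional write (apply the change only if a check on the current row passes) can be modeled with a toy in-memory table. This is a single-threaded sketch with hypothetical class and method names; it does not use Accumulo's ConditionalWriter and says nothing about its performance.

```python
class ToyTable:
    """In-memory stand-in for a table, used to show check-then-write semantics."""

    def __init__(self):
        self.rows = {}

    def put_if_absent(self, row, value):
        """Write only if the row does not exist yet; report whether it was written."""
        if row in self.rows:
            return False
        self.rows[row] = value
        return True

    def update_if_equals(self, row, expected, new_value):
        """Write only if the row exists and currently holds the expected value."""
        if row not in self.rows or self.rows[row] != expected:
            return False
        self.rows[row] = new_value
        return True


table = ToyTable()
print(table.put_if_absent("r1", 1))        # True: row was missing
print(table.put_if_absent("r1", 5))        # False: row already exists
print(table.update_if_equals("r1", 1, 2))  # True: value matched
```

A caller whose write is rejected would typically re-read the row and retry, which is where the workload-dependent cost mentioned above comes from.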

Re: another question on summing combiner

2015-10-22 Thread Dylan Hutchison
Hi Z, It seems you have a fairly common use case: performing an update if and only if a certain row does or does not exist. Here's another option you could try, and if it works (or doesn't work), please let us know! Of course, if you're comfortable with the batch solution, that is fine too. *Ad

Re: new committers!

2015-10-07 Thread Dylan Hutchison
urrent step I'm thinking about is the mathematical middle layer: how traditional relational algebra relates to Accumulo's operations and to linear algebra algorithms. Cheers, Dylan On Wed, Oct 7, 2015 at 11:39 AM, Billie Rinaldi wrote: > Join me in welcoming Dylan Hutchison and Russ Weeks a

Re: using combiner vs. building stats cache

2015-08-28 Thread Dylan Hutchison
On the BatchWriter, I think it starts flushing data in the background once enough mutations are added to it to consume half the BatchWriter's max memory usage set in BatchWriterConfig. One approach is to set an appropriate max memory, then never worry about manually flushing and let the BatchWrite
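The behavior described above (buffer mutations, flush once half of the configured max memory is in use) can be modeled with a toy writer. This is a synchronous sketch with hypothetical names; the real BatchWriter flushes in the background, and the half-of-max-memory threshold is taken from the message above rather than verified against the implementation.

```python
class ToyBatchWriter:
    """Buffers byte-string mutations and flushes at half of max_memory."""

    def __init__(self, max_memory, sink):
        self.max_memory = max_memory
        self.sink = sink            # list that receives each flushed batch
        self.buffer = []
        self.buffered_bytes = 0

    def add_mutation(self, mutation):
        self.buffer.append(mutation)
        self.buffered_bytes += len(mutation)
        # Flush once the buffer reaches half of the memory budget.
        if self.buffered_bytes >= self.max_memory / 2:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.append(list(self.buffer))
            self.buffer.clear()
            self.buffered_bytes = 0

    def close(self):
        # Remaining mutations are written when the writer is closed.
        self.flush()


batches = []
writer = ToyBatchWriter(max_memory=100, sink=batches)
for _ in range(7):
    writer.add_mutation(b"0123456789")  # 10 bytes each
writer.close()
print([len(b) for b in batches])  # [5, 2]
```

With a budget of 100 bytes, the fifth 10-byte mutation triggers the first flush, and close() writes the remaining two.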

Re: [Accumulo Contrib Proposal] Graphulo: Server-side Matrix Math library

2015-08-28 Thread Dylan Hutchison
ecting too much. We can always investigate the option of becoming a sub-project later should the community support arise. Regards, Dylan Hutchison On Fri, Aug 28, 2015 at 12:48 PM, wrote: > [adding dev@ back in, I forgot to reply-all last time] > > I think the process and exit criteri

Re: [Accumulo Contrib Proposal] Graphulo: Server-side Matrix Math library

2015-08-28 Thread Dylan Hutchison
his is a code contribution, it will have to go > through that process. > > > > ---- Original message > From: Dylan Hutchison > Date: 08/28/2015 2:43 AM (GMT-05:00) > To: Accumulo Dev List > Cc: Accumulo User List > Subject: [Accumulo Contrib Proposal]

Re: using combiner vs. building stats cache

2015-08-28 Thread Dylan Hutchison
Sounds like you have the idea now Z. There are three places an iterator can be applied: scan time, minor compaction time, and major compaction time. Minor compactions help your case a lot-- when enough entries are written to a tablet server that the tablet server needs to dump them to a new Hadoo
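A small model of why a summing combiner can run at any of these points: because addition is associative, combining some entries early (as a compaction would) and the rest later (as a scan would) gives the same totals as combining everything at once. The function and sample data are hypothetical; this is not the SummingCombiner class itself.

```python
def combine(entries):
    """Sum the values of entries that share a key; return sorted (key, total) pairs."""
    totals = {}
    for key, value in entries:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


in_memory = [("row1", 1), ("row2", 5), ("row1", 2)]
flushed_file = combine(in_memory)          # models combining at minor compaction
later_writes = [("row1", 10)]
scan_result = combine(flushed_file + later_writes)   # models combining at scan time

print(scan_result)  # [('row1', 13), ('row2', 5)]
```

Combining at compaction time shrinks what is stored, while combining at scan time keeps results correct for entries that have not been compacted yet.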

[Accumulo Contrib Proposal] Graphulo: Server-side Matrix Math library

2015-08-27 Thread Dylan Hutchison
tackle these at our leisure. Let's discuss Graphulo and Accumulo here first. Warmly, Dylan Hutchison

Re: using combiner vs. building stats cache

2015-08-26 Thread Dylan Hutchison
Go for option #2 and use the combiners. It's one of the core features of Accumulo and the overhead at insert-time is minimal. Developer time overhead is also minimal-- add a couple lines next to where you make your mutations and you're done. Regards, Dylan On Wed, Aug 26, 2015 at 6:11 PM, z1137

Re: One instance, multiple Accumulo versions?

2015-03-09 Thread Dylan Hutchison
. The two instances look stable, though I haven't tried running them both at once. Side note: if we notice a bug in the Accumulo trunk (not an Accumulo release), is the JIRA still the right place to go to document / patch it? Regards, Dylan Hutchison On Sun, Mar 8, 2015 at 10:02 PM, Mike Drob

One instance, multiple Accumulo versions?

2015-03-08 Thread Dylan Hutchison
that helps. Regards, Dylan Hutchison

Re: Design-for-comment: Accumulo Server-Side Computation: Stored Procedure Tables starring SpGEMM

2015-02-26 Thread Dylan Hutchison
https://blogs.apache.org/hbase/entry/coprocessor_introduction> and it seems our "stored procedure" idea bears many similarities to HBase Coprocessors of the "Endpoint" type (the "Observer" type has parallels with the Accumulo Fluo subproject). Maybe we can tak

Re: Design-for-comment: Accumulo Server-Side Computation: Stored Procedure Tables starring SpGEMM

2015-02-26 Thread Dylan Hutchison
n> and it seems our "stored procedure" idea bears many similarities to HBase Coprocessors of the "Endpoint" type (the "Observer" type has parallels with the Accumulo Fluo subproject). Maybe we can take some architecture lessons from HBase. Anyone have experience th

Re: Design-for-comment: Accumulo Server-Side Computation: Stored Procedure Tables starring SpGEMM

2015-02-26 Thread Dylan Hutchison
f one result of this project is writing a guideline / best-practices document on *where to place iterators*, or in other words, where to place computation. Regards, Dylan Hutchison On Thu, Feb 26, 2015 at 2:34 PM, Christopher wrote: > Hi Dylan, > > It's a clever way to leverage e

Re: Design-for-comment: Accumulo Server-Side Computation: Stored Procedure Tables starring SpGEMM

2015-02-26 Thread Dylan Hutchison
changes to Accumulo core may make the implementation easier or increase performance. Looking forward to feedback. The stored procedure generalization / analogy came up as we thought through designs. It seems like many users could benefit from the concept. Regards, Dylan Hutchison On Thu, Feb 26, 20

Design-for-comment: Accumulo Server-Side Computation: Stored Procedure Tables starring SpGEMM

2015-02-26 Thread Dylan Hutchison
/Accla/accumulo_stored_procedure_design Would love to hear your opinion, especially if the proposed design pattern matches one of *your use cases*. Regards, Dylan Hutchison

Re: Looking for open source graph traversal libraries that run on Accumulo

2014-11-22 Thread Dylan Hutchison
You can try this if you're a fan of the TinkerPop family: https://github.com/JHUAPL/AccumuloGraph/ On Wed, Nov 19, 2014 at 5:28 PM, fredch wrote: > Trying to use Accumulo for graph processing and querying and trying to > determine what traversal tools are already out there. Any help greatly > ap