Re: Why may "tablet read ahead" take long time? (was: Profile a (batch) scan)

2019-01-15 Thread Adam Fuchs
Hi Maxim, What you're seeing is an artifact of the threading model that Accumulo uses. When you launch a query, Accumulo tablet servers will coordinate RPCs via Thrift in one thread pool (which grows unbounded) and queue up scans (rfile lookups, decryption/decompression, iterators, etc.) in anothe

Re: Major Compactions

2017-12-12 Thread Adam Fuchs
Watch out for ACCUMULO-4578 if you're using --cancel on one of the affected versions (1.7.2 or 1.8.0 or earlier). Adam On Tue, Dec 12, 2017 at 7:57 AM, Mike Walch wrote: > There should be a mention of the --cancel option in the docs. I created a > PR to add it to the 2.0 docs: > > https://git

Re: Key Refactoring

2017-06-21 Thread Adam Fuchs
Sven, You might consider using a combination of AccumuloInputFormat and AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel, speeding up your transformation, the map/reduce framework should help with hiccups, and the bulk load at the end provides an atomic, eventually consist

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
your cache hit rate was? Adam On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser wrote: > 5 iterations, figured that would be apparent from the log messages :) > > The code is already posted in my original message. > > Adam Fuchs wrote: > >> Josh, >> >> Two questions

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other observations. 2. Can you post your code somewhere so I can run it? Thanks, Adam On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser wrote: > Sven, et al: > >

Re: Adding a second node to a single node installation

2016-05-23 Thread Adam Fuchs
Cyrille, I think you're going to have to do a few things to get the nodes to act as a cluster: 1. How would you like your Zookeeper cluster to be set up? If you're planning on using a one-node Zookeeper instance on the master node, then you may need to turn zookeeper off on your second node and s

Re: Accumulo folks at Hadoop Summit San Jose

2016-05-19 Thread Adam Fuchs
I'll be there. Adam On Thu, May 19, 2016 at 11:01 AM, Josh Elser wrote: > Out of curiosity, are there going to be any Accumulo-folks at Hadoop > Summit in San Jose, CA at the end of June? > > - Josh >

Re: Three day Fluo Common Crawl test

2016-01-12 Thread Adam Fuchs
Nice writeup! Thanks, Adam On Tue, Jan 12, 2016 at 11:59 AM, Keith Turner wrote: > We just completed a three day test of Fluo using Common Crawl data that > went pretty well. > > http://fluo.io/webindex-long-run/ > > >

Re: Trigger for Accumulo table

2015-12-08 Thread Adam Fuchs
I totally agree, Christopher. I have also run into a few situations where it would have been nice to have something like a mutation listener hook. Particularly in generating indexing and stats records. Adam On Tue, Dec 8, 2015 at 5:59 PM, Christopher wrote: > In the future, it might be useful

Re: Can't connect to Accumulo

2015-12-04 Thread Adam Fuchs
Mike, I suspect if you get rid of the "localhost" line and restart Accumulo then you will get services listening on the non-loopback IPs. Right now you have some of your processes accessible outside your VM and others only accessible from inside, and you probably have two tablet servers when you s

Re: Quick question re UnknownHostException

2015-11-13 Thread Adam Fuchs
Josef, If these are intermittent failures, you might consider turning on the watcher [1] to automatically restart your processes. This should keep your cluster from atrophying over time. You'll still have to take administrative action to fix the DNS problem, but your availability should be better.

Re: pre-sorting row keys vs not pre-sorting row keys

2015-10-29 Thread Adam Fuchs
I bet what you're seeing is more efficient batching in the latter case. BatchWriter goes through a binning phase whenever it fills up half of its buffer, binning everything in the buffer into tablets. If you give it sorted data it will probably be binning into a subset of the tablets instead of all
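A rough model of that binning effect. It rests on our own simplifying assumptions: each tablet is identified by its end row, a row belongs to the first tablet whose end row is >= the row, and the buffer is binned in fixed-size chunks. BinningSketch and its methods are hypothetical and are not the BatchWriter API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model of BatchWriter-style binning. Tablets are identified by
 * their end row; a row belongs to the first tablet whose end row is >= the row,
 * and rows past the last split fall into the open-ended "default" tablet.
 */
final class BinningSketch {
    private final TreeSet<String> splits;

    BinningSketch(List<String> splitPoints) {
        this.splits = new TreeSet<>(splitPoints);
    }

    /** End row of the tablet containing row, or "<default>" for the last tablet. */
    String tabletFor(String row) {
        String end = splits.ceiling(row);
        return end == null ? "<default>" : end;
    }

    /** Number of distinct tablets touched by each chunk of chunkSize rows. */
    List<Integer> tabletsPerChunk(List<String> rows, int chunkSize) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += chunkSize) {
            Set<String> touched = new HashSet<>();
            for (String row : rows.subList(i, Math.min(i + chunkSize, rows.size()))) {
                touched.add(tabletFor(row));
            }
            counts.add(touched.size());
        }
        return counts;
    }
}
```

With splits b, d, f and a chunk size of 2, the sorted rows a, a1, c, c1, e, e1, g, g1 touch one tablet per chunk, while the same rows interleaved as a, c, e, g, a1, c1, e1, g1 touch two tablets per chunk.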

Re: Is there a sensible way to do this? Sequential Batch Scanner

2015-10-28 Thread Adam Fuchs
Rob, I would use something like an IteratorChain [1] and feed it Scanner.iterator() objects. If you setReadaheadThreshold(0) on the scanner then calling Scanner.iterator() is a fairly lightweight operation, and you'll be able to plop a bunch of iterators into the IteratorChain so that they are dyn
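An IteratorChain simply drains its member iterators one after another. The plain-Java stand-in below (SequentialChain is a hypothetical name, not the Commons Collections class) shows the idea; in Rob's case the member iterators would be the Scanner.iterator() objects, and exhausted ones are discarded lazily:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Minimal stand-in for an IteratorChain: drains each iterator in order. */
final class SequentialChain<T> implements Iterator<T> {
    private final Deque<Iterator<? extends T>> pending;

    SequentialChain(List<? extends Iterator<? extends T>> iterators) {
        this.pending = new ArrayDeque<>(iterators);
    }

    @Override
    public boolean hasNext() {
        // Drop exhausted iterators until one has data or none remain.
        while (!pending.isEmpty() && !pending.peekFirst().hasNext()) {
            pending.pollFirst();
        }
        return !pending.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.peekFirst().next();
    }
}
```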

Re: Why the Range not find the data

2015-10-14 Thread Adam Fuchs
Try using the Range.exact(...) and Range.prefix(...) helper methods to generate specific ranges. Key.followingKey(...) might also be helpful. Cheers, Adam On Wed, Oct 14, 2015 at 9:59 AM, Lu Qin wrote: > In my accumulo cluster ,the table has this data: > 0 cf0:cq0 []v0 > 1 cf1:cq1 []v1
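Range.prefix(...) saves you from computing the end of a prefix range by hand. Purely to illustrate what such a range has to cover, here is a byte-level sketch with our own hypothetical helpers (followingPrefix, inPrefixRange), comparing bytes as unsigned: the exclusive end is the prefix with its last non-0xff byte incremented and everything after it dropped.

```java
import java.util.Arrays;

/** Hypothetical byte-level helpers illustrating what a row-prefix range covers. */
final class PrefixRangeSketch {
    /**
     * Smallest byte string greater than every string that starts with prefix,
     * or null when no such bound exists (prefix is empty or all 0xff bytes).
     */
    static byte[] followingPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xff bytes cannot be incremented, so drop them.
        while (i >= 0 && prefix[i] == (byte) 0xff) {
            i--;
        }
        if (i < 0) {
            return null; // the range is unbounded above
        }
        byte[] end = Arrays.copyOf(prefix, i + 1);
        end[i]++; // unsigned increment; cannot wrap because end[i] != 0xff
        return end;
    }

    /** True if row is in [prefix, followingPrefix(prefix)), comparing bytes as unsigned. */
    static boolean inPrefixRange(byte[] row, byte[] prefix) {
        byte[] end = followingPrefix(prefix);
        return Arrays.compareUnsigned(row, prefix) >= 0
                && (end == null || Arrays.compareUnsigned(row, end) < 0);
    }
}
```

For Lu Qin's data, a prefix of "1" yields the range ["1", "2"), which matches row 1 (and rows such as "10") but not row 2.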

Re: What is the optimal number of tablets for a large table?

2015-10-13 Thread Adam Fuchs
Here are a few other factors to consider: 1. Tablets may not be used uniformly. If there is a temporal element to the row key then writes and reads may be skewed to go to a portion of the tablets. If some tables are big but more archival in nature then they will skew the stats as well. It's usually

Re: Watching for Changes with Write Ahead Log?

2015-10-01 Thread Adam Fuchs
by another system occur. >> >> >> >> I would need to do a few things to accomplish my goal. >> >> >> >> 1) Be notified or see that a table had changed >> >> 2) Checked that against changes I know my system has made >> >> 3)

Re: Document Partitioned Indexing

2015-09-30 Thread Adam Fuchs
Hi Tom, Sqrrl uses a document-distributed indexing strategy extensively. On top of the reasons you mentioned, we also like the ability to explicitly structure our index entries in both information content and sort order. This gives us the ability to do interesting things like build custom indexes

Re: Watching for Changes with Write Ahead Log?

2015-09-29 Thread Adam Fuchs
Jon, You might think about putting a constraint on your table. I think the API for constraints is flexible enough for your purpose, but I'm not exactly sure how you would want to manage the results / side effects of your observations. Adam On Tue, Sep 29, 2015 at 5:41 PM, Parise, Jonathan wrot

Re: Presplitting tables for the YCSB workloads

2015-09-18 Thread Adam Fuchs
You could cat the splits to a temp file, then use the -sf option of createtable, piping the command to the accumulo shell's standard in: $ echo "createtable ycsb_tablename -sf /tmp/ycsb_splits.txt" | accumulo shell -u user -p password -z instancename zoohost:2181 Not sure if the row keys are iden

Re: RowID design and Hive push down

2015-09-14 Thread Adam Fuchs
Hi Roman, What's the used for in your previous key design? As I'm sure you've figured out, it's generally a bad idea to have a fully unique hash in your key, especially if you're trying to support extensive secondary indexing. What we've found is that it's not just the size of the key but also t

Re: Accumulo: "BigTable" vs. "Document Model"

2015-09-04 Thread Adam Fuchs
Sqrrl uses a hybrid approach. For records that are relatively static we use a compacted form, but for maintaining aggregates and for making updates to the compacted form documents we use a more explicit form. This is done mostly through iterators and a fairly complex type system. The big trade-off

rya incubator proposal

2015-09-03 Thread Adam Fuchs
Hey Accumulopers, I thought you might like to know that the Rya project just proposed to join the incubator. Rya is a mature project that supports RDF on top of Accumulo. Feel free to join the discussion or show support on the incubator general list. Cheers, Adam

Re: Scanning In Timestamp Order

2015-09-02 Thread Adam Fuchs
Jon, There is some magic, but unfortunately it's not yet implemented: ACCUMULO-652 Want to take over that project? Adam On Wed, Sep 2, 2015 at 5:14 PM, Parise, Jonathan wrote: > I was pretty sure this was the answer. > > Yes it makes sense to me. I was expecting this response. I was hoping fo

Re: Questions on intersecting iterator and partition ids

2015-07-13 Thread Adam Fuchs
Vaibhav, I have included some answers below. Cheers, Adam On Mon, Jul 13, 2015 at 11:19 AM, vaibhav thapliyal < vaibhav.thapliyal...@gmail.com> wrote: > Dear all, > > I have the following questions on intersecting iterator and partition ids > used in document sharded indexing: > > 1. Can we run

Re: micro compaction

2015-06-09 Thread Adam Fuchs
I think this might be the same concept as in-mapper combining, but applied to data being sent to a BatchWriter rather than an OutputCollector. See [1], section 3.1.1. A similar performance analysis and probably a lot of the same code should apply here. Cheers, Adam [1] http://lintool.github.io/Ma
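A minimal sketch of in-mapper-style combining placed in front of a writer, assuming counts that can be summed and string keys. PreSummingBuffer is hypothetical, and its Consumer sink stands in for building and writing mutations. Partial sums are merged in a HashMap and handed off whenever the number of distinct keys reaches a threshold:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical client-side combiner: sums deltas per key in memory and hands
 * the partial sums to a sink (standing in for building and writing mutations)
 * once the number of distinct keys reaches maxKeys.
 */
final class PreSummingBuffer {
    private final Map<String, Long> partialSums = new HashMap<>();
    private final int maxKeys;
    private final Consumer<Map<String, Long>> sink;

    PreSummingBuffer(int maxKeys, Consumer<Map<String, Long>> sink) {
        this.maxKeys = maxKeys;
        this.sink = sink;
    }

    void add(String key, long delta) {
        partialSums.merge(key, delta, Long::sum);
        if (partialSums.size() >= maxKeys) {
            flush();
        }
    }

    /** Emits a copy of the current partial sums and starts a fresh buffer. */
    void flush() {
        if (partialSums.isEmpty()) {
            return;
        }
        sink.accept(new HashMap<>(partialSums));
        partialSums.clear();
    }
}
```

Because each flush carries only partial sums, a server-side combiner (such as a summing combiner) is still needed to merge values for the same key across flushes.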

Re: Change column family

2015-05-26 Thread Adam Fuchs
This can also be done with a row-doesn't-fit-into-memory constraint. You won't need to hold the second column in-memory if your iterator tree deep copies, filters, transforms and merges. Exhibit A: [HeapIterator-derivative] |_ | \ [transform-gr

Re: Accumulo Summit 2015

2015-05-04 Thread Adam Fuchs
mmit 2015. > > -Met with some great folks (special shout out to Josh Elsner and > Adam Fuchs for their time and patience answering questions). > > -Can’t wait for next year’s summit. > > > > Any idea when the slides for the presentations will be available?

Re: Unexpected aliasing from RFile getTopValue()

2015-04-15 Thread Adam Fuchs
On Wed, Apr 15, 2015 at 10:20 AM, Keith Turner wrote: > > > Random thought on revamp. Immutable key values with enough primitives to > make most operations efficient (avoid constant alloc/copy) might be > something to consider for the iterator API > > So, is this a tradeoff in the performance vs.

Re: Local Combiners to pre-sum at BatchWriter

2015-04-06 Thread Adam Fuchs
Dylan, Here's an interesting history note: Accumulo used to run some types of iterators (essentially Combiners before Combiners existed) at the time of writing data to the in-memory map. This was removed because some combiners, like string appends, can cause O(n^2) performance when run in that sco
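A rough count of where the O(n^2) comes from, under a simplifying assumption of ours rather than a description from the thread: n single-byte appends to the same key arrive as separate writes, and each write is combined with the value already in the in-memory map, copying it. The k-th write then copies about k bytes, so the total work is

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2).$$

A summing combiner does not have this problem because its combined value stays a fixed size.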

Re: init method being called multiple times of WrappingIterator.

2015-04-03 Thread Adam Fuchs
A major compaction also might not be a full major compaction, depending on how it is initiated. It also would be on a single tablet where a scan might be over multiple tablets. The implication here is that major compactions might not process all of the data that the scan processes. The iterator li

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
wait for the entire compaction to complete. > > Regards, > Dylan Hutchison > > On Mon, Feb 23, 2015 at 12:48 PM, Adam Fuchs wrote: > >> Dylan, >> >> The effect of a major compaction is never seen in queries before the >> major compaction completes. At the en

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
Dylan, The effect of a major compaction is never seen in queries before the major compaction completes. At the end of the major compaction there is a multi-phase commit which eventually replaces all of the old files with the new file. At that point the major compaction will have completely process

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
see the results of "InjectIterator", then we > need to place InjectIterator2 below InjectIterator on the hierarchy, > whether in Fig. A or Fig. B. > > For my particular situation, reading from another Accumulo table inside an > iterator, I'm not sure which is better. I like t

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
Dylan, If I recall correctly (which I give about 30% odds), the original purpose of the side channel was to split up things like delete tombstone entries from "regular" entries so that other iterators sitting on top of a bifurcating iterator wouldn't have to handle the special tombstone preservati

Re: Keys with identical timestamps

2015-02-09 Thread Adam Fuchs
Hi Dave, As long as your combiner is associative and commutative both of the values should be represented in the combined result. The non-determinism is really around ordering, which generally doesn't matter for a combiner. Adam On Mon, Feb 9, 2015 at 3:49 PM, Dave Hardcastle wrote: > Hi, > > C
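A tiny illustration of the point, using a hypothetical combine helper that folds one key's values left to right: summing gives the same result whichever order the identically timestamped values arrive in, while string concatenation does not.

```java
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical helper that folds one key's values left to right, as a combiner would. */
final class CombinerOrderSketch {
    static <V> V combine(List<V> values, BinaryOperator<V> op) {
        V acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = op.apply(acc, values.get(i));
        }
        return acc;
    }
}
```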

Re: hdfs cpu usage

2015-02-09 Thread Adam Fuchs
Ara, What kind of query load are you generating within your batch scanners? Are you using an iterator that seeks around a lot? Are you grabbing many small batches (only a few keys per range) from the batch scanner? As a wild guess, this could be the result of lots of seeks with a low cache hit rat

Re: Seeking Iterator

2015-01-12 Thread Adam Fuchs
On Mon, Jan 12, 2015 at 4:10 PM, Josh Elser wrote: > seek()'ing doesn't always imply an increase in performance -- remember that > RFiles (the files that back Accumulo tables), are composed of multiple > blocks/sections with an index of them. A seek is comprised of using that > index to find the b

Re: Accumulo available in Fedora 21

2014-12-15 Thread Adam Fuchs
Neato! Adam On Mon, Dec 15, 2014 at 3:25 PM, Christopher wrote: > Accumulators, > > Fedora Linux now ships with Accumulo 1.6 packaged and available in its yum > repositories, as of Fedora 21. Simply run "yum install accumulo" to get > started. You can also just install sub-packages, as in "yum

Re: comparing different rfile "densities"

2014-11-13 Thread Adam Fuchs
hat "fit in to one tserver", isn't there > still an issue that >the new rfile may cover 100's of the tablets owned by a tserver? so any > scan of >any of those tablets will have to peek in the new file (until > compaction). > > i think i'm getting

Re: comparing different rfile "densities"

2014-11-11 Thread Adam Fuchs
No trickery there -- all tablets for which there are keys in the file will reference the file directly after bulk load. Adam On Tue, Nov 11, 2014 at 2:57 PM, Jeff Turner wrote: > is a file listed in metadata under *all* of the tablets that might have > entries in the file? > > (this example is p

Re: comparing different rfile "densities"

2014-11-11 Thread Adam Fuchs
Jeff, "Density" is an interesting measure here, because RFiles are going to be sorted such that, even when the file is split between tablets, a read of the file is going to be (mostly) a sequential scan. I think instead you might want to look at a few other metrics: network overhead, name node ope

Re: Remotely Accumulo

2014-10-06 Thread Adam Fuchs
Accumulo tservers typically listen on a single interface. If you have a server with multiple interfaces (e.g. loopback and eth0), you might have a problem in which the tablet servers are not listening on externally reachable interfaces. Tablet servers will list the interfaces that they are listenin

Re: Determining tablets assigned to table splits, and the number of rows in each tablet

2014-10-06 Thread Adam Fuchs
A few years ago we hashed out a rough idea of creating a stats API that would allow users to ask a variety of questions that leverage information that is already present in the system. Those questions would include things like: * Estimate of number of keys in a range. This would satisfy the "key c

Re: Compaction slowing queries

2014-09-11 Thread Adam Fuchs
You can change compression codecs at any time on a per-table basis. This only affects how new files are written. Existing files will still be read the same way. See the table.file.compress.type parameter. One caveat is that you need to make sure your codec is supported before switching to it or co

Re: Compaction slowing queries

2014-09-11 Thread Adam Fuchs
Paul, Here are a few suggestions: 1. Reduce the number of concurrent compaction threads (tserver.compaction.major.concurrent.max, and tserver.compaction.minor.concurrent.max). You probably want to lean towards twice as many major compaction threads as minor, but that somewhat depends on how burst

Re: Advice on increasing ingest rate

2014-04-09 Thread Adam Fuchs
, 2014 4:42 PM, "Mike Hugo" wrote: > > > > On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs wrote: > >> Mike, >> >> What version of Accumulo are you using, how many tablets do you have, and >> how many threads are you using for minor and major compaction poo

Re: Advice on increasing ingest rate

2014-04-08 Thread Adam Fuchs
Mike, What version of Accumulo are you using, how many tablets do you have, and how many threads are you using for minor and major compaction pools? Also, how big are the keys and values that you are using? Here are a few settings that may help you: 1. WAL replication factor (tserver.wal.replicat

Re: Accumulo Monitor Page Time Window

2014-03-22 Thread Adam Fuchs
The time window is actually a fixed set of points, and when the master is down no points are collected. The monitor often continues to run in the background when the master is down, so it will remember the points from the last session. Eventually the time will catch up and the monitor will display

Re: HDFS caching w/ Accumulo?

2014-02-26 Thread Adam Fuchs
Maybe this could be used to speed up WAL recovery for use cases that demand really high availability and low latency? Adam On Feb 25, 2014 10:50 AM, "Donald Miner" wrote: > HDFS caching is part of the new Hadoop 2.3 release. From what I > understand, it allows you to mark specific files to be he

Re: WAL - rate limiting factor x4.67

2013-12-04 Thread Adam Fuchs
One thing you can do is reduce the replication factor for the WAL. We have found that makes a pretty significant difference in write performance. That can be modified with the tserver.wal.replication property. Setting it to 2 instead of the default (probably 3) should give you some performance impro

Re: Efficient Tablet Merging [SEC=UNOFFICIAL]

2013-10-03 Thread Adam Fuchs
Never underestimate the power of ascii art! Adam On Oct 2, 2013 11:28 PM, "Eric Newton" wrote: > I'll use ASCII graphics to demonstrate the size of a tablet. > > Small: [] > Medium: [ ] > Large: [ ] > > Think of it like this... if you are running age-off... you probably have > lots of little bu

Re: Trouble with IntersectingIterator

2013-10-01 Thread Adam Fuchs
Heath, In your case, the question that you are effectively asking is "within each partition, which documents' index entries include all of the given terms". Since you have partitions aligned by field and only a single index entry per field you will not get any matches for queries with more than on
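A toy model of that situation. PartitionIntersectSketch and its partition → term → document layout are hypothetical, not the IntersectingIterator API. The intersection runs independently inside each partition, so if one document's terms are spread across different partitions, no single partition ever sees all of them:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of a partitioned index: partition -> term -> doc ids. */
final class PartitionIntersectSketch {
    /** For each partition, the docs whose index entries contain every query term. */
    static Map<String, Set<String>> intersect(Map<String, Map<String, Set<String>>> index,
                                              List<String> terms) {
        Map<String, Set<String>> hits = new TreeMap<>();
        for (Map.Entry<String, Map<String, Set<String>>> partition : index.entrySet()) {
            Set<String> docs = null;
            for (String term : terms) {
                Set<String> postings = partition.getValue().getOrDefault(term, Set.of());
                if (docs == null) {
                    docs = new HashSet<>(postings);
                } else {
                    docs.retainAll(postings); // intersection stays within this partition
                }
            }
            if (docs != null && !docs.isEmpty()) {
                hits.put(partition.getKey(), docs);
            }
        }
        return hits;
    }
}
```

If all of d1's terms live in one partition, a query for "red" and "big" finds d1; if "red" sits in a color partition and "big" in a size partition, the same query finds nothing.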

Re: My Accumulo 1.5.0 instance has no tablet servers

2013-10-01 Thread Adam Fuchs
To follow up on this, I think maybe the config should be dfs.datanode.synconclose, not dfs.data.synconclose. Was that a typo, Eric? Thanks, Adam On Thu, Sep 12, 2013 at 2:31 PM, Eric Newton wrote: > Add: > > > dfs.support.append > true > > > dfs.data.synconclose >

RE: Assigned and hosted Error [SEC=UNOFFICIAL]

2013-09-30 Thread Adam Fuchs
Matt, Did you include any patches that have not been committed to the 1.5 branch in your snapshot? Adam On Sep 30, 2013 6:25 PM, "Dickson, Matt MR" wrote: > > *UNOFFICIAL* > 1.5.1-SNAPSHOT from 20/09/13. > > -- > *From:* Sean Busbey [mailto:bus...@cloudera.com]

Re: BatchWriter performance on 1.4

2013-09-19 Thread Adam Fuchs
The addMutations method blocks when the client-side buffer fills up, so you may see a lot of time spent in that method due to a bottleneck downstream. There are a number of things you could try to speed that up. Here are a few: 1. Increase the BatchWriter's buffer size. This can smooth out the netw

Re: Getting the IP Address

2013-08-28 Thread Adam Fuchs
Seems like a question as common and complex as which IP address to listen on would have a fair amount of precedent in open-source projects that we could pull from. Are we reinventing the wheel? Does anyone have an example of an application like ours with the same set of supported platforms that has

Re: Will Accumulo work without password-less SSH?

2013-07-31 Thread Adam Fuchs
Josh, You might take a peek at the init scripts that are in the scripts directory of Accumulo 1.5. They are an alternative mechanism to the management scripts that are in the bin directory, and they don't rely on password-less ssh. Adam On Wed, Jul 31, 2013 at 3:59 PM, Smith, Joshua D. wrote:

Re: accumulo for a bi-map?

2013-07-17 Thread Adam Fuchs
Marc, You might also want to check out D4M and the table organization that it uses in Accumulo. D4M stores matrixes and their transforms, which is essentially the same concept as a bidirectional map or a bidirected graph: http://www.mit.edu/~kepner/D4M/ Cheers, Adam On Tue, Jul 16, 2013 at 5:2

RE: Preferred method for a client to obtain a connector reference

2013-05-30 Thread Adam Fuchs
> > *From:* Adam Fuchs [mailto:afu...@apache.org] > *Sent:* Thursday, May 30, 2013 12:57 PM > *To:* user@accumulo.apache.org > *Subject:* Re: Preferred method for a client to obtain a connector > reference > > Elise, >

Re: Preferred method for a client to obtain a connector reference

2013-05-30 Thread Adam Fuchs
Elise, You'll want to use instance.getConnector(...), where instance is probably a ZooKeeperInstance. Cheers, Adam On May 30, 2013 3:20 PM, "Newman, Elise" wrote: > Hello! > > ** ** > > Stupid question: What is the preferred way for a client to get a connector > reference? The SimpleClient

Re: master fails to start

2013-05-21 Thread Adam Fuchs
Chris, Did you copy the conf/accumulo.policy.example to conf/accumulo.policy? If so, you may need to make some changes to account for changes to hadoop security. I suspect the problem is that the codebase "file:${hadoop.home.dir}/lib/*" reference doesn't include your CDH3 libraries. You could modi

Re: [VOTE] 1.5.0-RC3

2013-05-17 Thread Adam Fuchs
Thanks for putting up with us picky people, Chris! Adam On May 17, 2013 6:15 PM, "Christopher" wrote: > So, > > I've fixed the problem with the src tarball including binaries, and I > believe I've satisfied all the concerns regarding the naming > conventions. > I'm going to go ahead and include

Re: [VOTE] 1.5.0-RC3

2013-05-17 Thread Adam Fuchs
illie Rinaldi" wrote: > On Fri, May 17, 2013 at 7:35 AM, Adam Fuchs wrote: > >> Looks like the src part of the distribution is >> accumulo-project-1.5.0-src.tar.gz. For the same reasons that we removed >> the "assemble" tag from the bin package, shouldn't we

Re: [VOTE] 1.5.0-RC3

2013-05-17 Thread Adam Fuchs
Looks like the src part of the distribution is accumulo-project-1.5.0-src.tar.gz. For the same reasons that we removed the "assemble" tag from the bin package, shouldn't we remove the "project" tag from the src package? This also has implications as to whether we can just untar both the bin and src

Re: Iterators returning keys out of scan range

2013-05-01 Thread Adam Fuchs
Mr. VonCloud, I suspect you're going for something like eureka synchronization. I suppose that might work, but I wouldn't rely on that behavior persisting long-term. It's definitely in the "undefined" set right now. I can't think of another way you would do what I presume you want to do without mo

Re: Accumulo software and processes owner

2013-04-26 Thread Adam Fuchs
Terry, To properly secure your Accumulo install, it's important that the shared secret in the Accumulo configs only be shared with the Accumulo processes, so I would recommend using a separate accumulo user. In HDFS you can create the directory that Accumulo writes to (/accumulo by default) and the

Re: Determining the cause of a tablet server failure

2013-02-27 Thread Adam Fuchs
y again > and made sure everything was set up correctly and restarted everything. > > We never did see anything in our log files or .out / .err logs indicating > the source of the problem, but the above is my best guess as to what was > going on. > > Thanks again for all the tips a

Re: Determining the cause of a tablet server failure

2013-02-27 Thread Adam Fuchs
There are a few primary reasons why your tablet server would die: 1. Lost lock in Zookeeper. If the tablet server and zookeeper can't communicate with each other then the lock will timeout and the tablet server will kill itself. This should show up as several messages in the tserver log. If this ha

Re: Suggestions on modeling a composite row key

2013-02-27 Thread Adam Fuchs
At sqrrl, we tend to use a Tuple class that implements List (List would also work), and has conversions to and from ByteBuffer. To encode the tuple into a byte buffer, change all the "\1"s to "\1\2", change all the "\0"s to "\1\1", and put a "\0" byte between elements. "\1" is used as an escape cha
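A minimal Java sketch of the escaping scheme described above, assuming elements are raw byte arrays. TupleCodec is a hypothetical name, not Sqrrl's Tuple class. Every byte inside an encoded element is at least 0x01, so the 0x00 separator sorts below all of them, and a tuple that is a prefix of another sorts first when compared as unsigned bytes:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: escape 0x01 -> 0x01 0x02, 0x00 -> 0x01 0x01, join with 0x00. */
final class TupleCodec {
    static byte[] encode(List<byte[]> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int e = 0; e < elements.size(); e++) {
            if (e > 0) {
                out.write(0x00); // element separator
            }
            for (byte b : elements.get(e)) {
                if (b == 0x01) {
                    out.write(0x01);
                    out.write(0x02); // "\1" -> "\1\2"
                } else if (b == 0x00) {
                    out.write(0x01);
                    out.write(0x01); // "\0" -> "\1\1"
                } else {
                    out.write(b);
                }
            }
        }
        return out.toByteArray();
    }

    /** Inverse of encode; assumes well-formed input and at least one element. */
    static List<byte[]> decode(byte[] encoded) {
        List<byte[]> elements = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i++) {
            byte b = encoded[i];
            if (b == 0x00) { // unescaped 0x00 can only be a separator
                elements.add(current.toByteArray());
                current.reset();
            } else if (b == 0x01) { // escape: the next byte says which byte was escaped
                byte next = encoded[++i];
                current.write(next == 0x01 ? 0x00 : 0x01);
            } else {
                current.write(b);
            }
        }
        elements.add(current.toByteArray());
        return elements;
    }
}
```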

Re: MinCombiner and MaxCombiner priority issue [SEC=UNCLASSIFIED]

2013-02-12 Thread Adam Fuchs
Hi Matt, I tried to replicate the behavior you saw and was not able to do so. There must be some other factors involved. Can you describe what version of Accumulo you have running and anything else that might be unique about the instance (other iterators configured on the table, any additional cod

Re: NoSuchMethodError: FieldValueMetaData (Conflict between hue-plugins-1.2.0-cdh3u5.har and libthrift-0.6.1.jar)

2013-02-08 Thread Adam Fuchs
Is that related to https://issues.apache.org/jira/browse/ACCUMULO-837? Do you have a stack trace you can share? Adam On Fri, Feb 8, 2013 at 10:34 AM, David Medinets wrote: > I am running a map-reduce job. As soon as my mapper tried to serialize > a Mutation I run into a NoSuchMethodError in re

Re: infinite number of max.versions?

2013-01-28 Thread Adam Fuchs
Mike, The way to do that is to remove the versioning iterator entirely. Just delete the configuration parameters for that iterator: something like "config -t tablename -d table.iterator.scan.vers" in the accumulo shell, for each of the six configuration parameters. Adam On Mon, Jan 28, 2013 at

Re: Can I bulk import into an existing table?

2013-01-25 Thread Adam Fuchs
You can bulk load into existing tables, and Accumulo will figure out which tablets to assign your files to. In your example, your file with row id values of 3 would go into the tablet that ends at row 4. You could also dynamically add a split point of 3, and bulk load would then put your file into
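A sketch of how a file's row range maps onto tablets, under a simplified model of our own: tablets are named by end row, and the last tablet is open-ended. tabletsOverlapping is a hypothetical helper, not how Accumulo's bulk import is implemented. With a single split at 4, a file containing only row 3 lands in the tablet ending at 4; after adding a split at 3, it lands in the tablet ending at 3:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical model of which tablets a bulk-loaded file overlaps. */
final class BulkAssignSketch {
    /**
     * Tablets (named by end row, "<default>" for the last one) whose row ranges
     * overlap [firstRow, lastRow], the first and last rows in the file.
     */
    static List<String> tabletsOverlapping(TreeSet<String> splits, String firstRow, String lastRow) {
        String lastEnd = splits.ceiling(lastRow); // tablet holding lastRow; null means the default tablet
        NavigableSet<String> ends = (lastEnd == null)
                ? splits.tailSet(firstRow, true)
                : splits.subSet(firstRow, true, lastEnd, true);
        List<String> tablets = new ArrayList<>(ends);
        if (lastEnd == null) {
            tablets.add("<default>");
        }
        return tablets;
    }
}
```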

Re: Custom Iterators - behavior when switching tablets

2013-01-23 Thread Adam Fuchs
David, The core challenge here is to be able to continue scans under failure conditions. There are several places where we tear down the iterator tree and rebuild it, including when tablet servers die, when we need to free resources to support concurrency, and a few others. In order to continue a
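As we understand it, the practical consequence for a custom iterator is that its internal state does not survive a teardown; what carries over is the last key returned, and the rebuilt stack is re-seeked to the range just after it. A minimal model of that, assuming a sorted map as the data source (ResumableScanSketch is hypothetical): each batch starts from scratch and reads strictly after the last key of the previous batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/**
 * Hypothetical model of resuming a scan after the iterator stack is rebuilt:
 * no iterator state is kept, only the last key returned, and the next batch
 * re-seeks to the range strictly after that key.
 */
final class ResumableScanSketch {
    static List<String> nextBatch(NavigableMap<String, String> data, String lastKey, int batchSize) {
        NavigableMap<String, String> view = (lastKey == null) ? data : data.tailMap(lastKey, false);
        List<String> keys = new ArrayList<>();
        for (String key : view.keySet()) {
            if (keys.size() == batchSize) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }
}
```

An iterator that buffers state across keys, for example one that remembers a running total from earlier rows, would lose that state at each such boundary.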

Re: scripted way to create users

2013-01-18 Thread Adam Fuchs
Using the Java API through JRuby or Jython would be another option. With Jython, that would look something like this: > export JYTHONPATH=$ACCUMULO_HOME/lib/accumulo-core-1.4.2.jar:$ACCUMULO_HOME/lib/log4j-1.2.16.jar:$ZOOKEEPER_HOME/zookeeper-*.jar:$HADOOP_HOME/hadoop-core-1.0.3.jar:$ACCUMULO_HOME/

Re: Accumulo Junit Concurrency/Latency issues ( Accumulo 1.3 )

2012-11-29 Thread Adam Fuchs
nit-Accumulo specific & am wondering if anyone else has > experienced the same issues? > > -Josh > > > > > > > On Thu, Nov 29, 2012 at 10:51 AM, Eric Newton wrote: > >> "I am definitely using the same key to update and retrieve the data." >> >

Re: Accumulo Junit Concurrency/Latency issues ( Accumulo 1.3 )

2012-11-29 Thread Adam Fuchs
Josh, Can you share your junit test code so I can replicate this behavior? Adam On Thu, Nov 29, 2012 at 9:59 AM, Joe Berk wrote: > Good morning all, > > I'm experiencing some "weirdness" when executing JUnit tests for my > classes that operate with Accumulo. I can best describe it as latency

Re: [VOTE] accumulo-1.4.2 RC4

2012-11-09 Thread Adam Fuchs
+1 The only problem I have found is that the example policy file is still not included (ACCUMULO-364), but that has been corrected for the next version for real this time. The release notes are slightly wrong in that respect, but I don't think this should delay release. Checked signatures, hashes

Re: Accumulo design questions

2012-11-06 Thread Adam Fuchs
> 4. In supporting dynamic column families, was there a design trade-off > with > respect to the original BigTable or current HBase design? What might > be a > benefit of doing it the other way? > > One trade-off is that pinning locality groups in memory (i.e. making them ephemeral) wo

Re: table not getting created in continuous-ingest test.

2012-10-30 Thread Adam Fuchs
Sorry, I should read more closely. It looks like the ci process doesn't think that Accumulo is running. Did you enter the right INSTANCE_NAME and ZOO_KEEPERS settings in continuous-env.sh? (Eric beat me to this, but I'm going to send anyway!) Cheers, Adam On Tue, Oct 30, 2012 at 4:5

Re: table not getting created in continuous-ingest test.

2012-10-30 Thread Adam Fuchs
Try creating the "ci" table using the accumulo shell before you run the test. The continuous ingest test is designed to run in parallel with many workers, so it doesn't try to create the table itself. Cheers, Adam On Tue, Oct 30, 2012 at 4:53 PM, Ranjan Sen wrote: > we are trying to run the c

Re: Number of partitions for sharded table

2012-10-30 Thread Adam Fuchs
Krishmin, There are a few extremes to keep in mind when choosing a manual partitioning strategy: 1. Parallelism and balance at ingest time. You need to find a happy medium between too few partitions (not enough parallelism) and too many partitions (tablet server resource contention and inefficient
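As one illustration of a manual partitioning strategy (our assumption; the thread does not prescribe a hash function), a document id can be hashed into a fixed number of zero-padded partition rows. ShardSketch is hypothetical, and CRC32 is just a convenient stable hash from the JDK:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical hash partitioner for a document-partitioned table. */
final class ShardSketch {
    /** Maps a document id to one of numPartitions zero-padded partition rows, e.g. "p007". */
    static String partitionFor(String docId, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % numPartitions; // getValue() is non-negative
        // Zero-padding keeps rows in numeric order as strings (up to 1000 partitions).
        return String.format("p%03d", bucket);
    }
}
```

numPartitions is the knob that point 1 above is about: too small limits ingest parallelism, too large spreads tablet server resources thin.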

Re: [VOTE] accumulo-1.4.2 RC3

2012-10-26 Thread Adam Fuchs
Oops, looks like Eric and I owe donuts. Anyone know how to get vim to automatically add license headers? ;-) Adam On Fri, Oct 26, 2012 at 11:14 AM, Billie Rinaldi wrote: > -1 > > These files don't have licenses: > > src/core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIte

Re: What is the Communication and Time Complexity for Bulk Inserts?

2012-10-24 Thread Adam Fuchs
For the bulk load of one file, shouldn't it be roughly O(log(n) * log(P) * p), where n is the size of the file, P is the total number of tablets (proportional to tablet servers), and p is the number of tablets that get assigned that file? For the BatchWriter case, there's a client-side lookup/binn

Re: Tablet Load Balancer

2012-10-04 Thread Adam Fuchs
ot just add. And then rebalance the cluster? > On Oct 4, 2012 6:15 PM, "Adam Fuchs" wrote: > >> Roshan, >> >> There's no way to make sure that multiple rows don't get split into >> multiple tablets. A custom load balancer would be able to make sure

Re: Tablet Load Balancer

2012-10-04 Thread Adam Fuchs
Roshan, There's no way to make sure that multiple rows don't get split into multiple tablets. A custom load balancer would be able to make sure that a set of tablets are hosted together, but has no effect on choosing the split points. If you want to guarantee that multiple entries are kept togethe

Re: Accumulo Between Two Centers (DR - disaster recovery)

2012-09-26 Thread Adam Fuchs
Another way to say this is that cross-data center replication for Accumulo is left to a layer on top of Accumulo (or the application space). Cassandra supports a mode in which you can have a bigger write replication than write quorum, allowing writes to eventually propagate and reads to happen on s

Re: Mock classes for JUnit Testing

2012-09-25 Thread Adam Fuchs
Too slow, Keith! :) Adam On Sep 25, 2012 9:55 AM, "Keith Turner" wrote: > What I think is going is that the class MockTableOperations references > org.apache.commons.lang.NotImplementedException. When you try to load > the class MockTableOperations, it tries to load the dependency > NotImpleme

Re: Mock classes for JUnit Testing

2012-09-25 Thread Adam Fuchs
Josh, This is a classpath problem. The JVM is failing to load the MockTableOperations class because it has an include line that references that NotImplementedException class. Try adding all of the jars from Accumulo's lib directory to your classpath. Adam On Sep 25, 2012 9:30 AM, "Joe Berk" wrot

Re: Mock classes for JUnit Testing

2012-09-25 Thread Adam Fuchs
Josh, Can you post the stack trace that came with the NPE? Adam On Sep 25, 2012 8:47 AM, "Joe Berk" wrote: > Hello, > > > > I am trying to write JUnit tests for Accumulo and I keep running into > dead-ends with the “Mock” classes. > > > > /* > > * So, the following lines are how I would tradi

Re: bulk ingested table showing zero entries on the monitor page

2012-09-21 Thread Adam Fuchs
John is referring to the streaming ingest, not the bulk ingest. Dave is correct on this one. Basically, we don't count the records when you bulk ingest so that we can get sub-linear runtime on the bulk ingest operation. Adam On Fri, Sep 21, 2012 at 4:22 PM, ameet kini wrote: > > I was expectin

RE: EXTERNAL: Re: Failing Tablet Servers

2012-09-20 Thread Adam Fuchs
My guess would be that you are building an object several gigabytes in size and Accumulo is copying it. Do you need all of those entries to be applied atomically (in which case you should look into bulk loading), or can you break them up into multiple mutations? I would say you should keep your mut
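If the entries don't need to be applied atomically, one way to follow this advice is to cap the size of each mutation. Below is a hypothetical, library-free sketch, not Accumulo's API. It greedily packs one row's column updates into batches whose estimated size stays under a chosen limit. In real client code, each batch would become its own `Mutation` for that row and be passed to `BatchWriter.addMutation`. Each mutation is applied atomically on its own, but the batches are not atomic as a group. The `Update` type, the size estimate, and the limit are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free model of splitting one oversized set of column updates
// into several smaller batches; each batch would become its own Mutation.
class MutationChunker {
    // Hypothetical column update: qualifier plus value bytes.
    static final class Update {
        final String qualifier;
        final byte[] value;
        Update(String qualifier, byte[] value) { this.qualifier = qualifier; this.value = value; }

        // Rough estimate; ignores per-entry overhead such as timestamps and visibility.
        long estimatedBytes() {
            return qualifier.getBytes(StandardCharsets.UTF_8).length + value.length;
        }
    }

    // Greedily packs updates into batches of at most maxBytes (estimated) each.
    // An update larger than maxBytes still gets a batch of its own.
    static List<List<Update>> chunk(List<Update> updates, long maxBytes) {
        List<List<Update>> batches = new ArrayList<>();
        List<Update> current = new ArrayList<>();
        long currentBytes = 0;
        for (Update u : updates) {
            long size = u.estimatedBytes();
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current); // close the batch before it exceeds the limit
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(u);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```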

Re: tablet server initialization issue

2012-09-19 Thread Adam Fuchs
I assume you mean it's listening on 127.0.0.1 on that node? Are you using bin/start-all.sh to launch your cluster? I would guess this is a DNS setup issue on that one machine. Maybe something like the domainname not being set? When you run "hostname" on that machine, do you get a similar format to
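To see the problem from the JVM's side, the sketch below prints what Java resolves the local machine's name to. This can differ from the output of the `hostname` command, because the name still has to go through the resolver. It is a hypothetical diagnostic, not part of Accumulo, and it uses only `java.net.InetAddress`. Running it on the misbehaving node and on a healthy one and comparing the output is one way to spot a DNS or hosts-file difference.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Accumulo: prints what the JVM resolves this
// machine's own name to. A loopback result (127.x.x.x or ::1) means a server that
// binds to or advertises that address cannot be reached from other nodes.
class HostnameCheck {
    // True if 'host' (a name or a literal IP) resolves to a loopback address.
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname:  " + local.getHostName());
        System.out.println("canonical: " + local.getCanonicalHostName());
        System.out.println("address:   " + local.getHostAddress());
        if (local.isLoopbackAddress()) {
            System.out.println("WARNING: this host's name resolves to a loopback address");
        }
    }
}
```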

Re: Running Accumulo straight from Memory

2012-09-12 Thread Adam Fuchs
files. Adam On Sep 12, 2012 5:20 PM, "David Medinets" wrote: > Why would locality groups be useful in an in-memory system? > > On Wed, Sep 12, 2012 at 4:53 PM, Adam Fuchs wrote: > > Even if you are just using memory, minor and major compactions are > important > >

RE: Running Accumulo straight from Memory

2012-09-12 Thread Adam Fuchs
speeds that we > had wanted. > > Matt > > *From:* user-return-1330-MATTHEW.J.MOORE=saic@accumulo.apache.org [mailto:user-return-1330-MATTHEW.J.MOORE=saic@accumulo.apache.org] *On Behalf Of* Adam Fuchs > *Sent:

Re: Running Accumulo straight from Memory

2012-09-11 Thread Adam Fuchs
Matthew, I don't know of anyone who has done this, but I believe you could: 1. mount a RAM disk 2. point the hdfs core-site.xml fs.default.name property to file:/// 3. point the accumulo-site.xml instance.dfs.dir property to a directory on the RAM disk 4. disable the WAL for all tables by setting

RE: ColumnQualifierFilter

2012-09-10 Thread Adam Fuchs
fetchColumn is cumulative, so if you call it multiple times it will fetch multiple columns. Adam On Sep 10, 2012 6:25 PM, wrote: > Billie > > That’s what I’m doing at the moment, but I’d like to give the iterator a > collection of CF/CQ to filter on. Is that possible? > > *
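In client code, this means calling `scanner.fetchColumn(new Text(cf), new Text(cq))` once for each family/qualifier pair in the collection. Each call adds to the set of fetched columns instead of replacing it. The class below is a hypothetical, library-free model of that behavior, not Accumulo's implementation. The selection accumulates across calls, an entry passes if its column matches any fetched pair, and an empty selection returns every column, as a scan with no fetched columns does.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of cumulative column fetching; not Accumulo's implementation.
class ColumnSelection {
    private final Set<Map.Entry<String, String>> fetched = new HashSet<>();

    // Each call adds one (family, qualifier) pair; earlier pairs are kept.
    void fetchColumn(String family, String qualifier) {
        fetched.add(new SimpleImmutableEntry<>(family, qualifier));
    }

    // With nothing fetched every column passes; otherwise the pair must have been fetched.
    boolean accepts(String family, String qualifier) {
        return fetched.isEmpty() || fetched.contains(new SimpleImmutableEntry<>(family, qualifier));
    }

    public static void main(String[] args) {
        ColumnSelection sel = new ColumnSelection();
        sel.fetchColumn("attr", "name");
        sel.fetchColumn("attr", "age");   // added to, not replacing, the first pair
        System.out.println(sel.accepts("attr", "name")); // true
        System.out.println(sel.accepts("attr", "age"));  // true
        System.out.println(sel.accepts("attr", "zip"));  // false
    }
}
```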

Re: [receivers.SendSpansViaThrift] ERROR: java.net.ConnectException: Connection refused

2012-09-05 Thread Adam Fuchs
My guess is that your tracer was listening on 127.0.0.1 and registered that IP address in zookeeper. Other nodes would have had trouble contacting that IP address. Switching to the hostname of the master node should fix it too if that was the problem. Maybe we should put in a ticket to have servic

Re: [receivers.SendSpansViaThrift] ERROR: java.net.ConnectException: Connection refused

2012-09-05 Thread Adam Fuchs
Fred, One tracer is fine, and you can set that to be the same as the master node. You also need to set the username and password for the tracer in accumulo-site.xml if you haven't already. Adam On Sep 5, 2012 1:22 PM, "Fred Wolfinger" wrote: > Hey Marc, > > I can't tell you how much I appreciat

Re: Using Accumulo as input to a MapReduce job frequently hangs due to lost Zookeeper connection

2012-08-16 Thread Adam Fuchs
That was going to be my suggestion as well, except the zookeeper property is maxClientCnxns. Cheers, Adam On Aug 16, 2012 7:22 AM, "Jim Klucar" wrote: > Just shooting from the hip here. > > Zookeeper maxclientcxns in zoo.cfg should be increased from the default to > something like 100. Check the
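Because the fix depends on the exact key name, a quick check of zoo.cfg can catch the misspelling discussed here. The sketch below is a hypothetical checker, not part of ZooKeeper. It assumes what ZooKeeper's own parser does: zoo.cfg is read as a `java.util.Properties` file and keys are matched case-sensitively, so a key like `maxclientcxns` is silently ignored and the default limit stays in effect. The file path and the set of "looks intended" spellings are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Properties;

// Hypothetical checker, not part of ZooKeeper: zoo.cfg uses Properties syntax and keys
// are matched exactly, so a wrong-case or misspelled key is ignored.
class ZooCfgCheck {
    static final String KEY = "maxClientCnxns";

    static Properties parse(Reader zooCfg) {
        Properties p = new Properties();
        try {
            p.load(zooCfg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // The configured limit, or -1 if the exactly spelled key is absent.
    static int maxClientCnxns(Properties cfg) {
        String v = cfg.getProperty(KEY);
        return v == null ? -1 : Integer.parseInt(v.trim());
    }

    // True if a key looks like the intended one (ignoring case, or with the
    // "cxns" typo) but is not spelled exactly "maxClientCnxns".
    static boolean hasMisspelledKey(Properties cfg) {
        for (String k : cfg.stringPropertyNames()) {
            String lower = k.toLowerCase(Locale.ROOT);
            boolean intended = lower.equals("maxclientcnxns") || lower.equals("maxclientcxns");
            if (intended && !k.equals(KEY)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args.length > 0 ? args[0] : "zoo.cfg"))) {
            Properties cfg = parse(r);
            System.out.println(KEY + " = " + maxClientCnxns(cfg) + " (-1 means not set)");
            if (hasMisspelledKey(cfg)) {
                System.out.println("WARNING: found a misspelled or wrong-case " + KEY + " key");
            }
        }
    }
}
```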
