On Tue, Feb 11, 2014 at 11:10 AM, Kesten Broughton wrote:
> Hi there,
>
> We have been experimenting with Accumulo for about two months now. Our
> biggest pain point has been on ingest.
> Often we will have an ingest process fail 2 or 3 times, 3/4 of the way
> through an ingest, and then on a final try it works, without any changes.
In your example, the row ID is "000c35b2-ee6c-339e-9e6a-65a9bccbfa2c". If
you are using one UUID
for all of the ingested data, then you'd be creating one large row and just
one tablet would
be ingesting the information.
If you are using more than one UUID in the row fields, are you
pre-splitting the table?
`tserver.readahead.concurrent.max` controls the size of a thread pool on the
tserver used to service scans and batch scans.
On Tue, Feb 11, 2014 at 2:59 PM, Roshan Punnoose wrote:
> From the monitor, it seems like only N "Running Scans" are allowed at a
> time, and the rest are queued. Is this something that is configurable?
It depends. First off, there's the number of scanners, and the threads given
to batch scanners, which control how many client "scan sessions" are
coming into the cluster.
Some discussion on an earlier thread might help:
http://mail-archives.apache.org/mod_mbox/accumulo-user/201402.mbox/%3C52F542E5.90
When you add edges, are you by chance creating one mutation and adding a lot
of edges to it? This could create a large mutation, which would have to
fit in JVM memory on the tserver (which looks like it's 1 GB).
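The advice above, not piling every edge into one giant mutation, can be sketched generically. This is plain Python illustrating the batching idea only; it is not the Accumulo client API, and the `max_entries` cap is an arbitrary illustrative number:

```python
# Sketch: cap how many updates go into any single "mutation" so that no
# one mutation has to fit in tserver JVM memory all at once. Instead of
# one huge batch per row, yield fixed-size chunks and flush each chunk.
def batch_edges(row, edges, max_entries=1000):
    """Yield lists of at most max_entries (row, edge) updates."""
    batch = []
    for edge in edges:
        batch.append((row, edge))
        if len(batch) >= max_entries:
            yield batch
            batch = []
    if batch:  # flush the final partial chunk
        yield batch

# 2500 edges for one node become three bounded batches: 1000, 1000, 500.
batches = list(batch_edges("node-42", [f"edge-{i}" for i in range(2500)]))
```

In the real Java client the analogous knob is how many updates you put on each Mutation before handing it to the BatchWriter, which then manages its own memory buffer.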
Accumulo logs messages about what's going on with the Java GC every few
seconds. Try grepping the tserver logs for them.
From the monitor, it seems like only N "Running Scans" are allowed at a
time, and the rest are queued. Is this something that is configurable?
Roshan
Hi David,
Responses inline.
What is the average load on the servers while the ingest runs?
We are seeing ingest rates (the Ingest column on the Accumulo monitor) of 200-400K.
Load is low, perhaps up to 1 on a 4-core VM, less on bare metal. Often we see
only one tablet server (of two) ingesting.
Jeremy,
Our ingest tests using the BatchWriter into Accumulo showed that none of CPU,
network, or disk I/O was maxed out.
We were seeing a CPU load of 1 (out of 4) on a 4-core VM, for example.
I will answer the hardware question in my response to Sean Busbey.
From: "Kepner, Jeremy - 0553 - MITLL"
Skip 00, nothing would come before it. :-)
On Tue, Feb 11, 2014 at 12:34 PM, Mike Drob wrote:
> For uuid4 keys, you might want to do [00, 01, 02, ..., 0e, 0f, 10, ...,
> fd, fe, ff] to cover the full range.
>
>
> On Tue, Feb 11, 2014 at 9:16 AM, Josh Elser wrote:
>
>> Ok. Even so, try adding some split points to the tables before you begin
>> (if you aren't already) as it will *greatly* smooth the startup.
For uuid4 keys, you might want to do [00, 01, 02, ..., 0e, 0f, 10, ..., fd,
fe, ff] to cover the full range.
On Tue, Feb 11, 2014 at 9:16 AM, Josh Elser wrote:
> Ok. Even so, try adding some split points to the tables before you begin
> (if you aren't already) as it will *greatly* smooth the startup.
Ok. Even so, try adding some split points to the tables before you begin
(if you aren't already) as it will *greatly* smooth the startup.
Something like [00, 01, 02, ..., 10, 11, 12, ..., 97, 98, 99] would be
good. You can easily dump this to a file on local disk and run the
`addsplits` command in the shell.
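That dump-to-a-file step can be scripted; here is a minimal sketch (the file path is a placeholder) generating the full two-hex-digit range discussed elsewhere in the thread, skipping "00" since nothing sorts before it:

```python
# Write two-hex-digit split points for uuid4 row IDs, one per line,
# to a local file that can be fed to the Accumulo shell's addsplits
# command. "00" is skipped: no row can sort before it.
splits = [format(i, "02x") for i in range(1, 256)]  # "01" .. "ff"
with open("splits.txt", "w") as f:
    f.write("\n".join(splits) + "\n")
```

In the shell, something along the lines of `addsplits -t mytable -sf /path/to/splits.txt` would then apply them (table name is a placeholder; check `help addsplits` in your version for the exact flags).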
I'm using random keys for these tests. They are uuid4 keys.
On Tue, Feb 11, 2014 at 1:04 PM, Josh Elser wrote:
> The other thing I thought about.. what's the distribution of Key-Values
> that you're writing? Specifically, do many of the Keys sort "near" each
> other. Similarly, do you notice excessive load on some tservers, but not
> all (the "Tablet Servers" page on the Monitor is a good check)?
My cluster ingests data every night. We use a map-reduce program to
generate RFiles, then import those files into Accumulo. No hiccups. No
instability. I've also used map-reduce to directly write mutations and haven't
seen any issues there either.
What is the average load on the servers while the ingest runs?
On 2/11/14, 11:10 AM, Kesten Broughton wrote:
Hi there,
We have been experimenting with Accumulo for about two months now. Our
biggest pain point has been on ingest.
Often we will have an ingest process fail 2 or 3 times, 3/4 of the way
through an ingest, and then on a final try it works, without any changes.
Hi Kesten!
Could you tell us:
1) Accumulo version
2) HDFS + ZooKeeper versions
3) are you using the BatchWriter API, or bulk ingest?
4) what does your table design look like?
5) what does your source data look like?
6) what kind of hardware is on these 3 nodes? Memory, disks, CPU cores.
7)
Hi there,
We have been experimenting with Accumulo for about two months now. Our biggest
pain point has been on ingest.
Often we will have an ingest process fail 2 or 3 times, 3/4 of the way through an
ingest, and then on a final try it works, without any changes.
Once the ingest works, the cluster
The other thing I thought about.. what's the distribution of Key-Values
that you're writing? Specifically, do many of the Keys sort "near" each
other. Similarly, do you notice excessive load on some tservers, but not
all (the "Tablet Servers" page on the Monitor is a good check)?
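A quick client-side sanity check for that question is to bucket your row IDs by prefix before ingesting: with uuid4 keys the buckets should come out close to uniform, meaning no single tablet (and hence no single tserver) takes a disproportionate share of the writes. A rough sketch using synthetic keys:

```python
import uuid
from collections import Counter

# Bucket row IDs by their first two hex characters (the same 256-way
# split scheme discussed in this thread). For uuid4 keys the counts
# should be roughly uniform; a heavily skewed histogram would explain
# excessive load on a few tservers.
rows = [str(uuid.uuid4()) for _ in range(20000)]
counts = Counter(r[:2] for r in rows)

# Ratio of the busiest bucket to the ideal even share; ~1.0 is uniform.
skew = max(counts.values()) / (len(rows) / 256)
```

Running the same histogram over your real row IDs (instead of synthetic UUIDs) shows immediately whether many keys sort "near" each other.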
Consider the
Same results with 2G tserver.memory.maps.max.
Maybe we just reached the limit :)
On Mon, Feb 10, 2014 at 7:08 PM, Diego Woitasen
wrote:
> On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser wrote:
>> I assume you're running a datanode alongside the tserver on that node? That
>> may be stretching the