Dylan, could you elaborate on the average query time you had?
Thanks
Vaibhav
On 14-May-2015 11:03 pm, Dylan Hutchison dhutc...@mit.edu wrote:
I think this is the same issue I found for ACCUMULO-3710
https://issues.apache.org/jira/browse/ACCUMULO-3710, only in my case
the tserver ran out of
Sorry, just remembered that my setup was to scan an index table and gather
rowIDs, then scan a main data table using the rowIDs as the BatchScan
ranges. Effectively it is a join of part of the index table to a main data
table.
The scan rate I achieved is therefore double the value I cited
I didn't have an average query time-- the tablet server crashed. A quick
solution is to batch the ranges into groups of 50k (or 500k, I forgot which
one) and do many BatchScans-- not ideal. I think I achieved 33k
entries/second retrieval on a single-node Accumulo. Accumulo is better for
I think this is the same issue I found for ACCUMULO-3710
https://issues.apache.org/jira/browse/ACCUMULO-3710, only in my case the
tserver ran out of memory. Accumulo doesn't handle large numbers of small,
disjoint ranges well. I bet there's room for improvement on both the
client and tablet
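The workaround described above (batching the ranges into groups and doing many BatchScans) can be sketched roughly as follows. This is only a minimal illustration using plain Java collections: RangeBatcher, partition and forEachBatch are hypothetical names, not part of the Accumulo API, and the batch size (50k in the message above) is left for the caller to choose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Accumulo): splits a large list of
// scan ranges into smaller groups so each BatchScan stays manageable.
final class RangeBatcher {
    private RangeBatcher() {}

    /** Splits items into consecutive groups of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Runs the supplied scan callback once per batch, in order. */
    static <T> void forEachBatch(List<T> items, int batchSize,
                                 Consumer<List<T>> scanOneBatch) {
        for (List<T> batch : partition(items, batchSize)) {
            scanOneBatch.accept(batch);
        }
    }
}
```

With Accumulo, the callback would presumably pass each batch to BatchScanner.setRanges() and iterate over the results before moving on to the next batch. As noted in the message, this is a stopgap rather than an ideal solution.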
This use case is one of the things Accumulo was designed to handle well.
It's the reason there is a BatchScanner.
I've created:
https://issues.apache.org/jira/browse/ACCUMULO-3813
so we can investigate and track down any problems or improvements.
Feel free to add any other details to the JIRA
It sounds like each of your ranges is an ID, e.g. a single row. I've
found that scanning lots of non-sequential single-row ranges is pretty
slow in accumulo. Your best approach is probably to create an index
table on whatever you are originally trying to query (assuming those
10,000 ids came
Yes, hot-spotting does affect accumulo because you have fewer servers and
caches handling your request.
Let's say your data is spread out, in a normal distribution from 0..9.
What if you have only 1 split? You would want it at 5, to divide the
data in half, and you could host the halves on
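One way to picture the split-point reasoning above: take a sorted sample of row IDs and pick evenly spaced quantiles, so that each resulting tablet holds roughly the same share of the data. The sketch below is a minimal illustration under that assumption. SplitChooser and chooseSplits are hypothetical names, not Accumulo APIs, and the caller is assumed to supply a sorted, representative sample.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Accumulo): picks split points at
// evenly spaced positions in a sorted sample of row IDs.
final class SplitChooser {
    private SplitChooser() {}

    /**
     * Returns up to numSplits split points so that the sample is divided
     * into roughly equal parts. sortedRows must already be sorted.
     */
    static List<String> chooseSplits(List<String> sortedRows, int numSplits) {
        if (numSplits < 0) {
            throw new IllegalArgumentException("numSplits must be non-negative");
        }
        List<String> splits = new ArrayList<>();
        int n = sortedRows.size();
        for (int i = 1; i <= numSplits; i++) {
            // Index of the i-th quantile boundary among numSplits + 1 parts.
            int idx = (int) ((long) i * n / (numSplits + 1));
            if (idx >= n) {
                break; // only reachable when the sample is empty
            }
            String candidate = sortedRows.get(idx);
            // Skip repeats so heavily duplicated rows yield a single split.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }
}
```

For a sample of the rows "0" through "9" and a single split, this picks "5", matching the example in the message. The chosen points could then be added to the table, for instance with the shell's addsplits command.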
Yes, that's a great way to split the data evenly.
Also, since the data set is so small, turn on data caching for your table:
shell> config -t mytable -s table.cache.block.enable=true
You may want to increase the size of your tserver JVM, and increase the
size of the cache:
shell> config -s
Thank you Eric.
One thing I would like to know. Does pre-splitting the data play a part in
querying accumulo?
Because I managed to somewhat decrease the querying time.
I did the following steps:
My table was around 1.47 GB, so I explicitly set the split parameter to 256 MB
instead of the default
Thank you Eric. I will surely do the same. Should uneven distribution
across the tablets affect querying in Accumulo? In my case, it does. Is
this behaviour normal?
On 13-May-2015 10:58 pm, Eric Newton eric.new...@gmail.com wrote:
Yes, that's a great way to split the data evenly.
Also, since
Do you know how much data is being brought back (i.e. 100 megabytes)? I am
wondering what the data rate is in MB/s. Do you know how many files per
tablet you have? Do most of the 10,000 ids you are querying for exist?
On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal
How many tablets do you have? The batch scanner does not parallelize
operations within a tablet.
If you give the batch scanner more threads than there are tservers, it will
make multiple parallel rpc calls to each tserver if the tserver has
multiple tablets. Each rpc may include multiple
Hi,
I am using BatchScanner to scan rows from an Accumulo table. The table has
around 187m entries and I am using a 3-node cluster running Accumulo
1.6.1.
I have passed 10,000 ids, which are stored as row IDs in my table, as a list
to the setRanges() method.
This whole process takes around 50
On the monitor page, you should see how many threads are running in
each tserver, if I remember correctly. There are also graphs to show
response rates.
On Tue, May 12, 2015 at 2:39 PM, vaibhav thapliyal
vaibhav.thapliyal...@gmail.com wrote:
I also tried increasing the threads to a bigger number, about 500, but yes, I
will try using the BatchScanner with 194 threads too. I will get back with the
info that Keith has asked for in some time.
Thanks
Vaibhav
On 13-May-2015 12:04 am, David Medinets david.medin...@gmail.com wrote:
Try using 194 threads if your hardware can support them. The worst
that'll happen is the client program crashes during testing. If that
happens, cut the number of threads in half. And so on.
On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal
vaibhav.thapliyal...@gmail.com wrote:
I have 194 tablets. Currently I am using 20 threads to create the
batchscanner inside the createBatchScanner method.
On 12-May-2015 11:19 pm, Keith Turner ke...@deenlo.com wrote:
How many tablets do you have? The batch scanner does not parallelize
operations within a tablet.
If you give the