Re: BatchScanner taking too much time to scan rows

2015-05-14 Thread vaibhav thapliyal
Dylan could you elaborate on the average query time you had? Thanks Vaibhav On 14-May-2015 11:03 pm, Dylan Hutchison dhutc...@mit.edu wrote: I think this is the same issue I found for ACCUMULO-3710 https://issues.apache.org/jira/browse/ACCUMULO-3710, only in my case the tserver ran out of

Re: BatchScanner taking too much time to scan rows

2015-05-14 Thread Dylan Hutchison
Sorry, just remembered that my setup was to scan an index table and gather rowIDs, then scan a main data table using the rowIDs as the BatchScan ranges. Effectively it is a join of part of the index table to a main data table. The scan rate I achieved is therefore double the value I cited
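The two-step lookup described in this message (gather rowIDs from an index table, then fetch those rows from the main table) can be sketched roughly as below. This is only an in-memory illustration under stated assumptions: the two `SortedMap`s stand in for the index and main tables, and `IndexJoinSketch` and `join` are hypothetical names, not Accumulo API. In a real client the second step would be a BatchScanner whose ranges are built from the gathered rowIDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;

// Hypothetical stand-in for an index-table-to-main-table join; not Accumulo API.
class IndexJoinSketch {
    // Step 1: look up the rowIDs recorded for `term` in the index "table".
    // Step 2: fetch each of those rows from the main "table".
    static List<String> join(SortedMap<String, List<String>> indexTable,
                             SortedMap<String, String> mainTable,
                             String term) {
        List<String> rowIds = indexTable.getOrDefault(term, Collections.<String>emptyList());
        List<String> results = new ArrayList<>();
        for (String rowId : rowIds) {
            String value = mainTable.get(rowId);
            if (value != null) {          // skip index entries with no matching main row
                results.add(value);
            }
        }
        return results;
    }
}
```

Each rowID gathered in step 1 becomes one small, disjoint lookup in step 2, which is the access pattern this thread is about.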

Re: BatchScanner taking too much time to scan rows

2015-05-14 Thread Dylan Hutchison
I didn't have an average query time-- the tablet server crashed. A quick solution is to batch the ranges into groups of 50k (or 500k, I forgot which one) and do many BatchScans-- not ideal. I think I achieved 33k entries/second retrieval on a single-node Accumulo. Accumulo is better for
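The workaround mentioned in this message (splitting the ranges into groups and running many BatchScans) could look something like the sketch below. It is a minimal, stdlib-only illustration: `RangeBatching` and `partition` are hypothetical names, the integers stand in for Accumulo `Range` objects, and 50,000 is simply one of the two batch sizes mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Accumulo API.
class RangeBatching {
    // Split `items` into consecutive batches of at most `batchSize` elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Integers stand in for Range objects in this sketch.
        List<Integer> ranges = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            ranges.add(i);
        }
        for (List<Integer> batch : partition(ranges, 50_000)) {
            // In a real client, each batch would be handed to one
            // BatchScanner via setRanges(...) and scanned before the next.
            System.out.println("batch of " + batch.size() + " ranges");
        }
    }
}
```

Smaller batches bound how many ranges are in flight at once, at the cost of more scan round trips.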

Re: BatchScanner taking too much time to scan rows

2015-05-14 Thread Dylan Hutchison
I think this is the same issue I found for ACCUMULO-3710 https://issues.apache.org/jira/browse/ACCUMULO-3710, only in my case the tserver ran out of memory. Accumulo doesn't handle large numbers of small, disjoint ranges well. I bet there's room for improvement on both the client and tablet

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
This use case is one of the things Accumulo was designed to handle well. It's the reason there is a BatchScanner. I've created: https://issues.apache.org/jira/browse/ACCUMULO-3813 so we can investigate and track down any problems or improvements. Feel free to add any other details to the JIRA

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Emilio Lahr-Vivaz
It sounds like each of your ranges is an ID, e.g. a single row. I've found that scanning lots of non-sequential single-row ranges is pretty slow in Accumulo. Your best approach is probably to create an index table on whatever you are originally trying to query (assuming those 10,000 ids came

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
Yes, hot-spotting does affect accumulo because you have fewer servers and caches handling your request. Let's say your data is spread out, in a normal distribution from 0..9. What if you have only 1 split? You would want it at 5, to divide the data in half, and you could host the halves on
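The idea of placing splits so that each tablet holds a similar share of the data can be sketched as picking evenly spaced quantiles from a sample of row IDs. The code below is an illustration under assumptions, not a recommended procedure: `SplitPointSketch` and `chooseSplits` are hypothetical names, and the sample is assumed to be representative of the table. With Accumulo, the chosen points would then be passed to `TableOperations.addSplits`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for choosing split points; not part of the Accumulo API.
class SplitPointSketch {
    // Pick up to `numSplits` split points so that the resulting partitions
    // hold roughly equal numbers of the sampled rows.
    static List<String> chooseSplits(List<String> sampleRows, int numSplits) {
        List<String> sorted = new ArrayList<>(sampleRows);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i <= numSplits; i++) {
            int idx = (int) ((long) i * sorted.size() / (numSplits + 1));
            if (idx >= sorted.size()) {
                break;                    // only reachable when the sample is empty
            }
            String candidate = sorted.get(idx);
            // Skip duplicates, which appear when one row dominates the sample.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }
}
```

For row IDs spread evenly over 0..9 and a single split, this picks 5, matching the example in the message above; for skewed data the quantiles move toward the dense region instead of the midpoint of the key space.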

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
Yes, that's a great way to split the data evenly. Also, since the data set is so small, turn on data caching for your table: shell config -t mytable -s table.cache.block.enable=true You may want to increase the size of your tserver JVM, and increase the size of the cache: shell config -s

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread vaibhav thapliyal
Thank you Eric. One thing I would like to know: does pre-splitting the data play a part in querying Accumulo? I ask because I managed to decrease the query time somewhat. I did the following steps: My table was around 1.47 GB, so I explicitly set the split parameter to 256 MB instead of the default

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread vaibhav thapliyal
Thank you Eric. I will surely do the same. Should uneven distribution across the tablets affect querying in Accumulo? In this case, it does. Is this behaviour normal? On 13-May-2015 10:58 pm, Eric Newton eric.new...@gmail.com wrote: Yes, that's a great way to split the data evenly. Also, since

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread Keith Turner
Do you know how much data is being brought back (i.e. 100 megabytes)? I am wondering what the data rate is in MB/s. Do you know how many files per tablet you have? Do most of the 10,000 ids you are querying for exist? On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread Keith Turner
How many tablets do you have? The batch scanner does not parallelize operations within a tablet. If you give the batch scanner more threads than there are tservers, it will make multiple parallel RPC calls to each tserver if the tserver has multiple tablets. Each RPC may include multiple

BatchScanner taking too much time to scan rows

2015-05-12 Thread vaibhav thapliyal
Hi, I am using BatchScanner to scan rows from an Accumulo table. The table has around 187m entries and I am using a 3-node cluster running Accumulo 1.6.1. I pass 10,000 ids, which are stored as row IDs in my table, as a list to the setRanges() method. This whole process takes around 50

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread David Medinets
On the monitor page, you should see how many threads are running in each tserver, if I remember correctly. There are also graphs to show response rates. On Tue, May 12, 2015 at 2:39 PM, vaibhav thapliyal vaibhav.thapliyal...@gmail.com wrote: I also tried to increase threads to a bigger number

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread vaibhav thapliyal
I also tried increasing the thread count to a larger number, about 500, but yes, I will try using the BatchScanner with 194 threads too. I will get back with the info that Keith asked for in some time. Thanks Vaibhav On 13-May-2015 12:04 am, David Medinets david.medin...@gmail.com wrote: Try using 194

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread David Medinets
Try using 194 threads if your hardware can support them. The worst that'll happen is the client program crashes during testing. If that happens, cut the number of threads in half. And so on. On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal vaibhav.thapliyal...@gmail.com wrote: I have 194

Re: BatchScanner taking too much time to scan rows

2015-05-12 Thread vaibhav thapliyal
I have 194 tablets. Currently I pass 20 threads to the createBatchScanner method when creating the BatchScanner. On 12-May-2015 11:19 pm, Keith Turner ke...@deenlo.com wrote: How many tablets do you have? The batch scanner does not parallelize operations within a tablet. If you give the