Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
This use case is one of the things Accumulo was designed to handle well. It's the reason there is a BatchScanner. I've created: https://issues.apache.org/jira/browse/ACCUMULO-3813 so we can investigate and track down any problems or improvements. Feel free to add any other details to the JIRA

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Emilio Lahr-Vivaz
It sounds like each of your ranges is an ID, e.g. a single row. I've found that scanning lots of non-sequential single-row ranges is pretty slow in accumulo. Your best approach is probably to create an index table on whatever you are originally trying to query (assuming those 1 ids came

Re: Mini Accumulo cluster

2015-05-13 Thread Josh Elser
As long as you're managing your expectations (which it sounds like you've considered well), there could be some worth. A concern would be how using a different filesystem implementation actually impacts the validity of your benchmark though. e.g. w/ a local FS (which is by default what MAC

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
Yes, hot-spotting does affect accumulo because you have fewer servers and caches handling your request. Let's say your data is spread out, in a normal distribution from 0..9. What if you have only 1 split? You would want it at 5, to divide the data in half, and you could host the halves on
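To make the split-placement point concrete, here is a minimal Python sketch that picks split points at quantiles of the row keys instead of at evenly spaced key values. It is only an illustration: the function names and the sample weights are made up for this example, no Accumulo API is involved, and it assumes a split point is an inclusive upper bound for its tablet.

```python
from bisect import bisect_right


def even_split_points(sorted_keys, num_tablets):
    """Pick up to num_tablets - 1 split points at quantiles of the keys.

    Each split is treated as the inclusive last row of its tablet
    (an assumption of this sketch).
    """
    if num_tablets < 2 or not sorted_keys:
        return []
    n = len(sorted_keys)
    splits = []
    for i in range(1, num_tablets):
        # Last key that belongs to the first i/num_tablets of the data.
        candidate = sorted_keys[max((i * n) // num_tablets - 1, 0)]
        if not splits or candidate != splits[-1]:
            splits.append(candidate)
    return splits


def tablet_sizes(sorted_keys, splits):
    """Count how many keys land in each tablet for the given splits."""
    bounds = [bisect_right(sorted_keys, s) for s in splits]
    edges = [0] + bounds + [len(sorted_keys)]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


# Hypothetical data: row keys "0".."9", bunched up in the middle.
weights = [1, 2, 4, 7, 10, 10, 7, 4, 2, 1]
keys = sorted(str(d) for d, w in enumerate(weights) for _ in range(w))

one_split = even_split_points(keys, 2)
print(one_split, tablet_sizes(keys, one_split))
```

With a single split, the quantile lands in the middle of the key range, so each half of the data can be hosted separately. With more tablets the quantile splits crowd toward the dense middle of the range, which is what keeps any one tablet from becoming a hot spot.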

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread Eric Newton
Yes, that's a great way to split the data evenly. Also, since the data set is so small, turn on data caching for your table: shell config -t mytable -s table.cache.block.enable=true You may want to increase the size of your tserver JVM, and increase the size of the cache: shell config -s

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread vaibhav thapliyal
Thank you Eric. One thing I would like to know. Does pre-splitting the data play a part in querying accumulo? Because I managed to somewhat decrease the querying time. I did the following steps: My table was around 1.47gb so I explicitly set the split parameter to 256mb instead of the default

Re: BatchScanner taking too much time to scan rows

2015-05-13 Thread vaibhav thapliyal
Thank you Eric. I will surely do the same. Should uneven distribution across the tablets affect querying in accumulo? In this case, it is. Is this behaviour normal? On 13-May-2015 10:58 pm, Eric Newton eric.new...@gmail.com wrote: Yes, that's a great way to split the data evenly. Also, since

Mini Accumulo cluster

2015-05-13 Thread Dave Hardcastle
Hi, Is it crazy to use a MiniAccumuloCluster to measure the *relative* performance of two different implementations of iterators? Obviously it would be better to do it on a real Accumulo cluster, but that's not possible for several reasons. The approach would be something like: - Fire up a Mini
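A relative comparison of this kind can be sketched as a small timing harness. The Python below is only an illustration of the measurement idea, not of MiniAccumuloCluster itself: `scan_with_iterator_a` and `scan_with_iterator_b` are hypothetical stand-ins for whatever code runs a scan with each iterator implementation, and the warm-up and median are choices made for this sketch.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Run fn several times and return the median wall-clock duration."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_speed(baseline, candidate, repeats=5):
    """Return baseline time divided by candidate time (> 1: candidate is faster)."""
    # One untimed warm-up run each, so start-up and caching effects
    # are less likely to be charged to whichever runs first.
    baseline()
    candidate()
    return median_seconds(baseline, repeats) / median_seconds(candidate, repeats)


# Hypothetical stand-ins for "scan the table with implementation A / B".
def scan_with_iterator_a():
    return sum(i * i for i in range(200_000))


def scan_with_iterator_b():
    return sum(i * i for i in range(100_000))


ratio = relative_speed(scan_with_iterator_a, scan_with_iterator_b)
print(f"A takes {ratio:.2f}x as long as B")
```

Only the ratio is meaningful here; the absolute timings depend on the machine and, for a real scan, on the filesystem underneath.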