Try using 194 threads, one per tablet, if your hardware can support them.
The worst that'll happen is the client program crashes during testing. If
that happens, cut the number of threads in half and try again, repeating
until it runs.
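
If you'd rather script the halving than do it by hand, here is a minimal
Java sketch. It is not part of Accumulo: ThreadCountProbe and firstWorking
are names made up for illustration, and the attempt callback is an assumed
stand-in for your own code that creates the BatchScanner with the given
thread count and runs the scan. It only handles failures that surface as a
RuntimeException; a hard crash of the JVM still needs a manual rerun.

```java
import java.util.function.IntPredicate;

public class ThreadCountProbe {

    /**
     * Tries `start` threads first and halves the count after each failed
     * attempt. Returns the first count that succeeds, or -1 if even a
     * single thread fails.
     */
    public static int firstWorking(int start, IntPredicate attempt) {
        for (int threads = start; threads >= 1; threads /= 2) {
            try {
                if (attempt.test(threads)) {
                    return threads;
                }
            } catch (RuntimeException e) {
                // Treat an exception from the scan as a failed attempt
                // and fall through to the next, smaller thread count.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real scan: pretend that anything
        // above 64 threads fails on this client.
        int chosen = firstWorking(194, threads -> threads <= 64);
        System.out.println(chosen); // prints 48 (194 -> 97 -> 48)
    }
}
```

In a real test the lambda would wrap your scan and return true once it
completes, so the tried counts would be 194, 97, 48, 24 and so on.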

On Tue, May 12, 2015 at 1:58 PM, vaibhav thapliyal
<vaibhav.thapliyal...@gmail.com> wrote:
> I have 194 tablets. Currently I am passing 20 threads to the
> createBatchScanner method when creating the BatchScanner.
>
> On 12-May-2015 11:19 pm, "Keith Turner" <ke...@deenlo.com> wrote:
>>
>> How many tablets do you have?  The batch scanner does not parallelize
>> operations within a tablet.
>>
>> If you give the batch scanner more threads than there are tservers, it
>> will make multiple parallel rpc calls to each tserver if the tserver has
>> multiple tablets.  Each rpc may include multiple tablets and ranges for each
>> tablet.
>>
>> If the batch scanner has fewer threads than tservers, it will make one rpc
>> per tserver per thread.  Each rpc call will include all tablets and
>> associated ranges for that tserver.
>>
>> Keith
>>
>>
>>
>> On Tue, May 12, 2015 at 1:39 PM, vaibhav thapliyal
>> <vaibhav.thapliyal...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am using BatchScanner to scan rows from an Accumulo table. The table has
>>> around 187m entries and I am using a 3 node cluster which has Accumulo
>>> 1.6.1.
>>>
>>> I have passed a list of 10000 ids, which are stored as row ids in my
>>> table, to the setRanges() method.
>>>
>>> This whole process takes around 50 secs (from adding the ids to the list
>>> to scanning the whole table using the BatchScanner).
>>>
>>> I tried switching on bloom filters but that didn't work.
>>>
>>> Also, if anyone could briefly explain how a BatchScanner works and how it
>>> does parallel scanning, it would help me understand what I am doing better.
>>>
>>> Thanks
>>> Vaibhav
>>>
>>>
>>
>
