Accumulo Working

Yamini Joshi Tue, 22 Nov 2016 09:05:46 -0800

Hello all

I am trying to understand the Accumulo scan workflow. I've read the official
docs but couldn't follow the workflow properly. Could anyone please tell me
if I'm on the right track? For example, suppose I want to scan rows in the
range e-g in a table mytable which is sharded across 3 nodes in the
cluster:


Step 1: Client connects to ZooKeeper and gets the location of the root
tablet.
Step 2: Client connects to the tserver hosting the root tablet and gets the
location of mytable.

The row distribution is as follows:

tserver1    tserver2    tserver3
a-g         h-k         l-z

Step 3: Client connects to tserver1 and tserver2.
Step 4: The tservers merge and sort data from in-memory maps, minc files and
majc files, apply the versioning iterator, seek to the requested range and
send data back to the client.
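
The steps above can be sketched as a toy model in plain Java. Nothing here
is Accumulo API: ScanSketch, TABLETS, serversFor, Entry and scan are all
hypothetical names made up for illustration, keys are reduced to a row plus
a timestamp, and the sketch filters entries during the merge instead of
seeking. It only assumes that each tablet is identified by its last row and
that each source is already sorted by row, newest timestamp first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class ScanSketch {
    // Hypothetical stand-in for tablet metadata: last row of tablet -> tserver.
    static final TreeMap<String, String> TABLETS = new TreeMap<>();
    static {
        TABLETS.put("g", "tserver1"); // rows a-g
        TABLETS.put("k", "tserver2"); // rows h-k
        TABLETS.put("z", "tserver3"); // rows l-z
    }

    // Returns the tservers whose tablets overlap the inclusive range [start, end].
    static List<String> serversFor(String start, String end) {
        List<String> servers = new ArrayList<>();
        // The first tablet whose last row is >= start contains the start row.
        for (Map.Entry<String, String> t : TABLETS.tailMap(start, true).entrySet()) {
            servers.add(t.getValue());
            if (t.getKey().compareTo(end) >= 0) break; // this tablet covers the end row
        }
        return servers;
    }

    // Simplified key/value: a real key also has column and visibility fields.
    static final class Entry {
        final String row; final long ts; final String value;
        Entry(String row, long ts, String value) {
            this.row = row; this.ts = ts; this.value = value;
        }
    }

    // Merges sorted sources (in-memory map, files) on one tserver and keeps only
    // the newest version of each row that falls inside [start, end].
    static List<String> scan(List<List<Entry>> sources, String start, String end) {
        // Heap elements are {source index, position}, ordered by row ascending,
        // then timestamp descending so the newest version comes out first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            Entry x = sources.get(a[0]).get(a[1]);
            Entry y = sources.get(b[0]).get(b[1]);
            int c = x.row.compareTo(y.row);
            return c != 0 ? c : Long.compare(y.ts, x.ts);
        });
        for (int i = 0; i < sources.size(); i++) {
            if (!sources.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> out = new ArrayList<>();
        String lastRow = null;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Entry> src = sources.get(top[0]);
            Entry e = src.get(top[1]);
            if (top[1] + 1 < src.size()) heap.add(new int[] {top[0], top[1] + 1});
            boolean inRange = e.row.compareTo(start) >= 0 && e.row.compareTo(end) <= 0;
            // Versioning step: older versions of an already-seen row are skipped.
            if (inRange && !e.row.equals(lastRow)) out.add(e.row + "=" + e.value);
            lastRow = e.row;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> memMap = Arrays.asList(new Entry("f", 3, "new"));
        List<Entry> file = Arrays.asList(
            new Entry("d", 1, "x"), new Entry("f", 1, "old"), new Entry("g", 2, "y"));
        System.out.println(serversFor("e", "g"));                       // [tserver1]
        System.out.println(scan(Arrays.asList(memMap, file), "e", "g")); // [f=new, g=y]
    }
}
```

In this toy model the range e-g overlaps only the a-g tablet, so
serversFor("e", "g") returns just tserver1; a wider range such as e-i would
return tserver1 and tserver2.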

Is this how a scan works? Also, I have some questions:
1. Where is the data from tserver1 and tserver2 merged?
2. When and how are custom iterators applied?


Also, if there is any resource explaining this, please point me to it. I've
found some slides but no detailed explanation.


Best regards,
Yamini Joshi
