That's basically how it works, yes.

1. The data from tserver1 and tserver2 necessarily comes from at least two
different tablets. This is because tables are divided into discrete,
non-overlapping tablets, and each tablet is hosted only on a single tserver.
So, it is not normally necessary to merge the data from these two sources.
Your application may do a join between the two tablets on the client side,
but that is outside the scope of Accumulo.

2. Custom iterators can be applied to the minc, majc, and scan scopes, that
is, during minor compactions, major compactions, and scans. I suggest
starting here:
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterators

On Tue, Nov 22, 2016 at 12:05 PM Yamini Joshi <yamini.1...@gmail.com> wrote:

> Hello all
>
> I am trying to understand Accumulo scan workflow. I've checked the
> official docs but I couldn't understand the workflow properly. Could anyone
> please tell me if I'm on the right track? For example if I want to scan
> rows in the range e-g in a table mytable which is sharded across 3 nodes in
> the cluster:
>
> Step1: Client connects to the Zookeeper and gets the location of the root
> tablet.
> Step2: Client connects to tserver with the root tablet and gets the
> location of mytable.
> the row distribution is as follows:
> tserver1     tserver2     tserver3
> a-g          h-k          l-z
>
> Step3: Client connects to tserver1 and tserver2.
> Step4: tservers merge and sort data from in-memory maps, minc files and
> majc files, apply versioning iterator, seek the requested range and send
> data back to the client.
>
> Is this how a scan works? Also, I have some doubts:
> 1. Where is the data from tserver1 and tserver2 merged?
> 2. when and how are custom iterators applied?
>
>
> Also, if there is any resource explaining this, please point me to it.
> I've found some slides but no detailed explanation.
>
>
> Best regards,
> Yamini Joshi
>
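
To illustrate point 1 with the layout from the quoted question, here is a rough Python sketch. It is not Accumulo client code: `TABLETS`, `tablets_for_range`, and `scan` are hypothetical names, and the sketch assumes each tablet covers the rows after the previous split point up to and including its own end row. Because the tablets do not overlap and each returns its rows in sorted order, concatenating the per-tablet results in tablet order already yields sorted output, so no merge step is needed.

```python
from bisect import bisect_left

# Hypothetical layout from the quoted question: split points "g" and "k".
# Each tablet covers (previous end row, end row]; None marks the last tablet.
TABLETS = [("g", "tserver1"), ("k", "tserver2"), (None, "tserver3")]


def tablets_for_range(start_row, end_row):
    """Return the servers whose tablets overlap the inclusive row range."""
    end_rows = [end for end, _ in TABLETS[:-1]]
    first = bisect_left(end_rows, start_row)  # tablet holding start_row
    last = bisect_left(end_rows, end_row)     # tablet holding end_row
    return [server for _, server in TABLETS[first:last + 1]]


def scan(data_by_server, start_row, end_row):
    """Simulate a sequential scan: visit tablets in order and concatenate."""
    results = []
    for server in tablets_for_range(start_row, end_row):
        # Each simulated tserver returns its own rows already sorted.
        rows = sorted(r for r in data_by_server[server]
                      if start_row <= r <= end_row)
        results.extend(rows)  # no cross-server merge is required
    return results


# Made-up sample rows, one list per simulated tserver.
data = {
    "tserver1": ["a", "e", "f", "g"],
    "tserver2": ["h", "i", "k"],
    "tserver3": ["m", "z"],
}
```

Under those assumptions, `tablets_for_range("e", "g")` touches only tserver1, while a wider range such as e-i touches tserver1 and tserver2 and `scan(data, "e", "i")` returns the rows in sorted order without any merging.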