That's basically how it works, yes.

1. The data from tserver1 and tserver2 necessarily comes from at least two
different tablets. This is because tables are divided into discrete,
non-overlapping tablets, and each tablet is hosted only on a single
tserver. So, it is not normally necessary to merge the data from these two
sources. Your application may do a join between the two tablets on the
client side, but that is outside the scope of Accumulo.
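
To make that concrete, here is a toy Python model of the idea (not Accumulo
code). The split points, host names, and helper functions are assumptions
taken from the example in your message. Because tablets do not overlap and
each one returns its rows in sorted order, reading the tablets one after
another already gives sorted output, so no merge step is needed.

```python
from bisect import bisect_left

# Assumed layout from the example: a tablet's end row is inclusive, so
# tablet 0 holds rows <= "g", tablet 1 holds rows <= "k", tablet 2 the rest.
SPLITS = ["g", "k"]
HOSTS = ["tserver1", "tserver2", "tserver3"]


def tablet_index(row):
    """Return the index of the single tablet whose range contains `row`."""
    return bisect_left(SPLITS, row)


def tablets_for_range(start_row, end_row):
    """Return the hosts of every tablet overlapping [start_row, end_row]."""
    first, last = tablet_index(start_row), tablet_index(end_row)
    return HOSTS[first:last + 1]


def scan(tablet_data, start_row, end_row):
    """Read the overlapping tablets in order and yield the rows in range.

    `tablet_data[i]` is the sorted list of (row, value) pairs in tablet i.
    """
    first, last = tablet_index(start_row), tablet_index(end_row)
    for i in range(first, last + 1):
        for row, value in tablet_data[i]:
            if start_row <= row <= end_row:
                yield row, value


# Hypothetical contents of the three tablets.
data = [
    [("a", 1), ("e", 2), ("g", 3)],
    [("h", 4), ("k", 5)],
    [("m", 6)],
]
rows = list(scan(data, "e", "i"))
```

With these assumed splits, a range such as e to i touches the first two
tablets, and the rows come back as e, g, h without any sorting on the client.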

2. Custom iterators can be applied at the minor compaction (minc), major
compaction (majc), and scan scopes. I suggest starting here:
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterators
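
As a rough picture of what an iterator stack does inside a tserver, here is a
toy Python sketch. It is not the Accumulo API: real iterators are Java classes
implementing SortedKeyValueIterator, and the function names, the
(row, timestamp, value) entry shape, and the sample data below are all
assumptions for illustration. Each source (in-memory map or file) is already
sorted; the stack merges them, a versioning step keeps the newest entry per
row, and a custom filter runs on top.

```python
import heapq


def merge_sources(*sources):
    """Merge already-sorted sources into one stream, newest first per row."""
    return heapq.merge(*sources, key=lambda e: (e[0], -e[1]))


def versioning(entries, max_versions=1):
    """Keep only the newest `max_versions` entries for each row."""
    current, seen = None, 0
    for row, ts, value in entries:
        if row != current:
            current, seen = row, 0
        if seen < max_versions:
            seen += 1
            yield row, ts, value


def value_filter(entries, predicate):
    """Hypothetical custom iterator: drop entries whose value fails `predicate`."""
    return (e for e in entries if predicate(e[2]))


def scan_stack(sources, predicate):
    """Stack the steps: merge, then versioning, then the custom filter."""
    return value_filter(versioning(merge_sources(*sources)), predicate)


# Assumed sample data: one in-memory map and one file, each sorted by row.
memory = [("e", 5, "new"), ("f", 2, "x")]
rfile = [("e", 1, "old"), ("g", 3, "skip")]
result = list(scan_stack([memory, rfile], lambda v: v != "skip"))
```

Here the older version of row e is dropped by the versioning step and row g is
dropped by the custom filter, leaving the newest e and f entries.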


On Tue, Nov 22, 2016 at 12:05 PM Yamini Joshi <yamini.1...@gmail.com> wrote:

> Hello all
>
> I am trying to understand Accumulo scan workflow. I've checked the
> official docs but I couldn't understand the workflow properly. Could anyone
> please tell me if I'm on the right track? For example if I want to scan
> rows in the range e-g in a table mytable which is sharded across 3 nodes in
> the cluster:
>
> Step1: Client connects to the Zookeeper and gets the location of the root
> tablet.
> Step2: Client connects to tserver with the root tablet and gets the
> location of mytable.
> the row distribution is as follows:
> tserver1             tserver2                   tserver3
> a-g                       h-k                            l-z
>
> Step3: Client connects to tserver1 and tserver2.
> Step4: tservers merge and sort data from in-memory maps, minc files and
> majc files, apply versioning iterator, seek the requested range and send
> data back to the client.
>
> Is this how a scan works? Also, I have some doubts:
> 1. Where is the data from tserver1 and tserver2 merged?
> 2. when and how are custom iterators applied?
>
>
> Also, if there is any resource explaining this, please point me to it.
> I've found some slides but no detailed explanation.
>
>
> Best regards,
> Yamini Joshi
>
