Re: Q: BatchScanner and parallel (i.e. m/r style) execution

Christopher Sat, 16 Jan 2021 10:39:22 -0800

A BatchScanner takes multiple ranges, groups them by TServer, and then
queries TServers in parallel for the ranges that are located in each,
returning data in its iterator as it comes back (without regard to order).
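
To make the group / query-in-parallel / return-as-it-arrives behavior
concrete, here is a toy model in plain Java. It is only a sketch and not
Accumulo's actual implementation: ToyBatchScan, queryServer, and the server
and range names are all made up for illustration, and the "servers" are
just threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: names and behavior are assumptions, not Accumulo code.
final class ToyBatchScan {

    /**
     * Takes ranges already grouped by the (hypothetical) server hosting them,
     * queries every server on its own thread, and collects results in
     * completion order rather than key order.
     */
    static List<String> scan(Map<String, List<String>> rangesByServer) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, rangesByServer.size()));
        CompletionService<List<String>> done = new ExecutorCompletionService<>(pool);
        try {
            for (Map.Entry<String, List<String>> e : rangesByServer.entrySet()) {
                done.submit(() -> queryServer(e.getKey(), e.getValue()));
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < rangesByServer.size(); i++) {
                // take() hands back whichever server finished first
                results.addAll(done.take().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("toy scan failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    /** Stand-in for a server lookup: pretends each range yields one entry. */
    static List<String> queryServer(String server, List<String> ranges) {
        List<String> out = new ArrayList<>();
        for (String r : ranges) {
            out.add(server + ":" + r);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = new TreeMap<>();
        grouped.put("tserver1", Arrays.asList("a-c", "d-f"));
        grouped.put("tserver2", Arrays.asList("m-p"));
        // The order of entries can differ from run to run.
        System.out.println(scan(grouped));
    }
}
```

The point of the sketch is that all of the parallelism lives inside one
client call; running that same call on several nodes repeats the whole
thing on each of them.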


If you run the same scan on multiple nodes, the work won't be
sub-divided in any way; each node will simply query for the same data and
get the same results. If you want, you can sub-divide your ranges in your
client code, distribute those ranges to different nodes, and have each node
scan only its designated range. You probably wouldn't need a BatchScanner
for that; a regular Scanner would suffice. This is how AccumuloInputFormat
works, and it is implemented for both Hadoop's "mapred" and "mapreduce" APIs.
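
As a rough illustration of sub-dividing in client code, here is a plain-Java
sketch with no Accumulo dependency. RowRange, RangePartitioner, and the
round-robin assignment are hypothetical names and choices of mine, not part
of any Accumulo API. In a real client the split points might come from
something like TableOperations.listSplits, and each worker would then open
a regular Scanner on each range assigned to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical row range (startRow, endRow]: start exclusive, end inclusive,
 * which is how tablets are bounded by split points. null means unbounded.
 */
final class RowRange {
    final String startRow;
    final String endRow;

    RowRange(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    @Override
    public String toString() {
        return "(" + (startRow == null ? "-inf" : startRow) + ", "
                + (endRow == null ? "+inf" : endRow) + "]";
    }
}

final class RangePartitioner {

    /** Turns sorted split points into contiguous, non-overlapping ranges. */
    static List<RowRange> rangesFromSplits(List<String> sortedSplits) {
        List<RowRange> ranges = new ArrayList<>();
        String prev = null;
        for (String split : sortedSplits) {
            ranges.add(new RowRange(prev, split));
            prev = split;
        }
        ranges.add(new RowRange(prev, null)); // everything after the last split
        return ranges;
    }

    /** Round-robin assignment: worker w gets ranges w, w+n, w+2n, ... */
    static List<List<RowRange>> assign(List<RowRange> ranges, int numWorkers) {
        if (numWorkers < 1) {
            throw new IllegalArgumentException("numWorkers must be >= 1");
        }
        List<List<RowRange>> perWorker = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            perWorker.add(new ArrayList<>());
        }
        for (int i = 0; i < ranges.size(); i++) {
            perWorker.get(i % numWorkers).add(ranges.get(i));
        }
        return perWorker;
    }

    public static void main(String[] args) {
        List<RowRange> ranges = rangesFromSplits(Arrays.asList("g", "n", "t"));
        List<List<RowRange>> plan = assign(ranges, 2);
        for (int w = 0; w < plan.size(); w++) {
            System.out.println("worker " + w + " scans " + plan.get(w));
        }
    }
}
```

Because the ranges are disjoint and together cover every row, each node
sees a different slice and the union of the slices is the full result.
Round-robin is just the simplest assignment; it does nothing about
locality or uneven tablet sizes.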

See more in the Javadocs:

https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/BatchScanner.html
https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/hadoop/mapred/AccumuloInputFormat.html
https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/hadoop/mapreduce/AccumuloInputFormat.html
https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html
https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.html



On Sat, Jan 16, 2021 at 11:28 AM Roberts, Geoffry [USA] wrote:

> All,
>
> Three questions all asking the same thing:
>
> Can an Accumulo scan or batch scan run like a map/reduce job?
>
> I have an Accumulo 2.0 cluster.
>
> In Hadoop, I can launch a map/reduce job on the name node and Hadoop
> distributes the job over the nodes of the cluster and the job runs in
> parallel.
>
> In Accumulo, I am calling the batch scanner from some non-Java code that
> is first distributed across the cluster; then on each node it attaches to
> Accumulo and does the scan.  It works on a single-node Accumulo, so far so
> good.  I need to scale up and run it multi-node.  I am concerned that I'll
> wind up running the same scan on each node, which would return me an array
> of identical result sets.  Am I correct?
>
> Can I somehow get the Hadoop m/r effect in Accumulo?
>
> Thanks
>
> Geoffry Roberts
> Lead Technologist
> 702.290.9098
> Booz | Allen | Hamilton
> BoozAllen.com
