Re: [External] Re: Q: BatchScanner and parallel (i.e. m/r style) execution

Christopher Sat, 16 Jan 2021 10:57:40 -0800

Not to all servers, just to those hosting data in that range. But
otherwise, yes.
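
For the multi-node case in the quoted message below (sub-divide the ranges in client code and give each node its own share), here is a minimal sketch in plain Java. It is not Accumulo API: RangePartitioner, RowRange, splitAt, and assign are hypothetical names made up for illustration, and the split rows are invented. In a real client each RowRange would become an org.apache.accumulo.core.data.Range handed to a regular Scanner on the node that owns it. I believe TableOperations.splitRangeByTablets can produce tablet-aligned ranges for you, but check the Javadocs before relying on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangePartitioner {

    /** Hypothetical row range [start, end); null means unbounded on that side. */
    public static final class RowRange {
        public final String start;
        public final String end;

        public RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + (start == null ? "-inf" : start) + ", "
                    + (end == null ? "+inf" : end) + ")";
        }
    }

    /** Cut the whole key space into contiguous ranges at the given sorted split rows. */
    public static List<RowRange> splitAt(List<String> sortedSplits) {
        List<RowRange> ranges = new ArrayList<>();
        String previous = null;
        for (String split : sortedSplits) {
            ranges.add(new RowRange(previous, split));
            previous = split;
        }
        // The last range runs from the final split row to the end of the table.
        ranges.add(new RowRange(previous, null));
        return ranges;
    }

    /** Hand ranges to workers round-robin so that no range is scanned twice. */
    public static List<List<RowRange>> assign(List<RowRange> ranges, int workers) {
        List<List<RowRange>> plan = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            plan.add(new ArrayList<>());
        }
        for (int i = 0; i < ranges.size(); i++) {
            plan.get(i % workers).add(ranges.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Assumed split rows; a real job would take these from the table.
        List<RowRange> ranges = splitAt(Arrays.asList("g", "n", "t"));
        List<List<RowRange>> plan = assign(ranges, 2);
        for (int w = 0; w < plan.size(); w++) {
            System.out.println("worker " + w + " scans " + plan.get(w));
        }
    }
}
```

Each worker then scans only the ranges in its own list, so the result sets are disjoint rather than copies of one another. Round-robin is just the simplest assignment; grouping ranges by the tserver that hosts them would presumably give better locality.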


On Sat, Jan 16, 2021 at 1:45 PM Roberts, Geoffry [USA] <
[email protected]> wrote:

> If I have a batch scanner with one large range that spans several
> tservers, will Accumulo distribute it to all of those tservers, process it
> in parallel, and return a single result set?
>
>
>
> Geoffry Roberts
>
> Lead Technologist
>
> 702.290.9098
>
> [email protected]
>
>
>
> Booz | Allen | Hamilton
>
> BoozAllen.com
>
>
>
> *From: *Christopher <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Saturday, January 16, 2021 at 1:39 PM
> *To: *accumulo-user <[email protected]>
> *Subject: *[External] Re: Q: BatchScanner and parallel (i.e. m/r style)
> execution
>
>
>
> A BatchScanner takes multiple ranges, groups them by TServer, and then
> queries TServers in parallel for the ranges that are located in each,
> returning data in its iterator as it comes back (without regard to order).
>
> If you run the same scan on multiple nodes, the task won't be
> sub-divided in any way... it will just be multiple nodes querying for the
> same thing. If you want, you can sub-divide your ranges in your client
> code, distribute those ranges to different nodes, and have each node scan
> only its designated range. You probably wouldn't use a BatchScanner for
> that. A regular Scanner would suffice. This is how AccumuloInputFormat
> works, implemented for both Hadoop's "mapred" and "mapreduce" APIs.
>
> See more in the Javadocs:
>
>
> https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/BatchScanner.html
>
>
> https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/hadoop/mapred/AccumuloInputFormat.html
>
>
> https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/hadoop/mapreduce/AccumuloInputFormat.html
>
>
> https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html
>
>
> https://accumulo.apache.org/docs/2.x/apidocs/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.html
>
>
>
>
>
>
>
> On Sat, Jan 16, 2021 at 11:28 AM Roberts, Geoffry [USA] <
> [email protected]> wrote:
>
> All,
>
>
>
> Three questions all asking the same thing:
>
>
>
> Can an Accumulo scan or batchscan run like a map/reduce job?
>
>
>
> I have an Accumulo 2.0 cluster.
>
>
>
> In hadoop, I can launch a map/reduce job on the name node and hadoop
> distributes the job over the nodes of the cluster and the job runs in
> parallel.
>
>
>
> In Accumulo, I am calling the batch scanner from some non-Java code that
> is first distributed across the cluster; on each node it then connects to
> Accumulo and runs the scan. It works on a single-node Accumulo, so far so
> good. I need to scale up and run it multi-node. I am concerned that I'll
> wind up running the same scan on each node, which would return an array
> of identical result sets. Am I correct?
>
>
>
> Can I somehow get the Hadoop m/r effect in accumulo?
>
>
>
> Thanks
>
>
>
> Geoffry Roberts
>
> Lead Technologist
>
> 702.290.9098
>
> [email protected]
>
>
>
> Booz | Allen | Hamilton
>
> BoozAllen.com
>
>
