You could use the Accumulo MapReduce input format and enable scanning an
offline table. This will read the table's RFiles directly, excluding any
data falling outside of tablet boundaries.  Since this is a Hadoop
input format, it should work easily with Spark.  I can point to
examples of this if interested.
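A rough sketch of that approach, assuming the Accumulo 1.x mapreduce API (org.apache.accumulo.core.client.mapreduce) and a table that has already been taken offline; the instance, ZooKeeper host, table name, and credentials below are placeholders:

```java
// Sketch only: reads an *offline* Accumulo table from Spark via the
// Hadoop input format. All connection values are placeholders.
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class OfflineScanExample {
  public static void main(String[] args) throws Exception {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("offline-scan"));

    // The input format is configured through a Hadoop Job object.
    Job job = Job.getInstance();
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault()
            .withInstance("myInstance")
            .withZkHosts("zkhost:2181"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);
    // This is the key setting: read the table's rfiles directly
    // instead of scanning through tablet servers.
    AccumuloInputFormat.setOfflineTableScan(job, true);

    JavaPairRDD<Key, Value> rdd = sc.newAPIHadoopRDD(
        job.getConfiguration(), AccumuloInputFormat.class,
        Key.class, Value.class);

    System.out.println("entries: " + rdd.count());
    sc.close();
  }
}
```

The same configuration calls work for a plain MapReduce job; Spark just consumes the input format through newAPIHadoopRDD. (In Accumulo 2.x the configuration moved to a fluent builder, so adjust accordingly.)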

Another option is using the RFile class (added in 1.8) in the public
API to directly read individual RFiles; this is useful when tables and
tablets are not a concern.  I have not used this with Spark, but I
think it would work easily by partitioning a list of files into tasks
and having each task read a set of rfiles directly.
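The partitioning idea above might look something like this; a sketch only, assuming the RFile public API from 1.8+ and placeholder HDFS paths:

```java
// Sketch only: partition a list of rfile paths across Spark tasks and
// have each task read its files with RFile.newScanner(). Paths are
// placeholders; the Hadoop config on the classpath must resolve HDFS.
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadRFilesExample {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("read-rfiles"));

    // Each element is the set of rfiles one task will read.
    List<List<String>> fileGroups = Arrays.asList(
        Arrays.asList("hdfs://namenode/accumulo/tables/1/t-001/F0001.rf"),
        Arrays.asList("hdfs://namenode/accumulo/tables/1/t-002/F0002.rf"));

    long total = sc.parallelize(fileGroups, fileGroups.size())
        .map(files -> {
          long count = 0;
          // Opens the rfiles directly; no Accumulo instance is involved.
          Scanner scanner =
              RFile.newScanner().from(files.toArray(new String[0])).build();
          for (Entry<Key, Value> e : scanner) {
            count++; // process e.getKey() / e.getValue() here
          }
          scanner.close();
          return count;
        })
        .reduce(Long::sum);

    System.out.println("total entries: " + total);
    sc.close();
  }
}
```

Note that with this approach nothing filters out data that falls outside tablet boundaries, which is why it's best when tables and tablets are not a concern.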

On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405
<bulldog20630...@gmail.com> wrote:
>
>
> we would like to read rfiles directly, outside an active accumulo instance, 
> using spark.  is there an example of how to do this?
>
> note: i know there is a utility to print rfiles and i could start there and 
> build my own; but was hoping to leverage something already there.
