Yes, that is more what I want to do; I wish there was an
AccumuloFileInputFormat, but there isn't... maybe I need to create
one. Thanks, I will look into the RFile class (I am using 1.9, so we
should be good).

On Tue, Aug 4, 2020 at 12:20 PM Keith Turner <ke...@deenlo.com> wrote:

> You could use the Accumulo MapReduce input format and enable scanning an
> offline table. This will read the table's rfiles directly, excluding any
> data falling outside of tablet boundaries.  Since this is a Hadoop
> input format, it should work easily with Spark.  I can point to
> examples of this if interested.
>
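For reference, here is a rough, untested sketch of that first approach from
Spark's Java API against Accumulo 1.9. The instance name, ZooKeeper hosts,
credentials, and table name below are placeholders (not from this thread),
and the table is assumed to have already been taken offline (e.g. with the
shell's offline command).

// Rough sketch, untested: scan an *offline* table through the Hadoop input
// format from Spark.  "myInstance", "zk1:2181", "user"/"secret" and "mytable"
// are placeholders.
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class OfflineTableScan {
  public static void main(String[] args) throws Exception {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("offline-accumulo-scan"));

    // The Job object is only used here as a holder for the Hadoop configuration.
    Job job = Job.getInstance();
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.create().withInstance("myInstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);
    // The table must already be offline (e.g. "offline mytable" in the shell).
    // With this flag the input splits read the table's rfiles directly instead
    // of going through tablet servers.
    AccumuloInputFormat.setOfflineTableScan(job, true);

    JavaPairRDD<Key, Value> rdd = sc.newAPIHadoopRDD(
        job.getConfiguration(), AccumuloInputFormat.class, Key.class, Value.class);

    System.out.println("entries: " + rdd.count());
    sc.close();
  }
}
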
> Another option is using the RFile class (added in 1.8) in the public
> API to directly read individual RFiles; this is useful when tables and
> tablets are not a concern.  I have not used this with Spark, but I
> think it would work easily by partitioning a list of files into tasks
> and having each task read a set of rfiles directly.
>
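And a similarly rough, untested sketch of the second approach: partitioning a
list of rfile paths across Spark tasks and having each task open its files
with RFile.newScanner() from the public API. The HDFS paths below are made up,
and the per-entry work is just a count.

// Rough sketch, untested: each Spark task reads a set of rfiles directly with
// the RFile class from the public API (1.8+).  The HDFS paths are made up.
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadRFilesDirectly {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("read-rfiles-directly"));

    // Partition the list of rfile paths across tasks; each task opens its own
    // files.  In practice this list would come from listing the table's
    // directory in HDFS.
    List<String> files = Arrays.asList(
        "hdfs://namenode/accumulo/tables/1/t-000/F0000a.rf",
        "hdfs://namenode/accumulo/tables/1/t-001/F0000b.rf");

    long total = sc.parallelize(files, files.size())
        .map(path -> {
          long count = 0;
          // RFile.newScanner() builds a Scanner over the file; visibility
          // labels are filtered using the authorizations passed here.
          Scanner scanner = RFile.newScanner()
              .from(path)
              .withAuthorizations(Authorizations.EMPTY)
              .build();
          try {
            for (Map.Entry<Key, Value> entry : scanner) {
              count++; // replace with real per-entry processing
            }
          } finally {
            scanner.close();
          }
          return count;
        })
        .reduce(Long::sum);

    System.out.println("total entries: " + total);
    sc.close();
  }
}

Note that, as Keith says, this bypasses tables and tablets entirely, so unlike
the offline-table scan above it will also return data falling outside tablet
boundaries.
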
> On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405
> <bulldog20630...@gmail.com> wrote:
> >
> >
> > we would like to read rfiles directly, outside an active accumulo
> > instance, using spark.  is there an example of how to do this?
> >
> > note: i know there is a utility to print rfiles and i could start there
> > and build my own, but was hoping to leverage something already there.
>
