thanx for your response... i thought that would be the best one to use;
spent my time yesterday trying to use the other one...
using the client i see there is a builder pattern and i was able to create
a simple test method to verify it works:


// this is just a simple example for anyone to get started

import org.apache.accumulo.core.client.Scanner
import org.apache.accumulo.core.client.rfile.RFile
import scala.collection.JavaConverters._

def scanRFile(filepath: String, user: String): Unit = {
  // build a scanner over the rfile with the caller's authorizations
  // (securityOps is a SecurityOperations handle from the surrounding class)
  val scanner: Scanner = RFile.newScanner()
    .from(filepath)
    .withAuthorizations(securityOps.getUserAuthorizations(user))
    .build()

  // print every key/value entry in the file
  scanner.iterator.asScala.foreach { entry =>
    println(s"key => ${entry.getKey}, value => ${entry.getValue}")
  }
}

this is a great start and now i can work on integrating it into a
mapreduce/spark job; a rough sketch of that integration is below
thanx again
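
a minimal sketch of the spark side (untested; the readRFiles helper and
the sc/paths/auths names are illustrative, not from the thread or the
accumulo API) could partition the file list across tasks, with each task
building the same RFile scanner as above:

  import org.apache.accumulo.core.client.rfile.RFile
  import org.apache.accumulo.core.security.Authorizations
  import org.apache.spark.SparkContext
  import scala.collection.JavaConverters._

  // read a set of rfiles in parallel, one file per task
  def readRFiles(sc: SparkContext, paths: Seq[String], auths: Authorizations) = {
    sc.parallelize(paths, paths.size).flatMap { path =>
      val scanner = RFile.newScanner()
        .from(path)
        .withAuthorizations(auths)
        .build()
      // Key/Value are hadoop Writables, not java-serializable, so convert
      // to strings here (or register a kryo serializer) before shuffling
      scanner.iterator.asScala
        .map(e => (e.getKey.toString, e.getValue.toString))
        .toList
    }
  }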

On Wed, Aug 5, 2020 at 12:51 PM Christopher <ctubb...@apache.org> wrote:

> The one in core.client is newer, and intended as a stable public API.
> The other is internal API, more awkward to use, and not guaranteed to
> be stable. If the public API is missing something that would be
> useful, we can consider making a non-public method more visible, or
> adding new public methods to the API in core.client.
>
> On Wed, Aug 5, 2020 at 12:48 PM Bulldog20630405
> <bulldog20630...@gmail.com> wrote:
> >
> > i looked at the RFile class (there are two of them in core ... one is
> core.client.rfile.RFile and the other is core.file.rfile.RFile)
> >
> > in both cases most of the capability is private or package-private and
> you cannot access the functionality.
> >
> > am i missing something?
> >
> >
> >
> > On Tue, Aug 4, 2020 at 1:07 PM Bulldog20630405 <
> bulldog20630...@gmail.com> wrote:
> >>
> >> yes; that is more like what i want to do; i wish there was an
> AccumuloFileInputFormat; but there isn't... maybe i need to create
> one... thanx... i will look into the rfile class (i am using 1.9 so we
> should be good)
> >>
> >> On Tue, Aug 4, 2020 at 12:20 PM Keith Turner <ke...@deenlo.com> wrote:
> >>>
> >>> You could use the Accumulo MapReduce input format and enable scanning
> >>> an offline table. This will read the table's rfiles directly, excluding
> >>> any data falling outside of tablet boundaries.  Since this is a Hadoop
> >>> input format, it should work easily with Spark.  I can point to
> >>> examples of this if interested.
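> >>>
> >>> As a minimal sketch (untested; the instance name, zookeepers,
> >>> credentials, and table name are placeholders, and the table must
> >>> first be taken offline), the wiring with Spark might look like:
> >>>
> >>>   import org.apache.accumulo.core.client.ClientConfiguration
> >>>   import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
> >>>   import org.apache.accumulo.core.client.security.tokens.PasswordToken
> >>>   import org.apache.accumulo.core.data.{Key, Value}
> >>>   import org.apache.hadoop.mapreduce.Job
> >>>
> >>>   val job = Job.getInstance()
> >>>   AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.create()
> >>>     .withInstance("myInstance").withZkHosts("zk1:2181"))
> >>>   AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"))
> >>>   AccumuloInputFormat.setInputTableName(job, "myTable")
> >>>   // an offline scan reads the table's rfiles directly, bypassing tservers;
> >>>   // take the table offline first via tableOperations().offline("myTable", true)
> >>>   AccumuloInputFormat.setOfflineTableScan(job, true)
> >>>
> >>>   // sc is an existing SparkContext
> >>>   val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
> >>>     classOf[AccumuloInputFormat], classOf[Key], classOf[Value])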
> >>>
> >>> Another option is using the RFile class (added in 1.8) in the public
> >>> API to directly read individual RFiles; this is useful when tables and
> >>> tablets are not a concern.  I have not used this with Spark, but I
> >>> think it would work easily by partitioning a list of files into tasks
> >>> and having each task read a set of rfiles directly.
> >>>
> >>> On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405
> >>> <bulldog20630...@gmail.com> wrote:
> >>> >
> >>> >
> >>> > we would like to read rfiles directly outside an active accumulo
> instance using spark.  is there an example to do this?
> >>> >
> >>> > note: i know there is a utility to print rfiles and i could start
> there and build my own; but was hoping to leverage something already there.
>
