Thanks for your response... I thought that would be the best one to use;
spent my time yesterday trying to use the other one.
Using the client I see there is a builder pattern, and I was able to create
a simple test method to verify it works:
// this is just a simple example for anyone to get started
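The snippet above is cut off; a test using the public builder in core.client.rfile might look something like this sketch (the file path is a placeholder, and this assumes an environment with the Accumulo and Hadoop client jars on the classpath):

```java
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RFileScanExample {
  public static void main(String[] args) throws Exception {
    // FileSystem used to resolve the rfile path (local FS or HDFS)
    FileSystem fs = FileSystem.get(new Configuration());

    // Build a Scanner over one or more rfiles via the public builder API
    Scanner scanner = RFile.newScanner()
        .from("/tmp/test.rf")   // hypothetical path to an rfile
        .withFileSystem(fs)
        .build();
    try {
      for (Entry<Key,Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    } finally {
      scanner.close();
    }
  }
}
```

The builder also exposes options such as authorizations and disabling the system iterators, so the same pattern extends beyond a plain dump.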
The one in core.client is newer, and intended as a stable public API.
The other is internal API, more awkward to use, and not guaranteed to
be stable. If the public API is missing something that would be
useful, we can consider making a non-public method more visible, or
adding new public methods to cover it.
I looked at the RFile class (there are two of them in core... one is
core.client.rfile.RFile and the other is core.file.rfile.RFile).
In both cases most of the capability is private or package-private, and
you cannot access the functionality.
Am I missing something?
On Tue, Aug 4, 2020 at 1:0
Okay, thanks; the GeoMesaAccumuloInputFormat looks interesting; I just need
to make it more generic... thanks!
On Mon, Aug 3, 2020 at 5:38 PM Jim Hughes wrote:
> Good question. As a very general note, one can leverage Hadoop
> InputFormats to create Spark RDDs.
>
> As a rather non-trivial example
Yes, that is more what I want to do; I wish there were an
AccumuloFileInputFormat, but there isn't... maybe I need to create
one... thanks... I will look into the RFile class (I am using 1.9, so we
should be good).
On Tue, Aug 4, 2020 at 12:20 PM Keith Turner wrote:
> Could the Accumulo Map Reduce inp
Thanks; I have already done that, and it works... I am trying something that
will work faster for many files in a directory. I just want to read the file
directory and parse the rfiles directly (much like the print-rfiles
class does with the rfile reader); however, I need to decouple it for external
use.
You could use the Accumulo MapReduce input format and enable scanning an
offline table. This will read the table's rfiles directly, excluding any
data falling outside of tablet boundaries. Since this is a Hadoop
input format, it should work easily with Spark. I can point to
examples of this if interested.
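Putting the offline-scan suggestion together with Spark might look roughly like the following sketch (instance name, ZooKeeper hosts, credentials, and table name are all placeholders; this targets the 1.9-era mapreduce API and needs a real cluster to run):

```java
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class OfflineScanExample {
  public static void main(String[] args) throws Exception {
    // The Job is only used as a carrier for the input format configuration
    Job job = Job.getInstance();

    // Placeholder connection details
    AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.loadDefault()
        .withInstance("myInstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);

    // Read the (offline) table's rfiles directly, clipped to tablet boundaries
    AccumuloInputFormat.setOfflineTableScan(job, true);

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("offline-scan"));
    JavaPairRDD<Key,Value> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);
    System.out.println("entries: " + rdd.count());
    sc.stop();
  }
}
```

The table must actually be taken offline first (e.g. via the shell's `offline` command), otherwise the input format will refuse the scan.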
Good question. As a very general note, one can leverage Hadoop
InputFormats to create Spark RDDs.
As a rather non-trivial example, you could check out GeoMesa's
implementation of mapping Accumulo entries to geospatial data types.
The basic strategy is to make a Hadoop Configuration object representing the desired query.
We would like to read rfiles directly, outside an active Accumulo instance,
using Spark. Is there an example of how to do this?
Note: I know there is a utility to print rfiles, and I could start there
and build my own, but I was hoping to leverage something already there.
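For reference, the print utility mentioned is exposed as a subcommand of the `accumulo` script in the 1.x line; the path below is a made-up example, and the flags are worth double-checking against your version's help output:

```shell
# Summarize an rfile (index and block statistics)
accumulo rfile-info hdfs://namenode/accumulo/tables/1/default_tablet/F000000.rf

# Also dump the keys and values
accumulo rfile-info -d hdfs://namenode/accumulo/tables/1/default_tablet/F000000.rf
```

It is handy for spot-checking a single file, but as noted above, it is not structured for programmatic reuse the way the core.client.rfile builder API is.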