Re: reading rfiles directly

2020-08-05 Thread Bulldog20630405
thanx for your response... i thought that would be the best one to use; spent my time yesterday trying to use the other one... using the client i see there is a builder pattern and i was able to create a simple test method to verify it works: // this is just a simple example for anyone to get s
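
[Editor's note: a minimal sketch, not the poster's actual test, of what such a check can look like with the public builder API in org.apache.accumulo.core.client.rfile.RFile. The class name and HDFS path are placeholders.]

import java.util.Map.Entry;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RFileReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Build a Scanner over one or more rfiles; no running Accumulo instance is needed.
    try (Scanner scanner = RFile.newScanner()
        .from("hdfs://namenode/accumulo/tables/1/default_tablet/F0000000.rf") // placeholder path
        .withFileSystem(fs)
        .withAuthorizations(Authorizations.EMPTY)
        .build()) {
      for (Entry<Key,Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }
  }
}

The returned Scanner behaves like a normal Accumulo Scanner (ranges, column restrictions), just backed by the rfile instead of a live tablet server.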

Re: reading rfiles directly

2020-08-05 Thread Christopher
The one in core.client is newer, and intended as a stable public API. The other is an internal API, more awkward to use, and not guaranteed to be stable. If the public API is missing something that would be useful, we can consider making a non-public method more visible, or adding new public methods t

Re: reading rfiles directly

2020-08-05 Thread Bulldog20630405
i looked at the RFile class (there are two of them in core ... one is core.client.rfile.RFile and the other is core.file.rfile.RFile). in both cases most of the capability is private or package-protected and you cannot access the functionality. am i missing something?

Re: reading rfiles directly

2020-08-04 Thread Bulldog20630405
okay; thanx; the GeoMesaAccumuloInputFormat looks interesting; i just need to make it more generic ... thanx!

Re: reading rfiles directly

2020-08-04 Thread Bulldog20630405
yes; that is more what i want to do; i wish there was an AccumuloFileInputFormat; but there isn't... maybe i need to create one... thanx... i will look into the rfile class (i am using 1.9 so we should be good)

Re: reading rfiles directly

2020-08-04 Thread Bulldog20630405
thanx; i have already done that; it works... i am trying something that will work faster for many files in a directory. i just want to read the files from the directory and parse the rfiles directly (much like the print rfiles class does with the rfile reader); however, i need to decouple it for external use

Re: reading rfiles directly

2020-08-04 Thread Keith Turner
You could use the Accumulo MapReduce input format and enable scanning an offline table. This will read the table's rfiles directly, excluding any data falling outside of tablet boundaries. Since this is a Hadoop input format, it should work easily with Spark. I can point to examples of this if interested.
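
[Editor's note: a hedged sketch of that configuration on the Accumulo 1.9 MapReduce API; the class name, instance, zookeepers, credentials, and table name are placeholders, and the table has to be taken offline before the job runs.]

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;

public class OfflineScanJobConfig {
  public static Job configure() throws Exception {
    Job job = Job.getInstance();
    job.setInputFormatClass(AccumuloInputFormat.class);

    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.create().withInstance("myInstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);

    // The table must already be offline (e.g. tableOperations().offline("mytable", true));
    // the input format then reads the table's rfiles directly instead of scanning tservers.
    AccumuloInputFormat.setOfflineTableScan(job, true);
    return job;
  }
}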

Re: reading rfiles directly

2020-08-03 Thread Jim Hughes
Good question. As a very general note, one can leverage Hadoop InputFormats to create Spark RDDs. As a rather non-trivial example, you could check out GeoMesa's implementation of mapping Accumulo entries to geospatial data types. The basic strategy is to make a Hadoop Configuration object repre
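
[Editor's note: a hedged sketch of that general pattern, not GeoMesa's code. It reuses the hypothetical OfflineScanJobConfig helper from the offline-scan sketch above; any configured Hadoop InputFormat can back a Spark RDD the same way.]

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RFileSparkRead {
  public static void main(String[] args) throws Exception {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("rfile-read"));

    // Reuse the (hypothetical) offline-scan job configuration sketched earlier.
    Job job = OfflineScanJobConfig.configure();

    // Turn the configured Hadoop InputFormat into a key/value RDD.
    JavaPairRDD<Key,Value> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);

    System.out.println("entries read: " + rdd.count());
    sc.stop();
  }
}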

reading rfiles directly

2020-08-03 Thread Bulldog20630405
we would like to read rfiles directly outside an active accumulo instance using spark. is there an example to do this? note: i know there is a utility to print rfiles and i could start there and build my own; but i was hoping to leverage something already there.