Unit/Integration tests with Mini Accumulo Cluster

2019-12-13 Jim Hughes
Hi all, I work on GeoMesa. For the Accumulo 1.x line, we have been using the MockAccumulo infrastructure for our unit/integration tests, which run in a Maven build.  In Accumulo 2.x, since MockAccumulo is gone, we're looking at using the MiniAccumulo cluster infrastructure. Are there best

Accumulo on S3

2020-03-03 Jim Hughes
Hi all, The next major release of GeoMesa is aimed at supporting Accumulo 2.x.  As part of testing, my coworker Kevin and I are trying out Accumulo 2.0 on S3. Keith's blog post[1] is great.  As people have tested Accumulo 2.0 in AWS, has anyone tried using EMR for the underlying HDFS cluster

Re: reading rfiles directly

2020-08-03 Jim Hughes
Good question.  As a very general note, one can leverage Hadoop InputFormats to create Spark RDDs. As a rather non-trivial example, you could check out GeoMesa's implementation of mapping Accumulo entries to geospatial data types. The basic strategy is to make a Hadoop Configuration object repre

Accumulo on S3 tuning for write performance

2021-06-09 Jim Hughes
Hi all, We are trying a large ingest using Accumulo on S3 and we are seeing some exceptions around writes to S3.  The blog post about Accumulo on S3[1] suggests setting "fs.s3a.connection.maximum" to 128.  Similar advice for HBase seems to suggest bumping that value even higher. Does anyone