It's very straightforward to set up a Hadoop RDD to use
AccumuloInputFormat. Something like this will do the trick:

    private JavaPairRDD<Key,Value> newAccumuloRDD(JavaSparkContext sc,
            AgileConf agileConf, String appName, Authorizations auths)
            throws IOException, AccumuloSecurityException {
        Job hadoopJob = Job.getInstance(agileConf, appName);
        // configureAccumuloInput is exactly the same as for an MR job
        // sets zookeeper instance, credentials, table name, auths etc.
        configureAccumuloInput(hadoopJob, ACCUMULO_TABLE, auths);
        return sc.newAPIHadoopRDD(hadoopJob.getConfiguration(),
                AccumuloInputFormat.class, Key.class, Value.class);
    }

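For reference, here is a rough sketch of what a configureAccumuloInput helper along those lines might look like. It assumes the Accumulo 1.6 mapreduce API; the class name, instance name, ZooKeeper hosts, user and password are hypothetical placeholders, not values from this thread, so adjust them for your own setup.

```java
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;

public class AccumuloInputConfig {
    // Hypothetical connection details (assumptions for illustration only).
    static final String INSTANCE = "myInstance";
    static final String ZOOKEEPERS = "zk1:2181,zk2:2181";
    static final String USER = "spark_user";
    static final String PASSWORD = "secret";

    // Stores the Accumulo connection and scan settings in the job's
    // Hadoop Configuration; no connection is opened at this point.
    static void configureAccumuloInput(Job job, String table, Authorizations auths)
            throws AccumuloSecurityException {
        // Credentials used by the input format when it scans the table.
        AccumuloInputFormat.setConnectorInfo(job, USER, new PasswordToken(PASSWORD));
        // Which Accumulo instance to talk to, located via ZooKeeper.
        AccumuloInputFormat.setZooKeeperInstance(job,
                ClientConfiguration.loadDefault()
                        .withInstance(INSTANCE)
                        .withZkHosts(ZOOKEEPERS));
        // Table to read and the authorizations to scan it with.
        AccumuloInputFormat.setInputTableName(job, table);
        AccumuloInputFormat.setScanAuthorizations(job, auths);
    }
}
```

Since everything ends up in the job's Configuration, passing hadoopJob.getConfiguration() to newAPIHadoopRDD is enough to carry these settings over to Spark.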
There are plenty of docs on how to operate on a JavaPairRDD. But you're
right, there's hardly anything on how to plug Accumulo into Spark.
-Russ
On Wed, Sep 10, 2014 at 1:17 PM, Megavolt wrote:
> I've been doing some Googling and haven't found much info on how to
> incorporate Spark and Accumulo. Does anyone know of some examples of how
> to tie Spark to Accumulo (for both fetching data and dumping results)?