It's very straightforward to set up a Hadoop RDD to use
AccumuloInputFormat. Something like this will do the trick:

private JavaPairRDD<Key,Value> newAccumuloRDD(JavaSparkContext sc,
        AgileConf agileConf, String appName, Authorizations auths)
        throws IOException, AccumuloSecurityException {
    Job hadoopJob = Job.getInstance(agileConf, appName);
    // configureAccumuloInput is exactly the same as for an MR job
    // sets zookeeper instance, credentials, table name, auths etc.
    configureAccumuloInput(hadoopJob, ACCUMULO_TABLE, auths);
    return sc.newAPIHadoopRDD(hadoopJob.getConfiguration(),
            AccumuloInputFormat.class, Key.class, Value.class);
}
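
For reference, a helper like configureAccumuloInput might look roughly like
the sketch below. It is an assumption based on the static setters of the
Accumulo 1.5/1.6 mapreduce API, not the actual helper used above. The class
name, instance name, ZooKeeper hosts, user, and password are hypothetical
placeholders, and the String-based setZooKeeperInstance overload is
deprecated in 1.6 in favor of one that takes a ClientConfiguration.

```java
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.mapreduce.Job;

public class AccumuloInputConfig {
    // Hypothetical connection settings; substitute your own.
    static final String INSTANCE_NAME = "myInstance";
    static final String ZOOKEEPERS = "zk1:2181,zk2:2181";
    static final String USER = "reader";
    static final String PASSWORD = "secret";

    // Stores the Accumulo connection details in the job's Configuration,
    // which is what newAPIHadoopRDD later hands to AccumuloInputFormat.
    static void configureAccumuloInput(Job job, String table, Authorizations auths)
            throws AccumuloSecurityException {
        AccumuloInputFormat.setConnectorInfo(job, USER, new PasswordToken(PASSWORD));
        AccumuloInputFormat.setZooKeeperInstance(job, INSTANCE_NAME, ZOOKEEPERS);
        AccumuloInputFormat.setInputTableName(job, table);
        AccumuloInputFormat.setScanAuthorizations(job, auths);
    }
}
```

The Job here is only a holder for the Configuration; it is never submitted
as a MapReduce job.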

There are tons of docs on how to operate on a JavaPairRDD. But you're
right, there's hardly anything on how to plug Accumulo into Spark.

-Russ

On Wed, Sep 10, 2014 at 1:17 PM, Megavolt <[email protected]> wrote:

> I've been doing some Googling and haven't found much info on how to
> incorporate Spark and Accumulo.  Does anyone know of some examples of how to
> tie Spark to Accumulo (for both fetching data and dumping results)?
