Ok thanks. It seems that --jars is not behaving as expected - I get a class
not found error for even the simplest object from my lib. But anyway, I have
to do at least a filter transformation before collecting the HBaseRDD into
R, so I will have to go the route of using the Scala spark shell to
transform, collect and save into the local filesystem, and then visualise
the file with R until custom RDD transformations are exposed in the SparkR
API.
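
For the R end of that workaround, something along these lines should do (a
minimal sketch - the file path, the tab-separated format and the column
names are assumptions about whatever the spark shell job writes out):

  # read the file exported by the scala spark shell job
  hbase_rows <- read.table("/tmp/hbase-export.tsv", sep = "\t",
                           header = FALSE,
                           col.names = c("rowkey", "value"))
  # quick look at the value distribution
  hist(hbase_rows$value)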

On 13 July 2015 at 10:27, Sun, Rui <rui....@intel.com> wrote:

> Hi, Michal,
>
> SparkR comes with a JVM backend that supports Java object instantiation and
> calling Java instance and static methods from the R side. As defined in
> https://github.com/apache/spark/blob/master/R/pkg/R/backend.R,
> newJObject() creates an instance of a Java class;
> callJMethod() calls an instance method of a Java object;
> callJStatic() calls a static method of a Java class.
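>
> For example, from the sparkR shell (these are internal functions in the 1.4
> release, hence the ::: access; the java.lang classes below are just
> stand-ins to show the calling convention):
>
>   # instantiate a Java object and call instance methods on it
>   sb <- SparkR:::newJObject("java.lang.StringBuilder", "hello")
>   SparkR:::callJMethod(sb, "append", " world")
>   SparkR:::callJMethod(sb, "toString")   # "hello world"
>   # call a static method of a Java class
>   SparkR:::callJStatic("java.lang.System", "getProperty", "java.version")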
>
> If what you need is as simple as data visualization, you can use the above
> low-level functions to create an instance of your HBase RDD on the JVM side,
> collect the data to the R side, and visualize it.
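>
> A rough sketch of that pattern (every com.example name below is a
> hypothetical placeholder for whatever your Scala library exposes; your
> library may need a small helper method that returns types SparkR's
> serializer understands, e.g. an array of strings or doubles):
>
>   sc <- sparkR.init(master = "yarn-client")   # jobj wrapping the JavaSparkContext
>   scalaSc <- SparkR:::callJMethod(sc, "sc")   # unwrap the underlying SparkContext
>   hbaseRdd <- SparkR:::callJStatic("com.example.hbase.HBaseRDDFactory",
>                                    "scan", scalaSc, "my_table")
>   rows <- SparkR:::callJMethod(hbaseRdd, "collect")
>   # ... then visualize `rows` with plain R, e.g. hist() or plot()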
>
> However, if you want to do HBase RDD transformations and HBase table
> updates, things are quite complex right now. SparkR supports the majority
> of the RDD API (though it is not exposed publicly in the 1.4 release),
> allowing transformation functions to be written in R, but it currently only
> supports RDDs sourced from text files and SparkR DataFrames, so your HBase
> RDDs can't be fed into the SparkR RDD API for further processing.
>
> You can use --jars to include your Scala library so that it can be accessed
> by the JVM backend.
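>
> For example (the jar path is just a placeholder):
>
>   bin/sparkR --jars /path/to/your-hbase-lib.jar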
>
> ________________________________
> From: Michal Haris [michal.ha...@visualdna.com]
> Sent: Sunday, July 12, 2015 6:39 PM
> To: user@spark.apache.org
> Subject: Including additional scala libraries in sparkR
>
> I have a spark program with a custom optimised RDD for HBase scans and
> updates. I have a small library of objects in Scala to support efficient
> serialisation, partitioning etc. I would like to use R as an analysis and
> visualisation front-end. I have tried to use rJava (i.e. not using sparkR)
> and got as far as initialising the spark context, but I encountered
> problems with HBase dependencies (HBaseConfiguration: Unsupported
> major.minor version 51.0), so I tried sparkR. However, I can't figure out
> how to make my custom Scala classes available to sparkR other than
> re-implementing them in R. Is there a way to include and invoke additional
> Scala objects and RDDs within a sparkR shell/job? Something similar to the
> additional jars and init script in a normal spark submit/shell.
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
> 31 Old Nichol Street
> London
> E2 7HR
>



-- 
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street
London
E2 7HR
