Could you give more details about the misbehavior of --jars for SparkR? Maybe it's a bug.

________________________________
From: Michal Haris [michal.ha...@visualdna.com]
Sent: Tuesday, July 14, 2015 5:31 PM
To: Sun, Rui
Cc: Michal Haris; user@spark.apache.org
Subject: Re: Including additional scala libraries in sparkR
Ok, thanks. It seems that --jars is not behaving as expected - I'm getting class-not-found errors for even the simplest object from my lib. In any case, I have to do at least a filter transformation before collecting the HBase RDD into R, so I will have to go the route of using the Scala spark shell to transform, collect and save into the local filesystem, and then visualise the file with R, until custom RDD transformations are exposed in the SparkR API.

On 13 July 2015 at 10:27, Sun, Rui <rui....@intel.com> wrote:

Hi, Michal,

SparkR comes with a JVM backend that supports Java object instantiation and calling Java instance and static methods from the R side. As defined in https://github.com/apache/spark/blob/master/R/pkg/R/backend.R:
newJObject() creates an instance of a Java class;
callJMethod() calls an instance method of a Java object;
callJStatic() calls a static method of a Java class.

If the task is as simple as data visualization, you can use the above low-level functions to create an instance of your HBase RDD on the JVM side, collect the data to the R side, and visualize it.

However, if you want to do HBase RDD transformations and HBase table updates, things are quite complex now. SparkR supports the majority of the RDD API (though it is not exposed publicly in the 1.4 release), allowing transformation functions written in R, but currently it only supports RDDs sourced from text files and SparkR DataFrames, so your HBase RDDs can't be used by the SparkR RDD API for further processing.

You can use --jars to include your Scala library so that it can be accessed by the JVM backend.

________________________________
From: Michal Haris [michal.ha...@visualdna.com]
Sent: Sunday, July 12, 2015 6:39 PM
To: user@spark.apache.org
Subject: Including additional scala libraries in sparkR

I have a spark program with a custom optimised RDD for HBase scans and updates.
I have a small library of objects in Scala to support efficient serialisation, partitioning, etc. I would like to use R as an analysis and visualisation front-end. I tried to use rJava (i.e. not using SparkR) and got as far as initialising the Spark context, but I ran into problems with HBase dependencies (HBaseConfiguration: Unsupported major.minor version 51.0), so I tried SparkR. However, I can't figure out how to make my custom Scala classes available to SparkR other than re-implementing them in R. Is there a way to include and invoke additional Scala objects and RDDs within the SparkR shell/job? Something similar to additional jars and an init script in a normal spark-submit/shell.

--
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street London E2 7HR

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
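[Editor's note] The low-level JVM backend calls Sun Rui describes can be sketched roughly as follows. This is a non-authoritative sketch: the triple-colon functions are internal SparkR (1.4-era) helpers, not public API, and the jar, class and method names for the custom library are hypothetical placeholders.

```r
# Sketch only: SparkR:::newJObject / callJMethod / callJStatic are internal
# backend helpers in the Spark 1.4 timeframe, not a stable public API.
library(SparkR)

# sparkJars plays the role of --jars when initialising from R;
# the jar name here is illustrative.
sc <- sparkR.init(master = "local", sparkJars = "my-hbase-lib.jar")

# Call a static method on a JVM class (a JDK class, to show the mechanism):
now <- SparkR:::callJStatic("java.lang.System", "currentTimeMillis")

# Instantiate a Java object and call instance methods on it:
jlist <- SparkR:::newJObject("java.util.ArrayList")
SparkR:::callJMethod(jlist, "add", "hello")
n <- SparkR:::callJMethod(jlist, "size")

# The same pattern would apply to a custom Scala object on the jar classpath,
# e.g. SparkR:::callJStatic("com.example.MyHBaseScan", "collectRows", sc)
# (class and method names here are hypothetical).
```

As the thread notes, this is enough to instantiate JVM objects and collect results to the R side for visualisation, but not to run R transformation functions over a custom HBase RDD.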