RE: Share RDD between SparkR and another application

2015-07-14 Thread Sun, Rui
Hi, Hari, I don't think job-server can work with SparkR (or pySpark). It seems it would be technically possible, but it needs support from both job-server and SparkR (or pySpark), which doesn't exist yet. But there may be some indirect ways of sharing RDDs between SparkR and an application. For
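The message is cut off here, so the concrete suggestion is not preserved. The sketch below shows only one possible indirect route, assuming both sides can reach common storage: the SparkR side materializes its DataFrame as Parquet, and the other application rebuilds an equivalent RDD/DataFrame from the same path with its own context. The HDFS path and the faithful dataset are placeholders.

    # Sketch only, not the approach from the truncated reply: share data through
    # a common Parquet path instead of sharing the in-memory RDD itself.
    library(SparkR)

    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)

    # SparkR side: materialize a DataFrame to shared storage.
    df <- createDataFrame(sqlContext, faithful)
    write.df(df, path = "hdfs:///tmp/shared/faithful", source = "parquet", mode = "overwrite")

    # The other application (or another SparkR session) rebuilds an equivalent
    # DataFrame from the same path using its own context.
    shared <- read.df(sqlContext, "hdfs:///tmp/shared/faithful", source = "parquet")
    head(shared)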

RE: Including additional scala libraries in sparkR

2015-07-14 Thread Sun, Rui
Could you give more details about the misbehavior of --jars for SparkR? Maybe it's a bug. From: Michal Haris [michal.ha...@visualdna.com] Sent: Tuesday, July 14, 2015 5:31 PM To: Sun, Rui Cc: Michal Haris; user@spark.apache.org Subject: Re: Including additional

RE: Including additional scala libraries in sparkR

2015-07-13 Thread Sun, Rui
Hi, Michal, SparkR comes with a JVM backend that supports instantiating Java objects and calling Java instance and static methods from the R side. As defined in https://github.com/apache/spark/blob/master/R/pkg/R/backend.R, newJObject() is to create an instance of a Java class; callJMethod() is to call
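As a minimal sketch of how these backend helpers look in use: they are internal functions of the SparkR package in this version, hence the ':::' access, and java.util.ArrayList is just an arbitrary example class.

    library(SparkR)
    sc <- sparkR.init(master = "local")

    # Create an instance of a Java class through the JVM backend.
    jlist <- SparkR:::newJObject("java.util.ArrayList")

    # Call instance methods on the Java object.
    SparkR:::callJMethod(jlist, "add", "hello")
    SparkR:::callJMethod(jlist, "size")    # returns 1

    # Companion helper for calling a static method on a Java class.
    SparkR:::callJStatic("java.lang.System", "currentTimeMillis")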

RE: SparkR Error in sparkR.init(master="local") in RStudio

2015-07-13 Thread Sun, Rui
Hi, Kachau, If you are using SparkR with RStudio, have you followed the guidelines in the section "Using SparkR from RStudio" in https://github.com/apache/spark/tree/master/R ? From: kachau [umesh.ka...@gmail.com] Sent: Saturday, July 11, 2015 12:30 AM
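For reference, the RStudio setup in that section roughly amounts to pointing the R session at the SparkR package bundled with the Spark distribution before initializing the context; the SPARK_HOME path below is a placeholder for a local Spark installation.

    # Roughly what the "Using SparkR from RStudio" guidelines amount to;
    # replace the placeholder with the local Spark installation directory.
    Sys.setenv(SPARK_HOME = "/path/to/spark")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

    library(SparkR)
    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)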

RE: SparkR dataFrame read.df fails to read from aws s3

2015-07-09 Thread Sun, Rui
Hi, Ben, 1) I guess this may be a JDK version mismatch. Could you check the JDK version? 2) I believe this is a bug in SparkR. I will file a JIRA issue for it. From: Ben Spark [mailto:ben_spar...@yahoo.com.au] Sent: Thursday, July 9, 2015 12:14 PM To: user Subject: SparkR dataFrame
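The original message with the failing call is truncated; as a rough sketch, a read.df call against S3 in this SparkR version typically looks like the following. The bucket, path, and format are hypothetical, not the ones from the original report.

    library(SparkR)
    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)

    # The s3n:// scheme needs AWS credentials in the Hadoop configuration,
    # e.g. fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.
    df <- read.df(sqlContext, "s3n://my-bucket/data/events.json", source = "json")
    head(df)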

RE: [SparkR] Float type coercion with hiveContext

2015-07-08 Thread Sun, Rui
Hi, Evgeny, I reported a JIRA issue for your problem: https://issues.apache.org/jira/browse/SPARK-8897. You can track it to see how it will be solved. Ray -Original Message- From: Evgeny Sinelnikov [mailto:esinelni...@griddynamics.com] Sent: Monday, July 6, 2015 7:27 PM To:

RE: Error in creating spark RDD

2015-04-23 Thread Sun, Rui
Hi, SparkContext.newAPIHadoopRDD() is for working with the new Hadoop mapreduce API. So you should use import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat; instead of import org.apache.accumulo.core.client.mapred.AccumuloInputFormat; -Original Message- From: madhvi

RE: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

2014-12-18 Thread Sun, Rui
in such a case? -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, December 18, 2014 5:23 PM To: Sun, Rui Cc: shiva...@eecs.berkeley.edu; user@spark.apache.org Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark

weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

2014-12-17 Thread Sun, Rui
Hi, I encountered a weird bytecode incompatibility issue between the spark-core jar from the mvn repo and the official Spark prebuilt binary. Steps to reproduce: 1. Download the official pre-built Spark binary 1.1.1 at http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz 2. Launch

RE: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

2014-12-17 Thread Sun, Rui
? -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Wednesday, December 17, 2014 8:39 PM To: Sun, Rui Cc: user@spark.apache.org Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary You should use

RE: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

2014-12-17 Thread Sun, Rui
...@eecs.berkeley.edu] Sent: Thursday, December 18, 2014 2:20 AM To: Sean Owen Cc: Sun, Rui; user@spark.apache.org Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary Just to clarify, are you running the application using spark-submit

RE: Control default partition when loading an RDD from HDFS

2014-12-16 Thread Sun, Rui
Hi, Shuai, How did you turn off file splitting in Hadoop? I guess you might have implemented a customized FileInputFormat that overrides isSplitable() to return false. If you do have such a FileInputFormat, you can simply pass it as a constructor parameter to HadoopRDD or NewHadoopRDD in Spark.

RE: pyspark sc.textFile uses only 4 out of 32 threads per node

2014-12-16 Thread Sun, Rui
Gautham, How many gz files do you have? Maybe the reason is that gz files are compressed and can't be split for processing by MapReduce. A single gz file can only be processed by a single mapper, so the CPU threads can't be fully utilized. -Original Message- From:
