hey guys
I have a 2-node SparkR cluster (1 master, 1 slave) on AWS using
spark-1.5.1-bin-without-hadoop.tgz.
Running the SparkR job on the master node:

/opt/spark-1.5.1-bin-hadoop2.6/bin/sparkR --master spark://ip-xx-ppp-vv-ddd:7077 \
  --packages com.databricks:spark-csv_2.10:1.2.0 \
  --executor-cores 16 --num-executors 8 --executor-memory 8G --driver-memory 8g \
  myRprogram.R

org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 1.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1.0 (TID 103, xx.ff.rr.tt): java.io.FileNotFoundException: File file:/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecord...




myRprogram.R:

library(SparkR)
sc <- sparkR.init(appName = "SparkR-CancerData-example")
sqlContext <- sparkRSQL.init(sc)

# read the CSV with the spark-csv package
lds <- read.df(sqlContext,
               "file:///mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv",
               "com.databricks.spark.csv", header = "true")

# sink() takes a plain local path (not a file:// URI), and summary()
# returns a DataFrame, so showDF() is needed to print its contents
sink("/mnt/local/1024gbxvdf1/leads_new_data_analyis.txt")
showDF(summary(lds))
sink()

This used to run when we had a single-node SparkR installation.
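
I suspect the difference is that on a single node the lone executor could read the file:// path locally, whereas now the slave has no copy of the CSV, so tasks scheduled there fail. If that's the cause, something like the sketch below might work, reading the file from HDFS instead so every executor can reach it (the namenode host/port and target path are placeholders, not our real setup, and it assumes the file has first been uploaded with hdfs dfs -put):

# hypothetical: read the CSV from HDFS rather than a node-local path
# (upload once beforehand, e.g.:
#    hdfs dfs -put /mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv /data/)
lds <- read.df(sqlContext,
               "hdfs://namenode-host:9000/data/all_adleads_cleaned_commas_in_quotes_good_file.csv",
               "com.databricks.spark.csv", header = "true")

(Copying the CSV to the same local path on the slave should also work, but then the copies have to be kept in sync by hand.) Is that the right diagnosis, or is something else going on?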
regards
sanjay
