cluster nodes is possible when
all of the worker nodes have the file in question in their local file system.
Still fairly new to Spark, so bear with me if this is easily tunable by some config params.
Bests,
-Monir
-Original Message-
From: Mozumder, Monir
Sent: Thursday,
I am seeing this same issue with Spark 1.0.1 (tried with file:// for the local file):
scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputEx
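The exception above is usually the executors failing to find /home/monir/.bashrc on their own local file systems, which matches the note earlier in the thread that the file must exist on every worker node. A minimal sketch of one workaround for a small file, assuming a live SparkContext `sc`: read the file once on the driver (where it does exist) and distribute the lines with `parallelize`:

```scala
import scala.io.Source

// Read the file on the driver node, where it actually exists.
val localLines = Source.fromFile("/home/monir/.bashrc").getLines().toList

// Distribute the lines as an RDD; executors never touch the driver's path.
val lines = sc.parallelize(localLines)
val linecount = lines.count()
```

For larger files, copying the file to the same path on every worker (or putting it in HDFS) avoids funnelling the data through the driver.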
I have this 2-node cluster setup, where each node has 4 cores:

MASTER
  (Worker-on-master)  (Worker-on-node1)

conf/slaves: master, node1
SPARK_WORKER_INSTANCES=1
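For reference, a sketch of how this standalone layout is typically spelled out in the deployment files; the hostnames are the ones above, and SPARK_WORKER_CORES is the standard variable for capping cores per worker:

```shell
# conf/slaves -- one worker host per line
master
node1

# conf/spark-env.sh -- one worker JVM per node, 4 cores each
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_CORES=4
```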
I am trying to understand Spark's parallelize behavior. The SparkPi example has this code:
val slices = 8
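To see why slices = 8 is a natural choice for the 2 x 4-core setup above: SparkPi splits its Monte Carlo samples into `slices` partitions, so 8 partitions give each of the 8 cores one task. A sequential sketch of the same per-slice computation in plain Scala (no cluster needed; the sample count and seeds are my own choices, not from the example):

```scala
import scala.util.Random

// Count random points in the unit square that land inside the quarter circle;
// pi is then approximated by 4 * hits / samples.
def hitsInSlice(samples: Int, seed: Long): Int = {
  val rng = new Random(seed)
  (1 to samples).count { _ =>
    val x = rng.nextDouble()
    val y = rng.nextDouble()
    x * x + y * y <= 1.0
  }
}

val slices = 8
val samplesPerSlice = 100000

// On the cluster, this outer map is what sc.parallelize(1 to slices, slices)
// spreads across the 8 cores, one slice per task.
val hits = (1 to slices).map(i => hitsInSlice(samplesPerSlice, i.toLong)).sum
val piEstimate = 4.0 * hits / (slices * samplesPerSlice)
```

Increasing slices beyond the total core count mainly adds scheduling overhead unless individual tasks are skewed.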
An on-list follow-up: http://prof.ict.ac.cn/BigDataBench/#Benchmarks looks promising, as it has Spark as one of the platforms used.
Bests,
-Monir
From: Mozumder, Monir
Sent: Monday, August 11, 2014 7:18 PM
To: user@spark.apache.org
Subject: Benchmark on physical Spark cluster
I am trying to get some workloads or benchmarks for running on a physical Spark cluster and to find relative speedups on different physical clusters.
The instructions at https://databricks.com/blog/2014/02/12/big-data-benchmark.html use Amazon EC2.
I was wondering if anyone got other benchmarks f