RE: cannot read file from a local path

2014-09-11 Thread Mozumder, Monir
... reading a file from a local path on cluster nodes is possible when all of the worker nodes have the file in question in their local file system. Still fairly new to Spark, so bear with me if this is easily tunable by some config params. Bests, -Monir
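
A minimal sketch of the setup being described, assuming a hypothetical file /data/input.txt that has been copied to the same path on every worker node (the path and app name are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalFileRead {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LocalFileRead"))
        // file:// paths are resolved on each executor, so this succeeds
        // only if /data/input.txt exists locally on every worker node.
        val lines = sc.textFile("file:///data/input.txt")
        println(lines.count())
        sc.stop()
      }
    }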

RE: cannot read file from a local path

2014-09-11 Thread Mozumder, Monir
I am seeing this same issue with Spark 1.0.1 (tried with file:// for a local file):

scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: ...
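
One common workaround when the file exists only on the driver machine, sketched here with illustrative names and not taken from the thread itself: ship the file to the executors with SparkContext.addFile and resolve the shipped copy via SparkFiles.get.

    import org.apache.spark.SparkFiles

    // Distribute the driver-local file to every executor.
    sc.addFile("/home/monir/.bashrc")

    // Each task resolves its own local copy of the shipped file.
    val linecount = sc.parallelize(Seq(1), 1)
      .map(_ => scala.io.Source.fromFile(SparkFiles.get(".bashrc")).getLines().size)
      .first()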

How Spark's parallelize maps slices to tasks/executors/workers

2014-09-04 Thread Mozumder, Monir
I have a 2-node cluster setup where each node has 4 cores: the master also runs a worker, node1 runs a second worker (conf/slaves lists master and node1), and SPARK_WORKER_INSTANCES=1. I am trying to understand Spark's parallelize behavior; a sketch of the mapping follows below. The SparkPi example has this code: val slices = 8 ...
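
A minimal sketch of the behavior in question, runnable in the Spark shell; the collection size is illustrative. Each slice of a parallelized collection becomes one partition, each partition is computed by one task, and tasks are scheduled across the available worker cores:

    // 8 slices -> 8 partitions -> 8 tasks per action.
    // On 2 workers x 4 cores, all 8 tasks can run concurrently.
    val rdd = sc.parallelize(1 to 100000, 8)
    println(rdd.partitions.length)  // prints 8

    // count() launches one task per partition.
    println(rdd.count())            // prints 100000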

RE: Benchmark on physical Spark cluster

2014-08-12 Thread Mozumder, Monir
An on-list follow-up: http://prof.ict.ac.cn/BigDataBench/#Benchmarks looks promising, as it includes Spark as one of its supported platforms. Bests, -Monir

Benchmark on physical Spark cluster

2014-08-11 Thread Mozumder, Monir
I am trying to get some workloads or benchmarks for running on a physical Spark cluster, to find relative speedups across different physical clusters. The instructions at https://databricks.com/blog/2014/02/12/big-data-benchmark.html use Amazon EC2. I was wondering if anyone got other benchmarks f...