indeed the Scala version could be blocking (I'm not sure why it needs
2.11; maybe Miles uses quasiquotes...)
Andy
On Tue, Nov 19, 2013 at 8:48 AM, Anwar Rizal anriza...@gmail.com wrote:
I had that in mind too when Miles Sabin presented Shapeless at Scala.IO
Paris last month.
If anybody
Hi,
I am trying to do some tasks with the following style of map function:
rdd.map { e =>
  val a = new Array[Int](100)
  // ...some calculation...
}
But here the array a is really just used as a temporary buffer and can be
reused. I am wondering if I can avoid constructing it every time? (As it
I am trying to start 12 workers with 2 cores each on every node, using the following:
In spark-env.sh (copied in every slave) I have set:
SPARK_WORKER_INSTANCES=12
SPARK_WORKER_CORES=2
I start Scala console with:
SPARK_WORKER_CORES=2 SPARK_MEM=3g MASTER=spark://x:7077
Hi Tom,
Thank you for your response. I have double-checked that I uploaded
both jars to the same folder on HDFS. I think the <name>fs.default.name</name>
you pointed out is the old, deprecated name for the fs.defaultFS config,
according
mapWith can make this use case even simpler.
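For reference, a rough sketch of what that could look like, assuming the RDD.mapWith signature from the 0.8/0.9 API (the first argument list builds one value per partition from its index, the second sees each element together with it; the calculation is a placeholder):

// sketch only: `a` is allocated once per partition and reused for every element in it
val result = rdd.mapWith(partitionIndex => new Array[Int](100)) { (e, a) =>
  // ...some calculation that uses a as scratch space...
  e
}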
On Nov 19, 2013, at 1:29 AM, Sebastian Schelter ssc.o...@googlemail.com
wrote:
You can use mapPartitions, which allows you to apply your map function to
all elements of a partition at once. Here you can place custom code
around your
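To make that concrete, a minimal sketch of the buffer-reuse idea with mapPartitions (the calculation itself is a placeholder):

rdd.mapPartitions { iter =>
  val a = new Array[Int](100)  // allocated once per partition, not once per element
  iter.map { e =>
    // ...some calculation that uses a as a temporary buffer...
    // note: iter.map is lazy, so results must not keep a reference to a
    e
  }
}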
On Tue, Nov 19, 2013 at 6:50 AM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi all,
According to the documentation, spark standalone currently only supports a
FIFO scheduling system.
I understand it's possible to limit the number of cores a job uses by
setting spark.cores.max.
When running
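For what it's worth, if I recall the 0.8 standalone docs correctly, that cap is set as a Java system property before the SparkContext is created; a minimal sketch (the master URL, app name, and the value 4 are placeholders):

// limit this application to 4 cores across the whole standalone cluster
System.setProperty("spark.cores.max", "4")
val sc = new SparkContext("spark://master:7077", "MyApp")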
According to the documentation, spark standalone currently only supports a
FIFO scheduling system.
That's not true:
http://spark.incubator.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools
[sorry for the prior misfire]
On Tue, Nov 19, 2013 at 7:30 AM, Mark Hamstra
I think that is "Scheduling Within an Application", and he asked about scheduling across apps.
Actually, Spark standalone supports two ways of scheduling, both of them FIFO
type. http://spark.incubator.apache.org/docs/latest/spark-standalone.html
One is "spread out" mode and the other is "use as few nodes as possible" [1].
The property is deprecated but will still work. Either one is fine.
Launching the job from the namenode is fine.
I brought up a cluster with 2.0.5-alpha and built the latest Spark master
branch, and it runs fine for me. It looks like the 2.0.5-alpha namenode won't even
start with a defaultFS of
Ah, sorry -- misunderstood the question.
On Nov 19, 2013, at 7:48 AM, Prashant Sharma scrapco...@gmail.com wrote:
I think that is "Scheduling Within an Application", and he asked about scheduling across apps.
Actually, Spark standalone supports two ways of scheduling, both of them FIFO type.
Actually, it is not documented anywhere; maybe we can do that. I was having a
tough time thinking about how to implement this in SPARK-743.
On Tue, Nov 19, 2013 at 9:33 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
My bad - I should have stated that up front. I guess it was kind of
implicit within my
Yes, sure, for usual tests it is fine, but the broadcast is only done if we
are not in local mode (at least it seems so).
In SparkContext we have: def broadcast[T](value: T) =
env.broadcastManager.newBroadcast[T](value, isLocal)
where isLocal is computed from the master name ("local" or "local[...]"). Now
if
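If I remember the 0.8 SparkContext source correctly, the check is roughly the following (paraphrased from memory, so treat it as a sketch rather than an exact quote):

// inside SparkContext: local mode is detected purely from the master string
private[spark] val isLocal = (master == "local" || master.startsWith("local["))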
No, it's my fault for not reading more carefully. We do use a somewhat
overloaded and specialized lexicon to describe Spark, which helps when it
is used uniformly, but penalizes those who leap to misunderstanding.
Prashant is correct that the largest granularity thing that a user
launches to do
Aah, yes, I missed that. I looked into the code. Both TreeBroadcast and
HttpBroadcast don't do the send or write, respectively. Will wait for other
inputs.
On Tue, Nov 19, 2013 at 10:40 PM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Yes, sure, for usual tests it is fine, but the broadcast is only done
Hi,
I would like to use Spark to distribute some computations that rely on
my existing Python installation. I know that Spark includes its own
Python but it would be much easier to just install a package and perhaps
do a bit of configuration.
Thanks,
Michal
This is the code creating the TreeMap:
import scala.collection.immutable.TreeMap

object CaseInsensitiveOrdered extends Ordering[String] {
  def compare(x: String, y: String): Int = x.compareToIgnoreCase(y)
}
TreeMap[String, JobTitle](dico.toArray: _*)(CaseInsensitiveOrdered)
This is the map that is broadcast.
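For context, broadcasting that map and reading it back inside a task would be along these lines (a sketch: dicoMap stands for the TreeMap built above, and the lookup RDD and titles are just illustrations):

val dicoBc = sc.broadcast(dicoMap)           // ship the TreeMap to the executors once
rdd.map(title => dicoBc.value.get(title))    // read it back through .value inside tasks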
BTW, if I remove the
btw, there is an open PR to allow spreadOut to be configured per-app,
instead of per-cluster
https://github.com/apache/incubator-spark/pull/136
On Tue, Nov 19, 2013 at 11:20 AM, Mark Hamstra m...@clearstorydata.com wrote:
No, it's my fault for not reading more carefully. We do use a somewhat
Yes, I see - I misused the term job.
On 11/19/13 12:20 PM, Mark Hamstra wrote:
No, it's my fault for not reading more carefully. We do use a
somewhat overloaded and specialized lexicon to describe Spark, which
helps when it is used uniformly, but penalizes those who leap to
Assuming I also want to run n concurrent jobs of the following type:
each RDD is of the same form (JavaPairRDD), and I would like to run the
same transformation on all RDDs.
The brute-force way would be to instantiate n threads and submit a job
from each thread.
Would this way be valid as
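As far as I know a single SparkContext can be used from multiple threads, so the brute-force version could look roughly like this (a sketch: rdds and transform are placeholders for the actual data and transformation):

// launch the same action on n RDDs concurrently, all sharing one SparkContext
val threads = rdds.map { rdd =>
  new Thread(new Runnable {
    def run() { println(rdd.map(pair => transform(pair)).count()) }
  })
}
threads.foreach(_.start())
threads.foreach(_.join())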
PySpark doesn't include a Python interpreter; by default, it will use your
system `python`. The pyspark script (
https://github.com/apache/incubator-spark/blob/master/pyspark) just
performs some setup of environment variables, adds the PySpark python
dependencies to PYTHONPATH, and adds some code
I think that the reduce() action is implemented as
mapPartitions.collect().reduce(), so the number of result tasks is
determined by the degree of parallelism of the RDD being reduced.
Some operations, like reduceByKey(), accept a `numPartitions` argument for
configuring the number of reducers:
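For example (a sketch of my own, not the snippet that was cut off from the original message):

// force 16 reduce tasks regardless of how the parent RDD is partitioned
val counts = pairs.reduceByKey(_ + _, 16)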
We have a 4-node Spark cluster with 3 GB of RAM available per executor
(via the spark.executor.memory setting). When we run a Spark job, we see
the following output:
Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Initializing interpreter...
Creating
To explain more, we upgraded from 0.7.3 to a 0.9-incubating snapshot today
and are getting out-of-memory errors very quickly, even though our cluster
has plenty of RAM and the data is relatively small:
Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Initializing
This line
13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB
looks like what you'd get if you haven't set spark.executor.memory (or
SPARK_MEM). Without setting it you'll get the default of 512m per
executor and 0.66 of that for the cache (0.66 of the roughly 490 MB of usable
heap a 512m JVM reports comes to about 324 MB, which lines up with the log line above).
-Ewen
In our spark-env.sh, we have:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export ADD_JARS=/opt/spark/mx-lib/verrazano_2.9.3-0.1-SNAPSHOT-assembly.jar
if [ -z "$SPARK_JAVA_OPTS" ] ; then
SPARK_JAVA_OPTS=-Xss20m -Dspark.local.dir=/opt/spark/tmp
-Dspark.executor.memory=3g
Is spark-env.sh being sourced prior to running your job? The spark-shell
script handles this automatically, but you may need to `source
spark-env.sh` in the shell that runs your driver program in order for these
environment variables to be set.
On Tue, Nov 19, 2013 at 4:25 PM, Gary Malouf
the use case is to fit a moving average model for stock prices with the
form:
x_n = \sum_{i = 1}^k \alpha_i * x_{n - i}
Can you please provide me the pseudo-code?
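Not the fitting procedure, but a minimal local sketch of the model itself, assuming alpha holds (alpha_1 ... alpha_k) and history holds the series up to x_{n-1}:

// x_n = sum_{i=1..k} alpha_i * x_{n-i}
def predictNext(history: Seq[Double], alpha: Seq[Double]): Double = {
  val lags = history.takeRight(alpha.length).reverse  // x_{n-1}, x_{n-2}, ..., x_{n-k}
  alpha.zip(lags).map { case (a, x) => a * x }.sum
}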
On Tue, Nov 19, 2013 at 4:30 PM, andy petrella andy.petre...@gmail.com wrote:
1/ you mean like reshape in R?
2/ Or you mean by
Hi Josh,
We are running our job within the Spark shell, so I think it is being
sourced. We are on Mesos 0.13 and a Spark 0.9-SNAPSHOT taken from around 9 AM
Eastern this morning.
Any other possible culprits?
On Tue, Nov 19, 2013 at 7:30 PM, Josh Rosen rosenvi...@gmail.com wrote:
Is
Sure, thanks a lot.
If anyone has dealt with this issue before, I would appreciate the help :)
On Tue, Nov 19, 2013 at 4:56 PM, andy petrella andy.petre...@gmail.com wrote:
Ah, ok.
Providing pseudo-code would require me to be a bit more awake -- it's 2 AM in
Belgium... have to go to sleep, otherwise the
Hi, all.
I wrote two programs: A.scala and B.scala.
A.scala writes the trained model to HDFS with:
_wcount_rdd.saveAsObjectFile(save_path)
I used the command hadoop fs -ls $save_path and found a directory named
$save_path:
[root@gd39 spark-0.8.0-incubating]# hadoop fs -ls
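That directory layout is expected: saveAsObjectFile writes one part-* file per partition under save_path. If it helps, B.scala can read the model back with SparkContext.objectFile; a sketch, assuming (only as an example) that the saved RDD held (String, Int) pairs:

// read the saved model back in B.scala; adjust the element type to whatever A.scala wrote
val model = sc.objectFile[(String, Int)](save_path)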
I have a Scala script that I'm trying to run on a Spark standalone cluster
with just one worker (running on the master node). But the application
just hangs. Here's the worker log output at the time of starting the job:
13/11/19 18:03:13 INFO Worker: Asked to launch executor
Hi Tom,
Thank you for your help. I finally found the problem. It was a silly
mistake on my part. After checking out the git repository, I forgot to change
the spark-env.sh under the conf folder to add the YARN config folder. I guess it
might be helpful to display a warning message about that. Anyway, thank you for