Does fair scheduling in Spark
(http://spark.incubator.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application)
preempt running tasks if a job with higher priority is submitted? If not, is
this part of the plan at some point? Thanks!
Mingyu
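For reference, this is roughly the setup the question is about: a minimal sketch
of fair scheduling within a single application, with jobs from one thread assigned
to a named pool. The pool name, app name and master URL are made up for
illustration, and nothing below addresses preemption itself.

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    // Switch the in-application scheduler from FIFO to FAIR.
    val conf = new SparkConf()
      .setAppName("fair-scheduling-sketch")
      .setMaster("local[4]")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go into the "interactive" pool.
    sc.setLocalProperty("spark.scheduler.pool", "interactive")
    sc.parallelize(1 to 1000).count()

    sc.stop()
  }
}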
After creating a lot of Spark connections, work/app-* folders on the worker
nodes keep getting created without any clean-up. This becomes a particular
problem when the Spark driver programs ship jars or files. Is there any way
to garbage-collect these without deleting them manually? Thanks
Here's my understanding of row order guarantees for RDDs in the context of
limit() and collect(). Can someone confirm this?
* sparkContext.parallelize(myList) returns an RDD that may have a different
row order than myList.
* Every RDD loaded from the same file in HDFS (e.g.
sparkContext.textFile("hdf
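One way to probe these ordering questions empirically is a small driver program
like the throwaway sketch below; it only demonstrates behaviour on one setup and
is not a statement of the actual guarantee.

import org.apache.spark.{SparkConf, SparkContext}

object RowOrderProbe {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("row-order-probe").setMaster("local[4]"))
    val myList = (1 to 100).toList

    // Does collect() return the elements of parallelize(myList) in the original list order?
    val roundTripped = sc.parallelize(myList, numSlices = 4).collect().toList
    println(s"parallelize preserved order here: ${roundTripped == myList}")

    sc.stop()
  }
}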
solution might be to create your own setup script to run on the
instances after "start".
Matei
On Jan 24, 2014, at 2:19 PM, Mingyu Kim wrote:
> Hi all,
>
> I found it confusing that "./spark-ec2 start" actually reinstalls the cluster,
> which ends up wiping out all the configur
Hi all,
I found it confusing that "./spark-ec2 start" actually reinstalls the
cluster, which ends up wiping out all the configurations. How about renaming
"start" to "install" and adding a really lightweight "start" for frequently
starting and stopping EC2 instances, mostly for cost reasons? The
light-
Matei
On Jan 23, 2014, at 11:16 AM, Mingyu Kim wrote:
> Hi all,
>
> How important is it to call stop() when the process that started the
> SparkContext is dying anyway? Will I see resource leaks if I don't?
>
> Mingyu
Hi all,
How important is it to call stop() when the process that started the
SparkContext is dying anyway? Will I see resource leaks if I don't?
Mingyu
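Whatever the answer, a common defensive pattern (just a sketch, not an official
recommendation) is to guarantee that stop() runs even when the driver is about
to exit or the job throws:

import org.apache.spark.{SparkConf, SparkContext}

object EnsureStop {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ensure-stop").setMaster("local[2]"))
    try {
      sc.parallelize(1 to 1000).count()
    } finally {
      // Releases executors and cleans up driver-side state even if the job throws.
      sc.stop()
    }
  }
}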
Hi all,
I'd like the jars added to worker nodes (i.e. via SparkContext.addJar()) to be
cleaned up on teardown. However, SparkContext.stop() doesn't seem to delete
them. What would be the best way to clear them? Or, is there an easy way to
add this functionality?
Mingyu
Hi all,
I'm having a hard time finding a way to report exceptions that happen during
computation to the end user of a Spark system, without having them ssh into
the worker nodes or open the Spark UI. For example, if an exception happens
in code that runs on the worker nodes (e.g.
IllegalSt
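One piece of the puzzle, at least on the driver side, is that task failures do
propagate back to the driver as a SparkException when an action runs. A rough
sketch of catching that and handing it to some application-level reporting
hook; reportToUser, the app name and master URL here are made up:

import org.apache.spark.{SparkConf, SparkContext, SparkException}
import scala.util.{Failure, Success, Try}

object ReportTaskFailures {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("report-failures").setMaster("local[2]"))

    // Hypothetical hook standing in for whatever channel reaches end users.
    def reportToUser(message: String): Unit = println(s"JOB FAILED: $message")

    val rdd = sc.parallelize(Seq("1", "2", "not-a-number"))
    // The NumberFormatException thrown in a task surfaces here as a SparkException,
    // with the worker-side stack trace embedded in its message.
    Try(rdd.map(_.toInt).count()) match {
      case Success(n)                 => println(s"parsed $n records")
      case Failure(e: SparkException) => reportToUser(e.getMessage)
      case Failure(e)                 => throw e
    }

    sc.stop()
  }
}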
Hi,
The Scala-2.10 branch seems to have fallen out of sync with master. Can I
request a merge with master? I would especially like to have the "job group"
changes (https://github.com/apache/incubator-spark/pull/29 and
https://github.com/apache/incubator-spark/pull/74).
Also, is there any timeline
>> However, if you are just after Spark query concurrency, Spark 0.8 seems to
>> support concurrent (reentrant) requests to the same session
>> (SparkContext). One should also be able to use the FAIR scheduler in this case,
>> it seems (at least that's what I request). So
/incubator-spark/pull/190> .
On Wed, Nov 20, 2013 at 3:39 AM, Mingyu Kim wrote:
> Hi all,
>
> Cancellation seems to be supported at the application level. In other words, you
> can call stop() on your instance of SparkContext in order to stop the
> computation associated with the Spark
Hi all,
Cancellation seems to be supported at the application level. In other words, you
can call stop() on your instance of SparkContext to stop the
computation associated with the SparkContext. Is there any way to cancel a
single job? (To be clear, a job is "a parallel computation consisting of mult
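Assuming the "job group" API mentioned above (setJobGroup / cancelJobGroup) is
available in the build being used, a rough sketch of cancelling one job without
stopping the whole SparkContext; the group id, app name and master URL are
arbitrary:

import scala.concurrent.{ExecutionContext, Future}
import org.apache.spark.{SparkConf, SparkContext}

object CancelOneJob {
  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val sc = new SparkContext(new SparkConf().setAppName("cancel-sketch").setMaster("local[2]"))

    // The job group is a property of the submitting thread, so tag the job
    // from inside the thread that actually runs the action.
    Future {
      sc.setJobGroup("report-42", "long-running report")
      sc.parallelize(1 to 1000000).map { x => Thread.sleep(1); x }.count()
    }

    Thread.sleep(2000)               // give the job time to start
    sc.cancelJobGroup("report-42")   // cancels the jobs in that group only

    sc.stop()
  }
}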
Hi all,
I've been trying to find out the current status of support for multiple
SparkContexts in one JVM. I found
https://groups.google.com/forum/#!topic/spark-developers/GLx8yunSj0A and
https://groups.google.com/forum/#!topic/spark-users/cOYP96I668I. According
to the threads, I should be able t
>It seems
>you can add the following into extraAssemblySettings:
>
>assemblyOption in assembly ~= { _.copy(includeScala = false) }
>
>Matei
>
>On O
Hi,
In order to work around the library dependency problem, I'd like to build
the spark jar such that it doesn't contain certain libraries. I will import
the libraries separately and have them available at runtime. More
specifically, I'd like to remove scala-2.9.3 from the spark jar built by
"sb
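For context, a sketch of how the includeScala setting from the reply above might
look in an sbt 0.13-style build.sbt, assuming the old sbt-assembly 0.x plugin is
already wired into the build (plugin setup and version omitted); details of the
surrounding build are approximate:

import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

// Leave the Scala library out of the assembled jar; it is expected to be
// provided on the runtime classpath instead.
assemblyOption in assembly ~= { _.copy(includeScala = false) }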
Thanks for the response! I'll try out the 2.10 branch. That seems to be the
best bet for now.
Btw, how does updating the Maven file do the private namespacing? We've been
trying out jarjar (https://code.google.com/p/jarjar/), but as you mentioned,
reflection has been biting us painfully so far. I'm no
Hi all,
I'm trying to use Spark in our existing code base. However, a lot of Spark's
dependencies are not updated to the latest versions, and they conflict with
our versions of the libraries, most notably scala-2.9.2 vs. scala-2.10.1.
Have people run into these problems before? How did you work aroun
't is
when you change the RDD's partitioner, e.g. by doing sortByKey or
groupByKey. It would definitely be good to document this more formally.
Matei
On Oct 3, 2013, at 3:33 PM, Mingyu Kim wrote:
> Hi all,
>
> Is the sort order guaranteed if you apply operations like map(), f
Hi all,
Is the sort order guaranteed if you apply operations like map(), filter(), or
distinct() after a sort in a distributed setting (run on a cluster of machines
backed by HDFS)? In other words, does rdd.sortByKey().map() have the same
sort order as rdd.sortByKey()? If so, is it documented somewhe
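A small sketch of the behaviour described in the reply above, for illustration
rather than as a statement of a formal guarantee: per-partition transformations
such as map() keep the order produced by sortByKey(), while partitioner-changing
operations generally do not. The SparkContext._ import is assumed for the
pair-RDD implicits in Spark versions of this era.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits (sortByKey) in this era

object SortOrderSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sort-order").setMaster("local[2]"))

    val pairs  = sc.parallelize(Seq(3 -> "c", 1 -> "a", 2 -> "b"))
    val sorted = pairs.sortByKey()
    val mapped = sorted.map { case (k, v) => (k, v.toUpperCase) }

    println(sorted.collect().toSeq)   // keys come back as 1, 2, 3
    println(mapped.collect().toSeq)   // same key order as the sorted RDD

    sc.stop()
  }
}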