Hi All,
I am trying to run a Spark job using YARN, and I specify the
--executor-cores value as 20.
But when I go to check the "Nodes of the cluster" page at
http://hostname:8088/cluster/nodes, I see 4 containers getting created
on each node in the cluster.
But I can only see 1 vcore getting assigned
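For reference, here is a minimal sketch of how the per-executor core request
is usually passed (the names and values below are illustrative, not taken
from the original job). When YARN's capacity scheduler uses the default
memory-only resource calculator, the "Nodes of the cluster" page commonly
reports 1 vcore per container even though Spark requested more.

import org.apache.spark.sql.SparkSession

// Illustrative sketch: request 20 cores per executor on YARN.
// With YARN's DefaultResourceCalculator (memory-only), the RM UI may still
// show 1 vcore per container; the executor nevertheless runs up to 20 tasks
// concurrently. Using DominantResourceCalculator instead makes the reported
// vcores match the request.
val spark = SparkSession.builder()
  .appName("executor-cores-check")
  .config("spark.executor.cores", "20")
  .config("spark.executor.instances", "4")
  .getOrCreate()

println(spark.sparkContext.getConf.get("spark.executor.cores"))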
There was SPARK-12008, which was closed.
Not sure if there is an active JIRA in this regard.
On Tue, Aug 2, 2016 at 6:40 PM, 马晓宇 wrote:
> Hi guys,
>
> I wonder if anyone is working on SQL-based authorization already or not.
>
> This is something we need badly right now, and we tried to embed a
> H
Hi guys,
I wonder if anyone is working on SQL-based authorization already or not.
This is something we need badly right now, and we tried to embed a
Hive frontend in front of SparkSQL to achieve this, but it's not quite an
elegant solution. If SparkSQL has a way to do it or anyone already
work
I believe it was intentional, with the idea that it would be more unified
between the Java and Scala APIs. If you're talking about the javadoc mention
in https://github.com/apache/spark/pull/14466/files - I believe the += is
meant to refer to what the internal implementation of the add function can
be for
It seems like the += operator is missing from the new accumulator API,
although the docs still make reference to it. Does anyone know if it was
intentionally left out? I'm happy to do a PR for it or to update the docs
to just use the add() method; I just want to check if there was some reason
first.
Bry
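For reference, a minimal sketch of the Spark 2.0 accumulator API using add()
(the names below are illustrative):

import org.apache.spark.sql.SparkSession

// Minimal sketch: the built-in LongAccumulator (AccumulatorV2) exposes add()
// rather than the old += operator.
val spark = SparkSession.builder().master("local[*]").appName("acc-demo").getOrCreate()
val sc = spark.sparkContext

val counter = sc.longAccumulator("counter")
sc.parallelize(1 to 100).foreach(_ => counter.add(1)) // add() where += used to be
println(counter.value) // 100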
Dear Spark developers,
Could you suggest how to perform pattern matching on the type of the graph
edge in the following scenario? I need to perform some math by means of
aggregateMessages on the graph edges if the edges are Double. Here is the code:
def my[VD: ClassTag, ED: ClassTag] (graph: Graph[V
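One possible approach (a hypothetical sketch; the function and names below
are mine, not from the original code) is to compare the edge type's ClassTag
at runtime and run aggregateMessages only when the edge attribute is Double:

import scala.reflect.{ClassTag, classTag}
import org.apache.spark.graphx._

// Hypothetical sketch: branch on the runtime class of ED and do the math
// only for Double-typed edges.
def sumDoubleEdges[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Option[VertexRDD[Double]] = {
  if (classTag[ED].runtimeClass == classOf[Double]) {
    // Safe: we just checked that the edge attribute type is Double.
    val g = graph.asInstanceOf[Graph[VD, Double]]
    Some(g.aggregateMessages[Double](ctx => ctx.sendToDst(ctx.attr), _ + _))
  } else {
    None // edge type is not Double, so skip the aggregation
  }
}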
Spark does optimise subsequent limits, for example:
scala> df1.limit(3).limit(1).explain
== Physical Plan ==
CollectLimit 1
+- *SerializeFromObject [assertnotnull(input[0, $line14.$read$$iw$$iw$my,
true], top level non-flat input object).x AS x#2]
+- Scan ExternalRDDScan[obj#1]
However, limit
Widening to dev@spark
On Mon, Aug 1, 2016 at 4:21 PM, Noorul Islam K M wrote:
>
> Hi all,
>
> I was trying to test the --supervise flag of spark-submit.
>
> The documentation [1] says that the flag helps in restarting your
> application automatically if it exited with a non-zero exit code.
>
> I am lo
Thank you for your prompt response and great examples, Sun Rui, but I am
still confused about one thing. Do you see any particular reason not to
merge subsequent limits? The following case
(limit n (map f (limit m ds)))
could be optimized to:
(map f (limit n (limit m ds)))
and further to
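Written with the Dataset API, the rewrite being discussed would look roughly
like this (a sketch only, with placeholder names; it does not claim that
Catalyst performs these steps today):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("limit-merge").getOrCreate()
import spark.implicits._

val ds = spark.range(0, 1000000).as[Long] // placeholder dataset
val f = (x: Long) => x * 2                // placeholder map function
val (m, n) = (100, 10)

// (limit n (map f (limit m ds)))
val chained = ds.limit(m).map(f).limit(n)

// map is element-wise, so the outer limit can be pushed below it:
// (map f (limit n (limit m ds)))
val pushed = ds.limit(m).limit(n).map(f)

// ...and the two adjacent limits can collapse into a single one:
val merged = ds.limit(math.min(n, m)).map(f)

chained.explain() // compare the physical plans to see what actually happens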
Based on your code, here is a simpler test case on Spark 2.0:
case class my (x: Int)
val rdd = sc.parallelize(0.until(1), 1000).map { x => my(x) }
val df1 = spark.createDataFrame(rdd)
val df2 = df1.limit(1)
df1.map { r => r.getAs[Int](0) }.first
df2.map { r => r.getAs[Int](0) }.first // Much slower
Note that both HashingTF and CountVectorizer are usually used for creating
TF-IDF normalized vectors. The definition (
https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Definition) of term frequency
in TF-IDF is actually the "number of times the term occurs in the document".
So it's perhaps a bit of a
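For context, the usual two-step TF-IDF flow in spark.ml looks roughly like
this (a minimal sketch; the column names and sample documents are made up):

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("tfidf-sketch").getOrCreate()

val docs = spark.createDataFrame(Seq(
  (0, "spark spark is fast"),
  (1, "hashing tf counts term occurrences")
)).toDF("id", "text")

val words = new Tokenizer().setInputCol("text").setOutputCol("words").transform(docs)

// HashingTF produces the raw term-occurrence counts per document...
val tf = new HashingTF().setInputCol("words").setOutputCol("rawTF")
val tfDF = tf.transform(words)

// ...and IDF then rescales those counts into TF-IDF weights.
val idfModel = new IDF().setInputCol("rawTF").setOutputCol("tfidf").fit(tfDF)
idfModel.transform(tfDF).select("id", "tfidf").show(false)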