Hi,
We've recently started testing Spark on Kubernetes and have found some odd performance decreases. In particular, it's almost an order of magnitude slower pulling data from Kafka than it is in our Mesos cluster.
We've tested a few set-ups:
Baseline: Spark 2.3.0 on Mesos host networking (~5mill
A couple of questions on this that maybe someone could help me with:
What is the roadmap for Continuous Processing?
What is blocking its promotion to "stable"?
Thanks!
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---
Hello,
I am facing a similar issue. Have you found a solution?
Cheers,
Davide
Sorry, for now what I can do is this:
var df5 = spark.read.parquet("/user/devuser/testdata/df1").coalesce(1)
df5 = df5.union(df5).union(df5).union(df5).union(df5)
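Instead of repeated unions, it may be simpler to let Spark generate the rows in parallel with `spark.range`. A minimal sketch, assuming a running `spark` session (the output path and column name below are illustrative, not from the thread):

```scala
// Generate ~100 million rows in parallel, without chained unions.
// spark.range produces a Dataset[java.lang.Long] with a single "id"
// column, split across the requested number of partitions.
val big = spark
  .range(0L, 100000000L, 1L, 200) // start, end, step, numPartitions
  .withColumnRenamed("id", "row_id")

// Write the generated data out; each partition is written concurrently.
big.write.mode("overwrite").parquet("/user/devuser/testdata/big")
```

Because the rows are produced on the executors rather than built up driver-side, this scales to large row counts without deep union lineage.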
2018-12-14
lk_spark
From: 15313776907 <15313776...@163.com>
Sent: 2018-12-14 16:39
Subject: Re: how to generate a large dataset in parallel
I also have this problem and hope it can be solved here. Thank you.
On 12/14/2018 10:38, lk_spark wrote:
hi, all:
I want to generate some test data containing about one hundred million rows.
I created a dataset with ten rows and did df.union operations in a 'for' loop, but
Dear all,
The Smart Data Analytics group (http://sda.tech) is happy to announce SANSA 0.5, the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Apache Flink in order to allow scalable machine learning, inference, and querying capabilities.