Structured Streaming on Kubernetes Performance

2018-12-14 Thread Kalvin Chau
Hi, we've recently started testing Spark on Kubernetes and have found some odd performance decreases. In particular, it's almost an order of magnitude slower pulling data from Kafka than it is on our Mesos cluster. We've tested a few set-ups: Baseline: Spark 2.3.0 on Mesos host networking (~5mill

Continuous Processing roadmap

2018-12-14 Thread albamoro
A couple of questions that maybe someone could help me with: What is the roadmap for Continuous Processing? What is blocking its promotion to "stable"? Thanks! -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Driver Memory taken up by BlockManager

2018-12-14 Thread Davide.Mandrini
Hello, I am facing a similar issue. Have you found a solution? Cheers, Davide

Re: Re: how to generate a large dataset in parallel

2018-12-14 Thread lk_spark
Sorry, what I can do for now is this: var df5 = spark.read.parquet("/user/devuser/testdata/df1").coalesce(1) df5 = df5.union(df5).union(df5).union(df5).union(df5) 2018-12-14 lk_spark From: 15313776907 <15313776...@163.com> Sent: 2018-12-14 16:39 Subject: Re: how to generate a larg dataset parallel

Re: how to generate a large dataset in parallel

2018-12-14 Thread 15313776907
I also have this problem and hope it can be solved here, thank you. On 12/14/2018 10:38, lk_spark wrote: Hi all, I want to generate some test data containing about one hundred million rows. I created a dataset with ten rows and call df.union in a 'for' loop, but
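
For readers of this thread, here is a minimal sketch of one way to produce roughly one hundred million rows without repeated unions. It is not taken from the thread. It assumes that synthetic columns derived from a row id are acceptable as test data. The object name, the helper doublingsNeeded, the partition count, the column names, and the output path are all hypothetical. spark.range creates the ids already split across partitions, so the rows are generated on the executors in parallel.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, rand}

object GenerateTestData {
  // Hypothetical helper: how many times a dataset of seedRows rows must be
  // doubled (df = df.union(df)) before it holds at least targetRows rows.
  def doublingsNeeded(seedRows: Long, targetRows: Long): Int = {
    require(seedRows > 0 && targetRows > 0, "row counts must be positive")
    var rows = seedRows
    var doublings = 0
    while (rows < targetRows) {
      rows *= 2
      doublings += 1
    }
    doublings
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("generate-test-data").getOrCreate()

    val targetRows = 100000000L // about one hundred million rows, as in the question
    val numPartitions = 200     // assumption: tune to the cluster size

    // spark.range yields a single "id" column, partitioned up front.
    val df = spark.range(0L, targetRows, 1L, numPartitions)
      .withColumn("name", concat(lit("user_"), col("id").cast("string")))
      .withColumn("score", rand(42L)) // seeded pseudo-random doubles in [0, 1)

    // Hypothetical output location; replace with a real path.
    df.write.mode("overwrite").parquet("/tmp/generated_testdata")

    spark.stop()
  }
}
```

If the rows must instead be copies of an existing ten-row dataset, doubling it in a loop reaches the target in doublingsNeeded(10L, 100000000L) iterations, far fewer than appending the ten-row seed once per iteration.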

SANSA 0.5 (Scalable Semantic Analytics Stack) Released

2018-12-14 Thread Gezim Sejdiu
Dear all, The Smart Data Analytics group (http://sda.tech) is happy to announce SANSA 0.5 - the fifth release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Apache Flink in order to allow scalable machine learning, inference and querying capabili