date:20180528

Re: Spark AsyncEventQueue doubt

2018-05-28 Thread Yuanjian Li

Hi Askash, the event-dropping problem is also triggered by a slow listener, a large number of events, or both. The simplest fix is to change the config `spark.scheduler.listenerbus.eventqueue.capacity`, whose default value is 10000. But if after changing the queue capacity to a larger

Pandas UDF for PySpark error. Big Dataset

2018-05-28 Thread Traku traku

Hi. I'm trying to use the new feature but I can't use it with a big dataset (about 5 million rows). I tried increasing executor memory, driver memory, and the number of partitions, but none of these solved the problem. One of the executor tasks increases its shuffle memory until it fails. Error is

trying to understand structured streaming aggregation with watermark and append outputmode

2018-05-28 Thread Koert Kuipers

hello all, just playing with structured streaming aggregations for the first time. this is my little program i run inside sbt: import org.apache.spark.sql.functions._ val lines = spark.readStream .format("socket") .option("host", "localhost") .option("port", )

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-05-28 Thread Saulo Sobreiro

Hi, I ran a few more tests and found that even with a lot more operations on the Scala side, Python is outperformed... Dataset stream duration: ~3 minutes (CSV-formatted data messages read from Kafka). Scala process/store time: ~3 minutes (map with split + metrics calculations + store raw +

Error on fetchin mass data from cassandra using SparkSQL

2018-05-28 Thread Soheil Pourbafrani

I tried to fetch some data from Cassandra using SparkSQL. For small tables everything works, but when fetching data from big tables I get the following error: java.lang.NoSuchMethodError:

Name error when writing data as orc

2018-05-28 Thread JF Chen

I am writing a dataset in ORC format to HDFS and hit the following problem: Error: name expected at the position 1473 of 'string:boolean:string:string..zone:struct<$ref:string> ...' but '$' is found. Position 1473 is at the "$ref:string" field. Regards, Junfeng Chen

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-28 Thread Jacek Laskowski

Hi, After you leave Spark Structured Streaming, right after you generate RDDs (for your streaming queries), you can do any kind of "joins". You're back in the good old days of RDD programming (with all the bells and whistles). Please note that Spark Structured Streaming != Spark Streaming since

Execution model in Spark

2018-05-28 Thread Esa Heikkinen

Hi, I don't know whether this question is suitable for this forum, but I'll take the risk and ask :) In my understanding, the execution model in Spark is very data (flow) stream oriented and specific. Is it difficult to build control flow logic (like a state machine) outside of the stream specific

8 matches
