Hi Askash
The event-dropping problem is also triggered by a slow listener, a large
number of events, or both. The easy and simple fix is to change the config
`spark.scheduler.listenerbus.eventqueue.capacity`; its default value is 10000.
But even after changing the queue capacity to a larger
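For reference, a minimal sketch of raising that capacity when building the session (the value 20000 is an arbitrary illustration, not a recommendation — pick a value based on your workload):

```scala
import org.apache.spark.sql.SparkSession

// Raise the listener-bus event queue capacity (default 10000) to reduce
// "Dropping event" warnings from a slow listener or an event burst.
val spark = SparkSession.builder()
  .appName("listener-bus-tuning") // illustrative app name
  .config("spark.scheduler.listenerbus.eventqueue.capacity", "20000")
  .getOrCreate()
```

Note this only buys headroom; if the listener itself is slow, a bigger queue just delays the drops.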
Hi.
I'm trying to use the new feature, but I can't use it with a big dataset
(about 5 million rows).
I tried increasing executor memory, driver memory, and the number of
partitions, but no solution has helped me solve the problem.
One of the executor tasks keeps growing its shuffle memory until it fails.
The error is
hello all,
Just playing with structured streaming aggregations for the first time.
This is my little program I run inside sbt:
import org.apache.spark.sql.functions._
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", )
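The snippet cuts off before the port value and the aggregation. A minimal sketch of how a complete socket word-count aggregation of this shape typically looks — the port 9999 (fed by e.g. `nc -lk 9999`) and the app name are my assumptions, not from the original message:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: count words arriving on a local socket and print
// updated counts to the console. Requires a running Spark installation.
val spark = SparkSession.builder().appName("socket-counts").getOrCreate()
import spark.implicits._

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999) // assumed port for illustration
  .load()

val counts = lines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()

counts.writeStream
  .outputMode("complete") // re-emit the full updated result each trigger
  .format("console")
  .start()
  .awaitTermination()
```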
Hi,
I ran a few more tests and found that even with a lot more operations on the
Scala side, Python is outperformed...
Dataset Stream duration: ~3 minutes (csv formatted data messages read from
Kafka)
Scala process/store time: ~3 minutes (map with split + metrics calculations +
store raw +
I tried to fetch some data from Cassandra using Spark SQL. For small tables,
everything goes well, but when trying to fetch data from big tables I got the
following error:
java.lang.NoSuchMethodError:
I am working on writing a dataset in ORC format to HDFS, but I run into
the following problem:
Error: name expected at the position 1473 of
'string:boolean:string:string..zone:struct<$ref:string> ...' but '$'
is found.
where position 1473 is at the "$ref:string" part.
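The Hive-style ORC schema parser rejects `$` (and some other characters) in field names, which is why the nested `$ref` field trips it. A common workaround is to rename offending columns before writing; a minimal sketch of a hypothetical name-sanitizing helper in plain Scala (the function name is mine, and for nested structs you would have to apply it recursively to the schema):

```scala
// Hypothetical helper: replace characters the ORC/Hive schema parser
// rejects with underscores, keeping letters, digits, and underscores.
def sanitizeFieldName(name: String): String =
  name.replaceAll("[^A-Za-z0-9_]", "_")
```

For example, `sanitizeFieldName("$ref")` yields `_ref`, while an already-legal name like `zone` is left unchanged.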
Regard,
Junfeng Chen
Hi,
Once you leave Spark Structured Streaming, right after you generate RDDs
(for your streaming queries), you can do any kind of "joins". You're back
in the good old days of RDD programming (with all the bells and whistles).
Please note that Spark Structured Streaming != Spark Streaming since
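On the Spark Streaming (DStream) side of that distinction, `transform` is where you drop back into RDD land; a minimal sketch with assumed names (host, port, and the reference data are all illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch: join each micro-batch against a static reference
// RDD using ordinary pair-RDD operations.
val conf = new SparkConf().setAppName("rdd-joins")
val ssc = new StreamingContext(conf, Seconds(10))

val reference = ssc.sparkContext.parallelize(Seq(("user1", "gold")))
val events = ssc.socketTextStream("localhost", 9999)
  .map(line => (line, 1))

// Inside transform you have a plain RDD, so any kind of join works.
val enriched = events.transform(batch => batch.join(reference))
enriched.print()
```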
Hi
I don't know whether this question is suitable for this forum, but I'll take
the risk and ask :)
In my understanding, the execution model in Spark is very data-(flow-)stream
oriented and specific. Is it difficult to build control-flow logic (like a
state machine) outside of the stream-specific