https://spark.apache.org/docs/3.3.2/sql-ref-functions-udf-aggregate.html
I'm trying to run this example on Databricks, and it fails with the stack trace
below. It's literally a copy-paste from the example; what am I missing?
Job aborted due to stage failure: Task not serializable:
java.io.Not
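For context, the truncated trace is presumably a `java.io.NotSerializableException`: with the `Aggregator` example from that page, a common cause is declaring the class inside a notebook cell, so it captures a reference to the non-serializable cell/REPL wrapper object. A minimal, Spark-free Java sketch of that failure mode (all class names here are hypothetical stand-ins, not from the original post):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {
    // Not Serializable: stands in for the notebook/REPL wrapper object.
    static class NotebookContext {}

    // Serializable on paper, but its field drags a non-serializable object along.
    static class CapturingAggregator implements Serializable {
        NotebookContext ctx = new NotebookContext(); // this field breaks serialization
    }

    // Attempts Java serialization; returns the exception class name, or "ok".
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Spark surfaces this same root cause as "Task not serializable".
        System.out.println(trySerialize(new CapturingAggregator()));
    }
}
```

If that is what's happening, the usual fix is to define the `Aggregator` in a separately compiled package/object (or mark captured references `@transient`) rather than in the notebook cell itself.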
Thanks for the suggestion; I'll take it as a workaround. While it can
potentially address the storage allocation issue, I was more interested in
exploring solutions that integrate more seamlessly with large distributed
file systems like HDFS, GCS, or S3. This would
ensure bette
You can create a PVC on K8s and call it, say, 300gb.
Create a writable folder in your Dockerfile:
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir
Then start Spark with these options added:
.config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName",
"300gb") \
.config("spark.kubernetes.driv
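The quoted snippet is cut off; for a PVC, the option pair per side is `options.claimName` plus `mount.path`. A sketch of the full builder call, assuming the PVC is named `300gb` and is mounted at the `WORKDIR` created above (adjust names and paths to your setup):

```java
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    // Driver side: mount PVC "300gb" at the writable work dir.
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir")
    // Executor side: mount the same volume so scratch/shuffle data lands on the PVC.
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir")
    .getOrCreate();
```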
This indeed looks like a bug. I will take some time to look into it.
Mich Talebzadeh wrote on Wed, 3 Apr 2024 at 01:55:
>
> Hm, you are getting the below:
>
> AnalysisException: Append output mode not supported when there are
> streaming aggregations on streaming DataFrames/DataSets without watermark;
>
> The pro
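For the AnalysisException above, the standard remedy is to declare a watermark on the event-time column before the aggregation, which bounds the state and makes append mode legal. A sketch (column names, window sizes, and the `events` stream are assumptions for illustration):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.window;

// events: a streaming Dataset<Row> with an "eventTime" timestamp column (hypothetical).
Dataset<Row> counts = events
    .withWatermark("eventTime", "10 minutes")        // bound state; required for append mode
    .groupBy(window(col("eventTime"), "5 minutes"))  // windowed streaming aggregation
    .count();

counts.writeStream()
    .outputMode("append")   // now allowed: rows are emitted once the watermark passes the window
    .format("console")
    .start();
```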
[[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)](
https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb)
Apache Spark 4.0.0 Release Plan
===
1. After creating `branch-3.5`, set "4.0.0-SNAPSHOT" in master branch.
2. Create `branch-4.0` on April 1st
I have seen some older references to a shuffle service for K8s,
although it is not clear whether they describe a generic shuffle
service for K8s.
Anyhow, with the advent of GenAI and the need to handle larger
volumes of data, I was wondering whether there has been any more work
on this matter. Speci