I had a similar issue this summer while prototyping Spark on K8s. I ended
up sticking with Hive Metastore 2 to meet time goals. I'm not sure I was
using it correctly, but I only needed the Hadoop and Hive JARs; I did not
need to run HDFS, YARN, etc. I was using the metastore with an s3a path
for the warehouse dir.
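For reference, this is roughly the shape of the configuration I mean (a sketch, not my exact setup; the metastore host and bucket names are placeholders):

```
# spark-defaults.conf (sketch; host and bucket are placeholders)
spark.sql.catalogImplementation    hive
spark.hadoop.hive.metastore.uris   thrift://metastore-host:9083
spark.sql.warehouse.dir            s3a://my-bucket/warehouse
```

With this, Spark SQL talks to the standalone metastore service over Thrift and writes table data to S3 via the s3a connector, with no HDFS or YARN in the picture.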
When using manual Kafka offset commits in a Spark Streaming job, if the
application fails to process the current batch and never commits the offset
in the executor, is it expected behavior that the next batch will still be
processed and the offset moved forward regardless of the failure to commit? It
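For context, by "manual commit" I mean the commitAsync API on the direct stream, roughly like this (a sketch, not my exact code; processBatch is a hypothetical processing step):

```scala
// Sketch of manual offset commits with the Kafka 0.10 direct stream.
// The intent is to commit offsets only after the batch succeeds.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  processBatch(rdd) // hypothetical processing step; may throw
  // Only reached if processBatch did not fail:
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

My question is whether, after a failure skips the commitAsync call, the stream still advances past those records on the next batch.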
Hi, amazing Spark team,
I have been closely following these issues:
https://issues.apache.org/jira/browse/SPARK-27648
and then, more recently, this one:
https://issues.apache.org/jira/browse/SPARK-29055
It looks like all of it is fixed in this pull request:
https://github.com/apache/spark/pull/25973 and it was
Is it possible to use our own metastore instead of the Hive Metastore with
Spark SQL?
Can you please point me to some docs or code I can look at to get this done?
We are moving away from everything Hadoop.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
What I was wondering after reading about Spark's pipe RDD is that we can
execute any Python code (including machine learning code), and that code
will run in a distributed manner as well.
So if we can run machine learning code in a distributed manner with
rdd.pipe, what is the usefulness of Spark ML?
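To make the question concrete: rdd.pipe streams each partition's elements to an external process over stdin, one element per line, and reads the process's stdout back as string elements. A minimal sketch of such an external script (the scoring logic here is a stand-in, not a real model):

```python
#!/usr/bin/env python3
"""Sketch of an external script usable with rdd.pipe("./score.py").

Each input line is one RDD element; each output line becomes one
element of the resulting RDD. The "model" below is a placeholder.
"""
import sys


def score(line: str) -> str:
    # Stand-in for real model inference: the "score" is just the
    # length of the record, emitted tab-separated after the input.
    return f"{line}\t{len(line)}"


def main() -> None:
    # Read elements from stdin, write scored elements to stdout.
    for raw in sys.stdin:
        print(score(raw.rstrip("\n")))


if __name__ == "__main__":
    main()
```

On the Spark side this would be invoked as something like `rdd.pipe("./score.py")`, after shipping the script to the executors. Note the pipe contract is strings in, strings out, with a process boundary per partition, which is part of why it differs from the DataFrame-native Spark ML pipelines.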