Hello there,
While using docker-image-tool (for Spark 3.1.1), it seems to not accept the
`java_image_tag` property. The Docker image defaults to JRE 11. Here is what
I am running from the command line:
$ spark/bin/docker-image-tool.sh -r docker.io/sample-spark -b
java_image_tag=8-jre-slim -t 3.1.1
Hi,
Could someone please respond to this?
Thanks
Pankaj Bhootra
On Sun, 7 Mar 2021 at 01:22, Pankaj Bhootra wrote:
> Hello Team
>
> I am new to Spark and this question may be a possible duplicate of the
> issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347
>
> We have a
Thanks, I'll try it right now.
Kent Yao wrote on Wed, 10 Mar 2021 at 11:11:
> Hi Li,
> Have you tried `Interacting with Different Versions of Hive Metastore`
> http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
>
>
> Bests,
>
> Kent Yao
Hi Li,
Have you tried `Interacting with Different Versions of Hive Metastore`?
http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
Bests,
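For reference, a minimal PySpark sketch of what that documentation page describes; the version string and jar setting below are assumptions chosen to illustrate the two relevant options, not a tested setup:

    from pyspark.sql import SparkSession

    # Point Spark's Hive support at an older metastore (assumed version 1.2.1)
    spark = (
        SparkSession.builder
        .appName("hive-metastore-example")  # hypothetical app name
        # Version of the Hive metastore to connect to
        .config("spark.sql.hive.metastore.version", "1.2.1")
        # "maven" downloads matching Hive jars at runtime; a list of
        # local jar paths can be supplied instead
        .config("spark.sql.hive.metastore.jars", "maven")
        .enableHiveSupport()
        .getOrCreate()
    )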
Hi, sorry to bother you. In Spark 3.0.1, hive-1.2 is supported, but in Spark
3.1.x the hive-1.2 Maven profile has been removed. Does that mean hive-1.2 is
not supported in Spark 3.1.x? How can I support hive-1.2 in Spark 3.1.x, and
is there a JIRA for it? Can anyone help me?
hi,
Our Spark writes to GCS are slow. The reason I see is that a staging
directory is used for the initial data generation, followed by copying the
data to the actual directory in GCS. Following are a few configs and code.
Any suggestions on how to speed this up would be great.
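Without seeing the elided configs, one commonly discussed knob (an assumption about this setup, not a guaranteed fix) is the Hadoop file output committer algorithm: version 2 commits task output directly to the destination instead of renaming it again at job commit, and renames are effectively copies on GCS. A minimal sketch:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("gcs-write")  # hypothetical app name
        # v2 skips the job-level rename pass of the staging output
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
        .getOrCreate()
    )

    # Hypothetical bucket and data, for illustration only
    spark.range(1_000_000).write.mode("overwrite").parquet("gs://example-bucket/output/")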
Thanks Sean,
I am using PySpark. There seem to be some reports on foreach usage with
local mode back on the 3rd of March. For example, see
"Spark structured streaming seems to work on local mode only"
I believe the thread owner was reporting on the *foreach* case, not foreachBatch.
cheers
That should not be the case. See
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch
Maybe you are calling .foreach on some Scala object inadvertently.
On Tue, Mar 9, 2021 at 4:41 PM Mich Talebzadeh
wrote:
> Hi,
>
> When I use
Hi,
When I use *foreachBatch* in Spark Structured Streaming, yarn mode works
fine.
When one switches to *foreach* mode (row-by-row processing), it
effectively runs in local mode on a single JVM. It seems to crash when
running in distributed mode. That is my experience.
Can someone else
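For anyone comparing the two sinks, a minimal PySpark sketch based on the guide linked above, using the built-in rate source; the processing bodies are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreach-demo").getOrCreate()
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    def process_row(row):
        # foreach: invoked once per row, on the executors
        print(row.value)

    def process_batch(batch_df, batch_id):
        # foreachBatch: invoked once per micro-batch with a normal DataFrame
        print(batch_id, batch_df.count())

    q1 = stream.writeStream.foreach(process_row).start()
    q2 = stream.writeStream.foreachBatch(process_batch).start()
    spark.streams.awaitAnyTermination()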
Not sure if Kinesis has such flexibility. What other possibilities are there
at the transformation level?
Any example for this, please?
You can also group by the key in the transformation on each batch. But yes,
that's faster/easier if it's already partitioned that way.
On Tue, Mar 9, 2021 at 7:30 AM Ali Gouta wrote:
> Do not know Kinesis, but it looks like it works like Kafka. Your producer
> should implement a partitioner that
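A hedged sketch of Sean's suggestion with the DStream API (the record shape and the queueStream stand-in for Kinesis are assumptions): grouping by eventId inside each micro-batch shuffles all records for a key to one task, regardless of which shard they arrived on:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="eventid-grouping")
    ssc = StreamingContext(sc, batchDuration=300)  # 5-minute batches, as described

    # Stand-in for the Kinesis stream; records assumed to carry an eventId
    events = ssc.queueStream([sc.parallelize([
        {"eventId": "a", "value": 1},
        {"eventId": "a", "value": 2},
        {"eventId": "b", "value": 3},
    ])])

    def process_batch(rdd):
        # groupByKey moves every record with the same eventId to one task
        rdd.map(lambda e: (e["eventId"], e)).groupByKey() \
           .foreach(lambda kv: print(kv[0], len(list(kv[1]))))

    events.foreachRDD(process_batch)
    ssc.start()
    ssc.awaitTerminationOrTimeout(15)
    ssc.stop()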
Do not know Kinesis, but it looks like it works like Kafka. Your producer
should implement a partitioner that makes it possible to send your data
with the same key to the same partition. That way, each task in your Spark
streaming app will load data from the same partition in the same executor.
I
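On the producer side, a hedged boto3 sketch of that idea (stream name, region, and record shape are all assumptions): using the eventId as the Kinesis partition key keeps records for one key on one shard:

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

    def send(event):
        # Records sharing a PartitionKey hash to the same shard, so all
        # records for one eventId arrive on the same shard
        kinesis.put_record(
            StreamName="events",  # hypothetical stream name
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=event["eventId"],
        )

    send({"eventId": "order-42", "value": 1})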
We are doing batch processing using Spark Streaming with Kinesis, with a
batch size of 5 minutes. We want to send all events with the same eventId to
the same executor for a batch, so that we can do multiple grouping
operations based on eventId. No previous or future batch data is involved.