Found this issue reported earlier, but it was bulk closed:
https://issues.apache.org/jira/browse/SPARK-27030
Regards,
Shrikant
On Fri, 22 Sep 2023 at 12:03 AM, Shrikant Prasad wrote:
> Hi all,
>
> We have multiple Spark jobs running in parallel, each trying to write into
> the same Hive table but into a different partition. [...]
Hi all,
We have multiple Spark jobs running in parallel, each trying to write into
the same Hive table but into a different partition. This was working fine
with Spark 2.3 and Hadoop 2.7, but after upgrading to Spark 3.2 and Hadoop
3.2.2, these parallel jobs are failing with FileNotFoundException.
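For context, a minimal sketch of the write pattern being described (the table
name and partition layout are illustrative assumptions, not from the original
report). With dynamic partition overwrite, an overwrite should replace only
the partitions present in the incoming DataFrame, so parallel jobs writing
disjoint partitions should not clash:

# Sketch of one of the parallel jobs (table name "db.events" is hypothetical).
# Dynamic mode makes "overwrite" replace only the partitions in this DataFrame.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
(df.write
    .mode("overwrite")
    .insertInto("db.events"))  # each parallel job carries a different partition value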
In general you can probably do all of this in Spark SQL by reading the Hive
table into a DataFrame in PySpark, creating a TempView on that DataFrame,
selecting the PM data with the CAST() function, and then using a windowing
function with DENSE_RANK() to pick the top 5:
# Read the Hive table as a DataFrame
df = spark.read.table("hive.sample_data")
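Putting the pieces together, a minimal sketch of the full query against the
hive.sample_data table from the question below (the hour() >= 12 filter is my
reading of "select PM data"; adjust the predicate to your actual definition):

# Expose the DataFrame to Spark SQL
df.createOrReplaceTempView("sample_data")

top5 = spark.sql("""
    SELECT incoming_ip, total_volume
    FROM (
        SELECT incoming_ip,
               SUM(volume) AS total_volume,
               DENSE_RANK() OVER (ORDER BY SUM(volume) DESC) AS rnk
        FROM sample_data
        WHERE hour(CAST(time_in AS TIMESTAMP)) >= 12  -- PM rows only (assumed)
        GROUP BY incoming_ip
    ) ranked
    WHERE rnk <= 5
""")
top5.show()

Note that DENSE_RANK() keeps ties, so a tie at rank 5 can return more than
five rows; use ROW_NUMBER() instead if you want exactly five.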
Hello gurus,
I have a Hive table created as below (there are more columns):
CREATE TABLE hive.sample_data (
  incoming_ip STRING,
  time_in     TIMESTAMP,
  volume      INT
);
Data is stored in that table.
In PySpark, I want to select the top 5 incoming IP addresses with the highest
total volume of data transferred.