Need to split incoming data into PM on time column and find the top 5 by volume of data

ashok34...@yahoo.com.INVALID Thu, 21 Sep 2023 10:01:21 -0700

Hello gurus,

I have a Hive table created as follows (the real table has more columns):


CREATE TABLE hive.sample_data (incoming_ip STRING, time_in TIMESTAMP, volume INT);

The table is already populated with data.

In PySpark, I want to select the top 5 incoming IP addresses with the highest 
total volume of data transferred during the PM hours. PM hours are determined 
by the time_in column, which holds times such as '00:45:00', '11:35:00', 
'18:25:00'.

Any advice is appreciated.
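
For reference, one possible shape of the query, as a minimal sketch rather than a definitive answer. It assumes "PM" means an hour of 12 or later (12:00:00 through 23:59:59), that time_in can be passed to Spark SQL's HOUR() function, and that a SparkSession with Hive support already exists. The helper names build_top_pm_query and top_pm_ips are made up for illustration.

```python
def build_top_pm_query(table: str = "hive.sample_data", n: int = 5) -> str:
    """Return Spark SQL that ranks incoming_ip by total volume in PM hours.

    Assumption: PM is any row whose time_in hour is 12 or later.
    """
    return f"""
        SELECT incoming_ip, SUM(volume) AS total_volume
        FROM {table}
        WHERE HOUR(time_in) >= 12
        GROUP BY incoming_ip
        ORDER BY total_volume DESC
        LIMIT {int(n)}
    """


def top_pm_ips(spark, table: str = "hive.sample_data", n: int = 5):
    """Run the query on an existing SparkSession (hypothetical helper).

    `spark` is assumed to have been created with Hive support enabled,
    so that the Hive table is visible to spark.sql().
    """
    return spark.sql(build_top_pm_query(table, n))
```

With a live session this would be called as top_pm_ips(spark).show(). The same logic can be written with the DataFrame API by filtering on the hour of time_in, grouping by incoming_ip, summing volume, ordering descending, and limiting to 5.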

