Hello gurus,

I have a Hive table created as below (the real table has more columns):

CREATE TABLE hive.sample_data ( incoming_ip STRING, time_in TIMESTAMP, volume INT );

Data is stored in that table.

In PySpark, I want to select the top 5 incoming IP addresses with the highest 
total volume of data transferred during the PM hours. PM hours are determined by 
the time_in column, which has values like '00:45:00', '11:35:00', '18:25:00'.

Any advice is appreciated.
