Hello gurus,

I have a Hive table created as below (it has more columns than shown here):
CREATE TABLE hive.sample_data (
  incoming_ip STRING,
  time_in TIMESTAMP,
  volume INT
);

Data is stored in that table. In PySpark, I want to select the top 5 incoming IP addresses with the highest total volume of data transferred during PM hours (12:00:00 through 23:59:59). Whether a row falls in PM hours is decided by the column time_in, which holds values like '00:45:00', '11:35:00' and '18:25:00'.

Any advice is appreciated.