Hello, I have a table with userid, movieId and some more columns say c1, c2, c3.... I want to group the records by userId and then do some processing on those records (for each user) and output less number of records (or same number of records) based on some logic.
The processing involves considering all the records at once for that user say n, do some comparison, filtering, composition, add some more columns etc and output m records. m <= n I can achieve this first creating a view by using a group by clause (concatenating all the columns) and using collect_set. This way for each user I would get an array containing all the rows and all the columns. Using that view I can then apply UDTF to do the processing on each row and output more rows Here is the example uid1 m1 c1r1 c2r1 c3r1 uid1 m2 c1r2 c2r2 c3r2 uid1 m1c1r3 c2r3 c3r3 uid2 m4 c1r4 c2r4 c3r4 uid2 m1 c1r5 c2r5 c3r5 uid2 m2 c1r6 c2r6 c3r6 uid2 m6 c1r7 c2r7 c3r7 CREATE VIEW groupByUserId_view (all_user_col) as select collect_set(concat_ws('\t',userid,m,c1r,c2r,c3r)) from userMovie group by userid; (the view will have only two rows for uid1 and uid2) SELECT processing_UDTF(all_user_col) as final_result from groupByUserId_view The final output is uid1 m1 c1r1 c2r1 c3r1 c4r1 uid1 m2 c1r2 c2r2 c3r2 c4r2 uid2 m1 c1r4 c2r4 c3r4 c4r3 uid2 m2 c1r7 c2r7 c3r7 c4r4 (based on the processing logic, both uid1 and uid2 has now only 2 rows and and extra column). I want to ask , is there a better way to achieve this? Thank you