Hi Beam users,

I have a use case where I need to partition my PCollection by one key and
then sort the rows within each partition by another key.

I think the Beam DataFrame API could be a candidate solution, but I cannot
figure out how to make it work. Specifically, I tried df.groupby, expecting
it to distribute my data across different nodes by key. I also tried
df.sort_values, but that sorts the whole dataset, which is not what I need.
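
To make the requirement concrete, here is a minimal sketch in plain Python
(not Beam) of the result I am after. The function name partition_and_sort
and the field names partition_key and sort_key are made up for illustration.
In a Beam pipeline the grouping step would presumably be a beam.GroupByKey,
with the sort applied to each key's values afterwards, assuming each group
fits in a single worker's memory.

```python
from collections import defaultdict
from operator import itemgetter


def partition_and_sort(rows, partition_key, sort_key):
    """Group rows by partition_key, then sort each group by sort_key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    # Sort only within each partition; there is no global ordering.
    return {
        key: sorted(group, key=itemgetter(sort_key))
        for key, group in partitions.items()
    }


# Hypothetical sample rows, for illustration only.
rows = [
    {"partition_key": "a", "sort_key": 3},
    {"partition_key": "b", "sort_key": 1},
    {"partition_key": "a", "sort_key": 1},
    {"partition_key": "b", "sort_key": 2},
]
result = partition_and_sort(rows, "partition_key", "sort_key")
```

Each key ends up with its own list of rows, ordered by sort_key only within
that list.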

Can someone shed some light on this?





Wenbing Bai

Senior Software Engineer

Data Infrastructure, Cruise

Pronouns: She/Her
