Re: [EXT] Re: [EXT] Re: [EXT] Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-05-14 Thread Wenbing Bai
" of items and hence produce more > than one shard. See also > https://beam.apache.org/documentation/runtime/model/ > > On Thu, May 13, 2021 at 3:58 PM Wenbing Bai > wrote: > >> Hi team, >> >> I have another question when using Beam Dataframe IO connect

Re: [EXT] Re: [EXT] Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-05-13 Thread Wenbing Bai
> Kenn > > On Mon, May 10, 2021 at 5:26 PM Wenbing Bai > wrote: > >> Hi Robert and Brian, >> >> I don't know why I didn't catch your replies. But thank you so much for >> looking at this. >> >> My parquet files will be consumed by downstre

Re: [EXT] Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-05-10 Thread Wenbing Bai
>> partition_cols should work, I filed BEAM-12201 [1] for this. That alone >> won't be enough as our implementation will likely reshuffle the dataset to >> enforce the partitioning, removing any sorting that you've applied, so we'd >> also need to think about how to opti

Re: [EXT] Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-04-07 Thread Wenbing Bai
https://github.com/apache/beam/blob/a8cd05932bed9b2480316fb8518409636cb2733b/sdks/python/apache_beam/dataframe/io.py#L525 > > On Wed, Apr 7, 2021 at 2:22 PM Wenbing Bai > wrote: > >> Hi Robert and Brian, >> >> I tried groupby in my case. Here is my pipeline code. I do see

Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-04-07 Thread Wenbing Bai
{}.parquet'.format(str(uuid.uuid4())[:8]), engine='pyarrow', index=False) On Fri, Apr 2, 2021 at 10:00 AM Wenbing Bai wrote: > Thank you, Robert and Brian. > > I'd like to try this out. I am trying to distribute my dataset to nodes, > sort each partition by some key and then store each

Re: [EXT] Re: Beam Dataframe - sort and grouping

2021-04-02 Thread Wenbing Bai
s are by definition unordered, so >> unless you sort a partition and immediately do something with it that >> ordering may not be preserved. If you could let us know what you're trying >> to do with this ordering that would be helpful. >> >> - Robert >> >> >> O

Beam Dataframe - sort and grouping

2021-04-01 Thread Wenbing Bai
will be distributed to different nodes. I also tried df.sort_values, but it will sort my whole dataset, which is not what I need. Can someone shed some light on this? Wenbing Bai Senior Software Engineer Data Infrastructure, Cruise Pronouns: She/Her -- *Confidentiality Note:* We care

Help needed on Dataflow worker exception of WriteToBigQuery

2020-02-24 Thread Wenbing Bai
(WriteRecordsToFile)/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)'] Has anyone seen this before? Can I get any hints on where the Dataflow worker writes data to Avro? -- Wenbing Bai Senior Software Engineer, MLP Cruise Pronouns: She/Her