Definitely could be a top-level transform. Should it automatically unnest all arrays, or just the fields specified?
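To make the question concrete, here is a rough sketch in plain Python of the "just the fields specified" option. This is not an existing Beam API: the `explode` helper and the dict-based `Row` are hypothetical stand-ins for a schema-aware transform. It also assumes two semantics that are still open: several named arrays are combined as a cross product, and only one level of nesting is removed.

```python
from itertools import product
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]  # hypothetical stand-in for a schema'd Beam Row


def explode(row: Row, fields: Iterable[str]) -> Iterator[Row]:
    """Yield one output row per combination of elements of the named array fields.

    Only the named fields are unnested, and only one level deep. Every other
    field is copied unchanged into each output row.
    """
    names: List[str] = list(fields)
    # Assumption: multiple arrays are combined as a cross product.
    # An empty array therefore produces no output rows at all.
    for combo in product(*(row[name] for name in names)):
        out = dict(row)                # duplicate the non-array fields
        out.update(zip(names, combo))  # replace each array with one element
        yield out


rows = [{"id": 1, "tags": ["a", "b"], "nested": [[1, 2], [3]]}]
exploded = [out for row in rows for out in explode(row, ["tags"])]
for out in exploded:
    print(out)
```

Exploding `nested` with this sketch yields rows whose `nested` value is still a list, which is exactly the nested-array case that needs a decision. A generator of this shape is also what one would hand to FlatMap in the meantime.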
We do have to define the semantics for nested arrays as well.

On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <[email protected]> wrote:

> Ah, thanks for the clarification. UNNEST does sound like what you want
> here, and would likely make sense as a top-level relational transform as
> well as being supported by SQL.
>
> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <[email protected]> wrote:
>
>> @Kyle Weaver <[email protected]> sure thing! So the input/output
>> definition for Flatten.Iterables
>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>> is:
>>
>> Input: PCollection<Iterable<T>>
>>
>> Output: PCollection<T>
>>
>> The input/output for an explode transform would look like this:
>>
>> Input: PCollection<Row>. The row schema has a field which is an array of
>> T.
>>
>> Output: PCollection<Row>. The array-typed field from the input schema is
>> replaced with a new field of type T. The elements of the array field
>> are flattened into multiple rows in the new table (the other fields of
>> the input table are just duplicated).
>>
>> Hope this clarification helps!
>>
>> *From: *Kyle Weaver <[email protected]>
>> *Reply-To: *"[email protected]" <[email protected]>
>> *Date: *Tuesday, January 12, 2021 at 4:58 PM
>> *To: *"[email protected]" <[email protected]>
>> *Cc: *Reuven Lax <[email protected]>
>> *Subject: *Re: Is there an array explode function/transform?
>>
>> @Reuven Lax <[email protected]> yes I am aware of that transform, but
>> that’s different from the explode operation I was referring to:
>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>
>> How is it different? It'd help if you could provide the signature (input
>> and output PCollection types) of the transform you have in mind.
>>
>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <[email protected]> wrote:
>>
>> @Reuven Lax <[email protected]> yes I am aware of that transform, but
>> that’s different from the explode operation I was referring to:
>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>
>> *From: *Reuven Lax <[email protected]>
>> *Reply-To: *"[email protected]" <[email protected]>
>> *Date: *Tuesday, January 12, 2021 at 2:04 PM
>> *To: *user <[email protected]>
>> *Subject: *Re: Is there an array explode function/transform?
>>
>> Have you tried Flatten.iterables?
>>
>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <[email protected]> wrote:
>>
>> Hi community,
>>
>> Is there a Beam function to explode an array (similar to Spark SQL’s
>> explode())? I did some research but did not find anything.
>>
>> BTW I think we can potentially use FlatMap to implement the explode
>> functionality, but a Beam-provided function would be very handy.
>>
>> Thanks a lot!
