Re: Is there an array explode function/transform?

Reuven Lax Thu, 14 Jan 2021 11:30:38 -0800

And the result is essentially a cross product of all the different array
elements?
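
To make the question concrete, here is a minimal Python sketch of what stacking two single-field unnests would produce. It uses plain dicts as stand-ins for Beam Rows, and the helper name `unnest` is hypothetical, not an existing Beam API. Under these assumed semantics the output is the cross product of the two arrays' elements, with the remaining fields duplicated.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]  # stand-in for a Beam Row; not the real Beam type


def unnest(rows: Iterable[Row], field: str) -> Iterator[Row]:
    """Hypothetical helper: emit one row per element of the array in `field`."""
    for row in rows:
        for element in row[field]:
            # Copy the row and replace the array with a single element.
            yield {**row, field: element}


rows: List[Row] = [{"id": 1, "a": [10, 20], "b": ["x", "y"]}]

# Stack two single-field unnests: first on "a", then on "b".
stacked = list(unnest(unnest(rows, "a"), "b"))

for row in stacked:
    print(row)
```

In this sketch a row whose array is empty produces no output rows; whether a real Unnest should behave that way is a separate design decision.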


On Thu, Jan 14, 2021 at 11:25 AM Robert Bradshaw <[email protected]>
wrote:

> I think it makes sense to allow specifying more than one, if desired. This
> is equivalent to just stacking multiple Unnests. (Possibly one could even
> have a special syntax like "*" for all array fields.)
>
> On Thu, Jan 14, 2021 at 10:05 AM Reuven Lax <[email protected]> wrote:
>
>> Should Unnest be allowed to specify multiple array fields, or just one?
>>
>> On Wed, Jan 13, 2021 at 11:59 PM Manninger, Matyas <
>> [email protected]> wrote:
>>
>>> I would also not unnest arrays nested in arrays, just the top-level array
>>> of the specified fields.
>>>
>>> On Wed, 13 Jan 2021 at 20:58, Reuven Lax <[email protected]> wrote:
>>>
>>>> Nested fields are not part of standard SQL AFAIK. Beam goes further and
>>>> supports arrays of arrays, etc.
>>>>
>>>> On Wed, Jan 13, 2021 at 11:42 AM Kenneth Knowles <[email protected]>
>>>> wrote:
>>>>
>>>>> Just the fields specified, IMO. When in doubt, copy SQL. (and I mean
>>>>> SQL generally, not just Beam SQL)
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Jan 13, 2021 at 11:17 AM Reuven Lax <[email protected]> wrote:
>>>>>
>>>>>> Definitely could be a top-level transform. Should it automatically
>>>>>> unnest all arrays, or just the fields specified?
>>>>>>
>>>>>> We do have to define the semantics for nested arrays as well.
>>>>>>
>>>>>> On Wed, Jan 13, 2021 at 10:57 AM Robert Bradshaw <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Ah, thanks for the clarification. UNNEST does sound like what you
>>>>>>> want here, and would likely make sense as a top-level relational
>>>>>>> transform as well as being supported by SQL.
>>>>>>>
>>>>>>> On Wed, Jan 13, 2021 at 10:53 AM Tao Li <[email protected]> wrote:
>>>>>>>
>>>>>>>> @Kyle Weaver <[email protected]> sure thing! So the input/output
>>>>>>>> definition for the Flatten.Iterables
>>>>>>>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>>>>>>>> is:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Input: PCollection<Iterable<T>>
>>>>>>>>
>>>>>>>> Output: PCollection<T>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The input/output for an explode transform would look like this:
>>>>>>>>
>>>>>>>> Input:  PCollection<Row> The row schema has a field which is an
>>>>>>>> array of T
>>>>>>>>
>>>>>>>> Output: PCollection<Row> The array type field from the input schema
>>>>>>>> is replaced with a new field of type T. The elements from the array
>>>>>>>> type field are flattened into multiple rows in the new table (other
>>>>>>>> fields of the input table are just duplicated).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hope this clarification helps!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From: *Kyle Weaver <[email protected]>
>>>>>>>> *Reply-To: *"[email protected]" <[email protected]>
>>>>>>>> *Date: *Tuesday, January 12, 2021 at 4:58 PM
>>>>>>>> *To: *"[email protected]" <[email protected]>
>>>>>>>> *Cc: *Reuven Lax <[email protected]>
>>>>>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> @Reuven Lax <[email protected]> yes I am aware of that transform,
>>>>>>>> but that’s different from the explode operation I was referring to:
>>>>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> How is it different? It'd help if you could provide the signature
>>>>>>>> (input and output PCollection types) of the transform you have in mind.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 12, 2021 at 4:49 PM Tao Li <[email protected]> wrote:
>>>>>>>>
>>>>>>>> @Reuven Lax <[email protected]> yes I am aware of that transform,
>>>>>>>> but that’s different from the explode operation I was referring to:
>>>>>>>> https://spark.apache.org/docs/latest/api/sql/index.html#explode
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From: *Reuven Lax <[email protected]>
>>>>>>>> *Reply-To: *"[email protected]" <[email protected]>
>>>>>>>> *Date: *Tuesday, January 12, 2021 at 2:04 PM
>>>>>>>> *To: *user <[email protected]>
>>>>>>>> *Subject: *Re: Is there an array explode function/transform?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you tried Flatten.iterables?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 12, 2021, 2:02 PM Tao Li <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Is there a Beam function to explode an array (similar to Spark
>>>>>>>> SQL’s explode())? I did some research but did not find anything.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> BTW I think we could use FlatMap to implement the explode
>>>>>>>> functionality, but a Beam-provided function would be very handy.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>>
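
As a rough illustration of the explode signature described earlier in the thread (a Row with an array field in, one Row per array element out), here is a small Python sketch. Plain dicts stand in for Beam Rows and the `explode` function is a hypothetical helper, not a Beam transform. It also follows the suggestion in the thread that only the top-level array of the named field is unnested, so inner arrays are left intact.

```python
from typing import Any, Dict, Iterator

Row = Dict[str, Any]  # stand-in for a Beam Row; not the real Beam type


def explode(row: Row, field: str) -> Iterator[Row]:
    """Hypothetical helper: one output row per top-level element of `field`."""
    for element in row[field]:
        # All other fields are duplicated; the array field becomes one element.
        yield {**row, field: element}


# Simple case: an array of strings becomes three rows.
flat = list(explode({"name": "r1", "tags": ["a", "b", "c"]}, "tags"))

# Nested case: only the outer array is unnested; inner arrays stay arrays.
nested = list(explode({"name": "r2", "values": [[1, 2], [3]]}, "values"))

for out in flat + nested:
    print(out)
```

Applying the same helper a second time to the `values` field would unnest the inner arrays as well, which is one possible way to define the semantics for arrays of arrays.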
