I'm trying to construct a Cartesian product as a pyarrow table using pyarrow
compute, but haven't found an elegant way to do this.
Any suggestions?
For inputs:
import pyarrow as pa
n_legs = pa.array([2, 4, 5, 100])
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
names = ["n_legs", "animals"]
Desired output: 16 rows, i.e. the product of the 4 elements in n_legs and the
4 elements in animals:
>>> final_table
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[2,4,5,100,100,...,4,4,5,100,2]]
animals: [["Flamingo","Horse","Brittle stars","Centipede","Flamingo",...,"Centipede","Flamingo","Horse","Brittle stars","Centipede"]]
>>> final_table.to_pylist()
[{'n_legs': 2, 'animals': 'Flamingo'}, {'n_legs': 4, 'animals': 'Horse'},
{'n_legs': 5, 'animals': 'Brittle stars'}, {'n_legs': 100, 'animals':
'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs': 2, 'animals':
'Horse'}, {'n_legs': 4, 'animals': 'Brittle stars'}, {'n_legs': 5, 'animals':
'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs': 5, 'animals':
'Horse'}, {'n_legs': 2, 'animals': 'Brittle stars'}, {'n_legs': 4, 'animals':
'Centipede'}, {'n_legs': 4, 'animals': 'Flamingo'}, {'n_legs': 5, 'animals':
'Horse'}, {'n_legs': 100, 'animals': 'Brittle stars'}, {'n_legs': 2, 'animals':
'Centipede'}]
The above example is a Cartesian join between two arrays, but this could
potentially be a product join between 3, 4 or 5 arrays, which may also have
different lengths.