Yeah, I don't really see anything that jumps out at me as a clear solution. I'm also not sure you would want the result materialized unless the final result is reasonably small.
I don't see Acero as having implemented CrossRel [1], which would be exactly what you'd want. My suggestion is essentially to build it yourself, potentially as a compute function. The only other general recommendations I can give are to use dictionary encoding for string columns, maybe some clever use of run-length encoding, and/or to use generators since you're in Python. Definitely a lackluster answer, but if you would like more direction, sharing your requirements would be really useful.

[1]: https://substrait.io/relations/logical_relations/#cross-product-operation

On Thu, May 4, 2023 at 13:56, Lee, David (PAG) <[email protected]> wrote:

> I'm trying to construct a cartesian result as a pyarrow table using pyarrow
> compute, but haven't found any elegant way to do this..
>
> Any suggestions?
>
> For inputs :
>
> n_legs = pa.array([2, 4, 5, 100])
> animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
> names = ["n_legs", "animals"]
>
> Desired output: 16 rows which is the product of 4 elements in n_legs x 4
> elements in animals..
>
> >>> final_table
> pyarrow.Table
> n_legs: int64
> animals: string
> ----
> n_legs: [[2,4,5,100,100,...,4,4,5,100,2]]
> animals: [["Flamingo","Horse","Brittle
> stars","Centipede","Flamingo",...,"Centipede","Flamingo","Horse","Brittle
> stars","Centipede"]]
>
> >>> final_table.to_pylist()
> [{'n_legs': 2, 'animals': 'Flamingo'}, {'n_legs': 4, 'animals': 'Horse'},
> {'n_legs': 5, 'animals': 'Brittle stars'}, {'n_legs': 100, 'animals':
> 'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs': 2,
> 'animals': 'Horse'}, {'n_legs': 4, 'animals': 'Brittle stars'}, {'n_legs': 5,
> 'animals': 'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs':
> 5, 'animals': 'Horse'}, {'n_legs': 2, 'animals': 'Brittle stars'}, {'n_legs':
> 4, 'animals': 'Centipede'}, {'n_legs': 4, 'animals': 'Flamingo'}, {'n_legs':
> 5, 'animals': 'Horse'}, {'n_legs': 100, 'animals': 'Brittle stars'},
> {'n_legs': 2, 'animals': 'Centipede'}]
>
> The above example is a cartesian join between two arrays, but potentially
> this could be a product join between 3, 4 or 5 arrays which may also be
> different lengths..
