Yeah, I don't see anything that jumps out at me as a clear solution. I'm 
also not sure that you would want the result materialized unless it is 
reasonably small.

I don't see Acero as having implemented crossrel[1], which would be exactly 
what you'd want.



My suggestion is essentially to build it yourself, potentially as a compute 
function. The only other general recommendations I can give are to use 
dictionary encoding for string columns, maybe some clever use of run-length 
encoding, and/or to use generators since you're in Python.


Definitely a lackluster answer, but if you would like more direction, 
sharing your requirements would be really useful.




[1]: https://substrait.io/relations/logical_relations/#cross-product-operation


On Thu, May 4, 2023 at 13:56, Lee, David (PAG) <[email protected]> wrote:

> I'm trying to construct a cartesian result as a pyarrow table using pyarrow 
> compute, but haven't found any elegant way to do this..
> 

> Any suggestions?
> 

> For inputs :
> 

> n_legs = pa.array([2, 4, 5, 100])
> animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
> names = ["n_legs", "animals"]
> 

> Desired output: 16 rows which is the product of 4 elements in n_legs x 4 
> elements in animals..
> 

> >>> final_table
> pyarrow.Table
> n_legs: int64
> animals: string
> ----
> n_legs: [[2,4,5,100,100,...,4,4,5,100,2]]
> animals: [["Flamingo","Horse","Brittle 
> stars","Centipede","Flamingo",...,"Centipede","Flamingo","Horse","Brittle 
> stars","Centipede"]]
> 

> >>> final_table.to_pylist()
> [{'n_legs': 2, 'animals': 'Flamingo'}, {'n_legs': 4, 'animals': 'Horse'}, 
> {'n_legs': 5, 'animals': 'Brittle stars'}, {'n_legs': 100, 'animals': 
> 'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs': 2, 
> 'animals': 'Horse'}, {'n_legs': 4, 'animals': 'Brittle stars'}, {'n_legs': 5, 
> 'animals': 'Centipede'}, {'n_legs': 100, 'animals': 'Flamingo'}, {'n_legs': 
> 5, 'animals': 'Horse'}, {'n_legs': 2, 'animals': 'Brittle stars'}, {'n_legs': 
> 4, 'animals': 'Centipede'}, {'n_legs': 4, 'animals': 'Flamingo'}, {'n_legs': 
> 5, 'animals': 'Horse'}, {'n_legs': 100, 'animals': 'Brittle stars'}, 
> {'n_legs': 2, 'animals': 'Centipede'}]
> 

> The above example is a cartesian join between two arrays, but potentially 
> this could be a product join between 3, 4 or 5 arrays which may also be 
> different lengths.
