Hi David,

Thanks for the idea. I had been taking a similar approach earlier, but I’m
working with data bigger than memory, so I want to apply this as a projection
when reading in batches. It does make sense as an approach for others, though.

-Ryan



> On Nov 15, 2022, at 4:11 AM, Lee, David <[email protected]> wrote:
> 
> 
> You can always turn these structures into tables and do SQL-like joins.
> 
> import pyarrow as pa
> 
> map = {"a": 1, "b": 2, "c": 3}
> input_array = pa.array(["a", "b", "c", "a"])
> 
> map_table = pa.Table.from_pylist([{"key": k, "value": v} for k, v in map.items()])
> input_table = pa.Table.from_arrays([input_array], names=["key"])
> 
> joined_result = input_table.join(map_table, "key")
> 
>>>> joined_result
> pyarrow.Table
> key: string
> value: int64
> ----
> key: [["a","b","c","a"]]
> value: [[1,2,3,1]]
> 
> -----Original Message-----
> From: Joris Van den Bossche <[email protected]> 
> Sent: Monday, November 14, 2022 11:33 PM
> To: [email protected]
> Subject: Re: pyarrow.compute case_when
> 
> 
> And as an answer to how you can use pyarrow.compute.case_when for this:
> 
>>>> map = {"a": 1, "b": 2, "c": 3}
>>>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
>>>> pc.case_when(cond, *map.values())
> <pyarrow.lib.Int64Array object at 0x7f44a99f32e0> [
>  1,
>  2,
>  3,
>  1
> ]
> 
> The "case_when" compute function takes the multiple conditions as a
> StructArray, which you can compose using the "make_struct" compute function.
> It's certainly not the most user-friendly or obvious way, so we should
> add some examples to the docstring showing how to achieve this.
> 
> Also, for this specific case where you already have this "mapping"
> of values you want to replace, I think we should have a specialized kernel,
> avoiding the need to materialize a boolean array for each value ->
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ARROW-10641__;!!KSjYCgUGsB4!ZIFs661Vs0zZZSbl7Ap0B_swBkooVbpHBiZavkMXQfUANYdmlbAd318opypAMkNY-O4rKPOTHVjfVdYvuuIgL0qQhP6PYA$
> 
> Joris
> 
> 
>> On Mon, 14 Nov 2022 at 19:51, Ryan Kuhns <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I’ve got one more question as a follow-up to my prior question on working
>> with multi-file zipped CSVs [1]. I figured it was worth asking in a separate
>> thread so it would be easier for others to find the specific question about
>> case_when.
>> 
>> I’m trying to accomplish something like pandas Series.map, where I map the
>> values of an Arrow array to new values.
>> 
>> pyarrow.compute.case_when looks like a candidate to solve this, but after
>> reading the docs, I’m still not clear on how to structure the argument to
>> the “cond” parameter, or whether there is alternative functionality that
>> would be better.
>> 
>> Example input, mapping and expected output:
>> 
>> import pyarrow as pa
>> import pyarrow.compute as pc
>> 
>> map = {"a": 1, "b": 2, "c": 3}
>> input_array = pa.array(["a", "b", "c", "a"])
>> expected_output = pa.array([1, 2, 3, 1])
>> 
>> Logic I’m hoping for would be the equivalent of the following SQL:
>> 
>> CASE
>>    WHEN input_array = 'a' THEN 1
>>    WHEN input_array = 'b' THEN 2
>>    WHEN input_array = 'c' THEN 3
>>    ELSE input_array
>> END
>> 
>> Or alternatively, if input_array were a pandas Series, then
>> input_array.map(map).
>> 
>> Thanks again,
>> 
>> Ryan
>> 
>> 
>> 
>> 
>> 
>> [1] 
>> https://urldefense.com/v3/__https://www.mail-archive.com/[email protected]
>> rrow.org/msg02379.html__;!!KSjYCgUGsB4!ZIFs661Vs0zZZSbl7Ap0B_swBkooVbp
>> HBiZavkMXQfUANYdmlbAd318opypAMkNY-O4rKPOTHVjfVdYvuuIgL0offUCiSw$
