Should have also mentioned that the following works:

pc.case_when(cond, *[*map.values()])

It seems the dict values don’t unpack directly, but unpacking them in a list 
first will work.

> On Nov 15, 2022, at 8:57 AM, Ryan Kuhns <[email protected]> wrote:
> 
> Hi Joris,
> 
> I appreciate the ticket. I like the proposed functionality (and the use of a 
> keyword to move between map and replace functionality).
> 
> I appreciate the help figuring out the use of make_struct. I got an error on 
> the values portion when using unpacking. In pyarrow the following works:
> 
> >>> import pyarrow as pa
> >>> import pyarrow.compute as pc
> >>> 
> >>> map = {"a": 1, "b": 2, "c": 3}
> >>> input_array = pa.array(["a", "b", "c", "a"])
> >>> expected_output = pa.array([1, 2, 3, 1])
> >>> 
> >>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
> >>> pc.case_when(cond, 1, 2, 3)
> 
> Thanks,
> 
> Ryan
> 
>> On Nov 15, 2022, at 2:33 AM, Joris Van den Bossche 
>> <[email protected]> wrote:
>> 
>> And as an answer to how you can use pyarrow.compute.case_when for this:
>> 
>> >>> map = {"a": 1, "b": 2, "c": 3}
>> >>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
>> >>> pc.case_when(cond, *map.values())
>> <pyarrow.lib.Int64Array object at 0x7f44a99f32e0>
>> [
>> 1,
>> 2,
>> 3,
>> 1
>> ]
>> 
>> The "case_when" compute function takes the multiple conditions as a
>> StructArray, which you can compose using the "make_struct" compute
>> function.
>> It's not the most user-friendly or obvious way, so we should
>> add some examples to the docstring showing how to achieve this.
>> 
>> Also, for this specific case where you already have this "mapping"
>> of values you want to replace, I think we should have a specialized
>> kernel, avoiding the need to materialize a boolean array for each
>> value -> https://issues.apache.org/jira/browse/ARROW-10641
>> 
>> Joris
>> 
>> 
>>> On Mon, 14 Nov 2022 at 19:51, Ryan Kuhns <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I’ve got one more question as a follow-up to my prior question on working 
>>> with multi-file zipped CSVs. [1] I figured it was worth asking in a separate 
>>> thread so it would be easier for others to find this specific question 
>>> about case_when.
>>> 
>>> I’m trying to accomplish something like pandas Series.map, where I map the 
>>> values of an Arrow array to new values.
>>> 
>>> pyarrow.compute.case_when looks like a candidate to solve this, but after 
>>> reading the docs, I’m still not clear on how to structure the argument to 
>>> the “cond” parameter or if there is alternative functionality that would be 
>>> better.
>>> 
>>> Example input, mapping and expected output:
>>> 
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> 
>>> map = {"a": 1, "b": 2, "c": 3}
>>> input_array = pa.array(["a", "b", "c", "a"])
>>> expected_output = pa.array([1, 2, 3, 1])
>>> 
>>> Logic I’m hoping for would be the equivalent of the following SQL:
>>> 
>>> CASE
>>>   WHEN input_array = 'a' THEN 1
>>>   WHEN input_array = 'b' THEN 2
>>>   WHEN input_array = 'c' THEN 3
>>>   ELSE input_array
>>> END
>>> 
>>> Or alternatively, if the input array were a pandas Series, then 
>>> input_array.map(map).
>>> 
>>> Thanks again,
>>> 
>>> Ryan
>>> 
>>> 
>>> 
>>> 
>>> 
>>> [1] https://www.mail-archive.com/[email protected]/msg02379.html
