And as an answer to how you can use pyarrow.compute.case_when for this:
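>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> input_array = pa.array(["a", "b", "c", "a"])
>>> map = {"a": 1, "b": 2, "c": 3}
>>> # one boolean array per key, combined into a single StructArray
>>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
>>> # the i-th value is used wherever the i-th condition is true
>>> pc.case_when(cond, *map.values())
<pyarrow.lib.Int64Array object at 0x7f44a99f32e0>
[
  1,
  2,
  3,
  1
]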
The "case_when" compute function takes the multiple conditions as a
StructArray, which you can compose using the "make_struct" compute
function.
It's not the most user-friendly or obvious way, so we should
add some examples to the docstring showing how to achieve this.
Also, for this specific case where you already have a "mapping"
of values you want to replace, I think we should have a specialized
kernel, avoiding the need to materialize a boolean array for each
value -> https://issues.apache.org/jira/browse/ARROW-10641
Joris
On Mon, 14 Nov 2022 at 19:51, Ryan Kuhns <[email protected]> wrote:
>
> Hi,
>
> I’ve got one more question as a follow up to my prior question on working
> with multi-file zipped CSVs. [1] Figured it was worth asking in another
> thread so it would be easier for others to see the specific question about
> case_when.
>
> I’m trying to accomplish something like pandas Series.map, where I
> map the values of an Arrow array to new values.
>
> pyarrow.compute.case_when looks like a candidate to solve this, but after
> reading the docs, I’m still not clear on how to structure the argument to the
> “cond” parameter or if there is alternative functionality that would be
> better.
>
> Example input, mapping and expected output:
>
> import pyarrow as pa
> import pyarrow.compute as pc
>
> map = {"a": 1, "b": 2, "c": 3}
> input_array = pa.array(["a", "b", "c", "a"])
> expected_output = pa.array([1, 2, 3, 1])
>
> Logic I’m hoping for would be the equivalent of the following SQL:
>
> Case
> when input_array = “a” then 1
> when input_array = “b” then 2
> when input_array = “c” then 3
> else input_array
> End
>
> Or alternatively, if the input array were a pandas Series, then
> input_array.map(map).
>
> Thanks again,
>
> Ryan
>
>
>
>
>
> [1] https://www.mail-archive.com/[email protected]/msg02379.html