Should have also mentioned that the following works: pc.case_when(cond, *[*map.values()])
It seems the dict values don’t unpack directly, but unpacking them in a list first will work.

> On Nov 15, 2022, at 8:57 AM, Ryan Kuhns <[email protected]> wrote:
>
> Hi Joris,
>
> I appreciate the ticket. I like the proposed functionality (and the use of a
> keyword to move between map and replace functionality).
>
> I appreciate the help figuring out the use of make_struct. I got an error on
> the values portion when using unpacking. In pyarrow the following works:
>
> >>> import pyarrow as pa
> >>> import pyarrow.compute as pc
> >>>
> >>> map = {"a": 1, "b": 2, "c": 3}
> >>> input_array = pa.array(["a", "b", "c", "a"])
> >>> expected_output = pa.array([1, 2, 3, 1])
> >>>
> >>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
> >>> pc.case_when(cond, 1, 2, 3)
>
> Thanks,
>
> Ryan
>
>> On Nov 15, 2022, at 2:33 AM, Joris Van den Bossche
>> <[email protected]> wrote:
>>
>> And as an answer to how you can use pyarrow.compute.case_when for this:
>>
>> >>> map = {"a": 1, "b": 2, "c": 3}
>> >>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
>> >>> pc.case_when(cond, *map.values())
>> <pyarrow.lib.Int64Array object at 0x7f44a99f32e0>
>> [
>>   1,
>>   2,
>>   3,
>>   1
>> ]
>>
>> The "case_when" compute function takes the multiple conditions as a
>> StructArray, which you can compose using the "make_struct" compute
>> function.
>> It's certainly not the most user-friendly or obvious way, so we should
>> certainly add some examples to the docstring on how to achieve this.
>>
>> Also, for this specific case where you already have this "mapping"
>> of values you want to replace, I think we should have a specialized
>> kernel, avoiding the need to materialize a boolean array for each
>> value -> https://issues.apache.org/jira/browse/ARROW-10641
>>
>> Joris
>>
>>> On Mon, 14 Nov 2022 at 19:51, Ryan Kuhns <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I’ve got one more question as a follow-up to my prior question on working
>>> with multi-file zipped CSVs. [1] Figured it was worth asking in another
>>> thread so it would be easier for others to see the specific question about
>>> case_when.
>>>
>>> I’m trying to accomplish something like pandas Series.map, where I
>>> map the values of an Arrow array to new values.
>>>
>>> pyarrow.compute.case_when looks like a candidate to solve this, but after
>>> reading the docs, I’m still not clear on how to structure the argument to
>>> the “cond” parameter or if there is alternative functionality that would be
>>> better.
>>>
>>> Example input, mapping and expected output:
>>>
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>>
>>> map = {"a": 1, "b": 2, "c": 3}
>>> input_array = pa.array(["a", "b", "c", "a"])
>>> expected_output = pa.array([1, 2, 3, 1])
>>>
>>> Logic I’m hoping for would be the equivalent of the following SQL:
>>>
>>> Case
>>>   when input_array = 'a' then 1
>>>   when input_array = 'b' then 2
>>>   when input_array = 'c' then 3
>>>   else input_array
>>> End
>>>
>>> Or alternatively, if the input array was a pandas Series, then
>>> input_array.map(map).
>>>
>>> Thanks again,
>>>
>>> Ryan
>>>
>>> [1] https://www.mail-archive.com/[email protected]/msg02379.html
