Hi David,

Thanks for the idea. I had been taking a similar approach earlier, but I’m working with data that is bigger than memory, so I want to include this mapping in a projection applied as the data is read in batches. It makes sense as an approach for others.
-Ryan

> On Nov 15, 2022, at 4:11 AM, Lee, David <[email protected]> wrote:
>
> You can always turn these structures into tables and do SQL-like joins.
>
> map = {"a": 1, "b": 2, "c": 3}
> input_array = pa.array(["a", "b", "c", "a"])
>
> map_table = pa.Table.from_pylist([{"key": k, "value": v} for k, v in map.items()])
> input_table = pa.Table.from_arrays([input_array], names=["key"])
>
> joined_result = input_table.join(map_table, "key")
>
> >>> joined_result
> pyarrow.Table
> key: string
> value: int64
> ----
> key: [["a","b","c","a"]]
> value: [[1,2,3,1]]
>
> -----Original Message-----
> From: Joris Van den Bossche <[email protected]>
> Sent: Monday, November 14, 2022 11:33 PM
> To: [email protected]
> Subject: Re: pyarrow.compute case_when
>
> And as an answer to how you can use pyarrow.compute.case_when for this:
>
> >>> map = {"a": 1, "b": 2, "c": 3}
> >>> cond = pc.make_struct(*[pc.equal(input_array, val) for val in map.keys()])
> >>> pc.case_when(cond, *map.values())
> <pyarrow.lib.Int64Array object at 0x7f44a99f32e0>
> [
>   1,
>   2,
>   3,
>   1
> ]
>
> The "case_when" compute function takes the multiple conditions as a
> StructArray, which you can compose using the "make_struct" compute function.
> It's certainly not the most user-friendly or obvious way, so we should
> certainly add some examples to the docstring on how to achieve this.
>
> Also, for this specific case where you already have this "mapping"
> of values you want to replace, I think we should have a specialized kernel,
> avoiding the need to materialize a boolean array for each value ->
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ARROW-10641__;!!KSjYCgUGsB4!ZIFs661Vs0zZZSbl7Ap0B_swBkooVbpHBiZavkMXQfUANYdmlbAd318opypAMkNY-O4rKPOTHVjfVdYvuuIgL0qQhP6PYA$
>
> Joris
>
>> On Mon, 14 Nov 2022 at 19:51, Ryan Kuhns <[email protected]> wrote:
>>
>> Hi,
>>
>> I’ve got one more question as a follow-up to my prior question on working
>> with multi-file zipped CSVs. [1] Figured it was worth asking in another
>> thread so it would be easier for others to see the specific question about
>> case_when.
>>
>> I’m trying to accomplish something like pandas Series.map, where I
>> map the values of an Arrow array to new values.
>>
>> pyarrow.compute.case_when looks like a candidate to solve this, but after
>> reading the docs, I’m still not clear on how to structure the argument to
>> the "cond" parameter, or whether there is alternative functionality that
>> would be better.
>>
>> Example input, mapping and expected output:
>>
>> import pyarrow as pa
>> import pyarrow.compute as pc
>>
>> map = {"a": 1, "b": 2, "c": 3}
>> input_array = pa.array(["a", "b", "c", "a"])
>> expected_output = pa.array([1, 2, 3, 1])
>>
>> Logic I’m hoping for would be the equivalent of the following SQL:
>>
>> Case
>>   when input_array = "a" then 1
>>   when input_array = "b" then 2
>>   when input_array = "c" then 3
>>   else input_array
>> End
>>
>> Or alternatively, if input_array were a pandas Series, then
>> input_array.map(map).
>>
>> Thanks again,
>>
>> Ryan
>>
>> [1]
>> https://urldefense.com/v3/__https://www.mail-archive.com/[email protected]
>> rrow.org/msg02379.html__;!!KSjYCgUGsB4!ZIFs661Vs0zZZSbl7Ap0B_swBkooVbp
>> HBiZavkMXQfUANYdmlbAd318opypAMkNY-O4rKPOTHVjfVdYvuuIgL0offUCiSw$
