Sounds reasonable. Could you please open a JIRA issue?

Neal
On Sat, May 30, 2020 at 1:01 AM Yue Ni wrote:
> Hi there,
>
> I see that Arrow compute provides a Cast API that lets users cast from
> string to number/boolean values, but sometimes the string values contain
> invalid values that cannot be cast to a number/boolean (sorry, the data is
> really messy), for example, a string array like ["1", "2", "3", "None",
> ""]. Is there any way to handle those invalid values during casting?
>
> From the code I read (cast.h/cast.cc), it seems the cast fails and returns
> as soon as it encounters an invalid value. Is there any way I can ask the
> Cast API to return NULL for invalid values, so that it is easier to process
> these NULL values later?
>
> And since it is rarely possible to guarantee that all string values in an
> array are valid, **any** invalid value in an array or in the entire data
> set will make the cast fail. This requires users of the cast API to figure
> out by themselves which value in the array is invalid, which is not easy to
> do programmatically (only an error status message is set in the context).
> IMHO the following strategy would be a better default when casting from
> string to number/boolean:
> 1) when an invalid value is found, set NULL as its value
> 2) set an error status indicating that casting this array encountered some
> invalid values
> 3) continue casting the remaining elements in the array
> But I believe some users prefer bailing out as soon as possible as well, so
> it would be great if we could provide different cast options to make both
> strategies possible.
>
> Thanks so much.
>
> Regards,
> Yue
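For illustration, the two strategies proposed above (null out invalid values and report them while continuing, or bail out on the first invalid value) can be sketched in plain Python. This is a minimal sketch, not Arrow's Cast API: `cast_strings_to_int`, `CastResult`, `CastError`, and the `on_invalid` parameter are hypothetical names, and parsing relies on Python's built-in `int()` (which, for example, tolerates surrounding whitespace) rather than Arrow's own string parser. Returning the indices of invalid elements addresses the concern that an error message alone does not tell the caller which values failed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CastError(ValueError):
    """Hypothetical error raised when on_invalid="raise" and a value fails to parse."""


@dataclass
class CastResult:
    # Parsed values; None plays the role of NULL.
    values: List[Optional[int]]
    # Positions of elements that were non-null but could not be parsed.
    invalid_indices: List[int] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Plays the role of the proposed "some values were invalid" status.
        return not self.invalid_indices


def cast_strings_to_int(strings: List[Optional[str]],
                        on_invalid: str = "null") -> CastResult:
    """Hypothetical string-to-int cast supporting both proposed strategies.

    on_invalid="null":  invalid values become None, casting continues,
                        and their indices are recorded (strategy 1-3).
    on_invalid="raise": stop at the first invalid value (bail out early).
    """
    if on_invalid not in ("null", "raise"):
        raise ValueError(f"unknown on_invalid policy: {on_invalid!r}")

    values: List[Optional[int]] = []
    invalid: List[int] = []
    for i, s in enumerate(strings):
        if s is None:
            # Input is already null: keep it null, do not count it as invalid.
            values.append(None)
            continue
        try:
            values.append(int(s))
        except ValueError:
            if on_invalid == "raise":
                raise CastError(f"invalid value {s!r} at index {i}") from None
            values.append(None)
            invalid.append(i)
    return CastResult(values, invalid)
```

With the example array from the question, `cast_strings_to_int(["1", "2", "3", "None", ""])` returns values `[1, 2, 3, None, None]` with `invalid_indices == [3, 4]`, while passing `on_invalid="raise"` stops at index 3. Keeping the invalid positions alongside the nulled output lets a caller distinguish values that were null in the input from values that failed to parse.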
