It's https://issues.apache.org/jira/browse/ARROW-1489

On Sat, May 30, 2020 at 9:56 AM Neal Richardson
<[email protected]> wrote:
>
> Sounds reasonable, could you please open a JIRA issue?
>
> Neal
>
> On Sat, May 30, 2020 at 1:01 AM Yue Ni <[email protected]> wrote:
>>
>> Hi there,
>>
>> I see that Arrow compute provides a Cast API that lets users cast from
>> string to number/boolean values, but sometimes the strings include values
>> that cannot be cast to a number/boolean (sorry, the data is really messy),
>> for example a string array like ["1", "2", "3", "None", ""]. I wonder if
>> there is any way to handle those invalid values during casting.
>>
>> From the code I have read (cast.h/cast.cc), it seems the cast fails and
>> returns as soon as it hits an invalid value. Is there any way to ask the
>> Cast API to return NULL for invalid values instead, so that they are
>> easier to process later?
>>
>> Since it is rarely possible to guarantee that all string values in an array
>> are valid, **any** invalid value in an array or data set makes the whole
>> cast fail. Users of the Cast API then have to work out for themselves
>> which value is invalid, which is not easy to do programmatically (only an
>> error status message is set in the context). IMHO the following would be a
>> better default strategy when casting from string to number/boolean:
>> 1) when an invalid value is found, set its output value to NULL
>> 2) set an error status indicating that the array contained invalid values
>> 3) continue casting the remaining elements in the array
>> That said, I believe some users prefer bailing out as soon as possible, so
>> it would be great to provide cast options that make both strategies
>> possible.
>>
>> Thanks so much.
>>
>> Regards,
>> Yue
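
To make the two strategies described above concrete, here is a rough
standalone sketch in plain C++. It deliberately does not use the Arrow API:
OnInvalid, CastResult, ParseInt and CastStringsToInt are hypothetical names
made up for illustration, and the values/valid pair only loosely mimics an
array with a validity bitmap. It assumes integer output and treats anything
that is not entirely a base-10 integer as invalid.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option: fail on the first bad value, or emit NULL and go on.
enum class OnInvalid { kFail, kNull };

// Hypothetical result type: values plus a validity flag per slot.
struct CastResult {
  std::vector<long long> values;             // slot holds 0 where valid is false
  std::vector<bool> valid;                   // false stands in for NULL
  std::vector<std::size_t> invalid_indices;  // positions that failed to parse
  bool ok() const { return invalid_indices.empty(); }
};

// Returns true only if the whole string is a base-10 integer in range.
inline bool ParseInt(const std::string& s, long long* out) {
  if (s.empty() || std::isspace(static_cast<unsigned char>(s[0]))) return false;
  errno = 0;
  char* end = nullptr;
  const long long v = std::strtoll(s.c_str(), &end, 10);
  // Reject overflow and any input that was not consumed completely.
  if (errno == ERANGE || end != s.c_str() + s.size()) return false;
  *out = v;
  return true;
}

inline CastResult CastStringsToInt(const std::vector<std::string>& input,
                                   OnInvalid mode) {
  CastResult result;
  result.values.reserve(input.size());
  result.valid.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    long long v = 0;
    if (ParseInt(input[i], &v)) {
      result.values.push_back(v);
      result.valid.push_back(true);
      continue;
    }
    if (mode == OnInvalid::kFail) {
      // Bail out as soon as possible.
      throw std::invalid_argument("invalid integer at index " +
                                  std::to_string(i));
    }
    // Steps 1-3: mark the slot NULL, remember the position, keep going.
    result.values.push_back(0);
    result.valid.push_back(false);
    result.invalid_indices.push_back(i);
  }
  return result;
}
```

With OnInvalid::kNull, the example array ["1", "2", "3", "None", ""] yields
three valid values followed by two NULL slots, and invalid_indices tells the
caller exactly which inputs were rejected. With OnInvalid::kFail the same
input throws on index 3.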
