Re: Implicit cast in PyArrow JSON schema inference (i.e. Integer -> String)

Wes McKinney Mon, 15 Mar 2021 16:40:27 -0700

You can pass an explicit schema in the ParseOptions (the `explicit_schema`
argument), but I don't know whether the reader will see "string" in the
schema and promote integers to it (if it doesn't, could you open a Jira
about this?). Otherwise, I'm not sure that automatic "loose" type
inference is a good default behavior (though it could possibly be
something to opt into).
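
Until that is settled, one way to get the "everything to string" result is
to normalize the input before it reaches the reader. The following is only a
minimal sketch of that workaround, not anything PyArrow provides:
`stringify_columns` is a hypothetical helper, it uses only the standard
library, and it assumes newline-delimited JSON with the affected columns at
the top level of each row.

```python
import io
import json


def stringify_columns(lines, columns):
    """Yield JSON lines with the named top-level columns coerced to strings.

    Hypothetical helper for illustration; not part of PyArrow.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        for name in columns:
            value = row.get(name)
            if value is not None and not isinstance(value, str):
                # Numbers, booleans, objects and arrays become their JSON text.
                row[name] = json.dumps(value)
        yield json.dumps(row)


raw = io.StringIO('{"col1": "1"}\n{"col1": 1}\n')
normalized = "\n".join(stringify_columns(raw, ["col1"])) + "\n"
print(normalized)

# The normalized text can then be given to the reader as a file-like object:
#   pyarrow.json.read_json(io.BytesIO(normalized.encode()))
```

Nulls are left as they are so the column stays nullable, and nested values
are kept as JSON text rather than being dropped. The cost is a second pass
over the data in Python, so this is better suited to small or medium files.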


On Mon, Mar 15, 2021 at 4:04 PM Pavol Knapek <[email protected]> wrote:
>
> Hi guys,
>
> I'm trying to use `pyarrow.json.read_json('input.json')` to load a JSON
> file, infer the schema, and return a new `pyarrow.Table` instance.
>
> So, given an input:
> {"col1": "1"}
> {"col1": 1}
>
> I'd expect the output `pyarrow.Table` to have the schema {col1: string}, with
> the integers implicitly cast to strings.
>
> (Apache Spark, for example, infers the schema in a similar way.)
>
> But instead, an exception gets raised:
> ArrowInvalid: JSON parse error: Column(/col1) changed from string to number 
> in row 1
>
> Is there some way to tell the inference process that it can safely cast all
> types to a common supertype where possible (e.g. Integer -> String,
> Object -> String, Anything -> String, ...)?
>
> Thanks
>
> Best
> --
> Pavol Knapek
> mobile CA: +1 604 314 6164
> mobile CZ: +420 774 293 243
> mobile SK: +421 917 557 263
> e-mail: [email protected]
> http://linkedin.com/in/pavolknapek
