You can pass an explicit schema in the ParseOptions, but I don't know whether the reader will see "string" in the schema and promote integers to strings (if it doesn't, could you open a Jira about this?). Beyond that, I'm not sure that automatic "loose" type inference is a good default behavior, though it could possibly be something users opt into.

On Mon, Mar 15, 2021 at 4:04 PM Pavol Knapek <[email protected]> wrote:
>
> Hi guys,
>
> I'm trying to use the `pyarrow.json.read_json('input.json')` command - to
> load a JSON file, infer the schema, and return a new `pyarrow.Table` instance.
>
> So, given an input:
> {"col1": "1"}
> {"col1": 1}
>
> I'd expect the output `pyarrow.Table` to have a schema {col1: string}, with
> an implicit cast of Integer(s) to String(s).
>
> (As it gets inferred in a similar way i.e. by Apache Spark)
>
> But instead, an exception gets raised:
> ArrowInvalid: JSON parse error: Column(/col1) changed from string to number
> in row 1
>
> Is there some way to let the infer-process know it can safely cast all types
> to a super-type, if possible (i.e. Integer -> String, Object -> String,
> Anything -> String, ...)?
>
> Thanks
>
> Best
> --
> Pavol Knapek
> mobile CA: +1 604 314 6164
> mobile CZ: +420 774 293 243
> mobile SK: +421 917 557 263
> e-mail: [email protected]
> http://linkedin.com/in/pavolknapek
