When Spark reads files it infers the schema, and we have the option to disable that inference. Is there a way to ask Spark to re-run schema inference later, just as it does when reading JSON?
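
For context, today we read the file along these lines (just a sketch; the path is a placeholder):

  val df = spark.read.json("/path/to/data.json")  // schema is inferred once, here

and we would like a way to trigger that same inference again on data that is already loaded.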

The reason we want this is a problem in our data files. We have a JSON file containing the following:

{"a": NESTED_JSON_VALUE}
{"a":"null"}

The second record should have contained an empty JSON object, but due to a bug it became the string "null" instead. As a result, when we read the file the column "a" is inferred as a String. What we would like to do instead is have Spark read the file treating "a" as a String, filter out the "null" values or replace them with an empty JSON object, and then ask Spark to infer the schema of "a" after that fix so we can access the nested JSON properly. Roughly, we have something like the sketch below in mind.
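
This is a minimal sketch of the workflow we are imagining, assuming a Spark version where spark.read.json accepts a Dataset[String] (2.2+, as far as we know); the path, the session name `spark`, and the crude string replacement are placeholders rather than our real code:

  import spark.implicits._

  // Read raw lines so that no schema inference happens yet.
  val raw = spark.read.textFile("/path/to/data.json")

  // Replace the buggy "null" value of "a" with an empty JSON object.
  val fixed = raw.map(_.replace("\"a\":\"null\"", "\"a\":{}"))

  // Re-run schema inference on the corrected JSON strings; "a" should now
  // come back as a struct whose nested fields we can select.
  val df = spark.read.json(fixed)
  df.printSchema()
  df.select("a.*").show()

Is this the idiomatic way to do it, or is there a better way to re-trigger inference on a single column (for example, parsing a cleaned string column with from_json and an inferred schema)?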
