When Spark reads files it infers the schema, and we have the option to disable that inference. Is there a way to ask Spark to re-run schema inference later, just as it does when reading JSON?
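
For context, today we read the file along these lines (just a sketch; the path is a placeholder):

  val df = spark.read.json("/path/to/data.json")  // schema is inferred once, here

and we would like a way to trigger that same inference again on data that is already loaded.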

The reason we want this is a problem in our data files. We have a JSON file containing the following:

{"a": NESTED_JSON_VALUE}
{"a":"null"}

The second record should have contained an empty JSON object, but due to a bug it became the string "null" instead. As a result, when we read the file the column "a" is inferred as a String. What we would like to do instead is have Spark read the file treating "a" as a String, filter out the "null" values or replace them with an empty JSON object, and then ask Spark to infer the schema of "a" after that fix so we can access the nested JSON properly. Roughly, we have something like the sketch below in mind.
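
This is a minimal sketch of the workflow we are imagining, assuming a Spark version where spark.read.json accepts a Dataset[String] (2.2+, as far as we know); the path, the session name `spark`, and the crude string replacement are placeholders rather than our real code:

  import spark.implicits._

  // Read raw lines so that no schema inference happens yet.
  val raw = spark.read.textFile("/path/to/data.json")

  // Replace the buggy "null" value of "a" with an empty JSON object.
  val fixed = raw.map(_.replace("\"a\":\"null\"", "\"a\":{}"))

  // Re-run schema inference on the corrected JSON strings; "a" should now
  // come back as a struct whose nested fields we can select.
  val df = spark.read.json(fixed)
  df.printSchema()
  df.select("a.*").show()

Is this the idiomatic way to do it, or is there a better way to re-trigger inference on a single column (for example, parsing a cleaned string column with from_json and an inferred schema)?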
