I'll add that there is a JIRA to override the default once the number of unique keys passes some threshold: https://issues.apache.org/jira/browse/SPARK-4476
On Fri, Jul 17, 2015 at 1:32 PM, Michael Armbrust <mich...@databricks.com> wrote:

> The difference between a map and a struct here is that in a struct all
> possible keys are defined as part of the schema and each can have a
> different type (and we don't support union types). JSON doesn't have
> differentiated data structures, so we go with the one that gives you more
> information when doing inference by default. If you pass in a schema to
> JSON, however, you can override this and have a JSON object parsed as a map.
>
> On Fri, Jul 17, 2015 at 11:02 AM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> I notice JSON objects are all parsed as Map[String,Any] in Jackson but
>> for some reason, the "inferSchema" tools in Spark SQL extract the schema
>> of nested JSON objects as StructTypes.
>>
>> This makes it really confusing when trying to rectify the object
>> hierarchy when I have maps because the Catalyst conversion layer underneath
>> is expecting a Row or Product and not a Map.
>>
>> Why wasn't MapType used here? Is there any significant difference between
>> the two of these types that would cause me not to use a MapType when I'm
>> constructing my own schema representing a set of nested Map[String,_]'s?
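
For reference, the schema override Michael describes might look roughly like the sketch below. It assumes a Spark 1.4-era SQLContext and the DataFrameReader API, running in local mode. The sample records and the field names "id" and "attrs" are made up for illustration and do not come from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

object JsonMapSchema {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-map-schema").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical sample data: "attrs" carries a different key set per record.
    val json = sc.parallelize(Seq(
      """{"id": 1, "attrs": {"color": "red", "size": "L"}}""",
      """{"id": 2, "attrs": {"weight": "3kg"}}"""
    ))

    // Default inference: the nested object is reported as a struct whose
    // fields are the keys seen across the sampled records.
    val inferred = sqlContext.read.json(json)
    inferred.printSchema()

    // Explicit schema: declare "attrs" as a map instead. All values must then
    // share one type (String here), since a MapType has a single value type.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("attrs",
        MapType(StringType, StringType, valueContainsNull = true),
        nullable = true)
    ))
    val asMap = sqlContext.read.schema(schema).json(json)
    asMap.printSchema()

    sc.stop()
  }
}
```

The trade-off is the one described above: the struct keeps a per-key type but fixes the key set in the schema, while the map accepts arbitrary keys at the cost of a single value type.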