I'll add that there is a JIRA to override the default past some threshold on the
number of unique keys: https://issues.apache.org/jira/browse/SPARK-4476
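SPARK-4476 only describes the idea. The rule below is a hypothetical sketch of it, not Spark's implementation or API. It assumes each record's nested object has been summarized as the set of keys it contained, and it switches from a struct to a map once the number of distinct keys across all records exceeds a threshold. The names `NestedKind`, `KeyThreshold`, `chooseKind`, `maxStructKeys` and the default of 100 are made up for illustration.

```scala
// Hypothetical sketch, NOT Spark's API: decide how a nested JSON object field
// might be typed, given the keys observed for it in each sampled record.
sealed trait NestedKind
case object AsStruct extends NestedKind
case object AsMap extends NestedKind

object KeyThreshold {
  // maxStructKeys is an illustrative default, not a value taken from SPARK-4476.
  def chooseKind(observedKeys: Seq[Set[String]], maxStructKeys: Int = 100): NestedKind = {
    // Union of every key seen for this field across all records.
    val uniqueKeys = observedKeys.foldLeft(Set.empty[String])(_ ++ _)
    // Too many distinct keys suggests map-like data rather than a fixed record shape.
    if (uniqueKeys.size > maxStructKeys) AsMap else AsStruct
  }
}
```

A real version would also need every value under those keys to share one type, because a MapType has a single value type and, as Michael notes, Spark SQL has no union types.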

On Fri, Jul 17, 2015 at 1:32 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> The difference between a map and a struct here is that in a struct all
> possible keys are defined as part of the schema and each can have a
> different type (and we don't support union types). JSON doesn't have
> differentiated data structures, so by default inference goes with the one
> that gives you more information. If you pass a schema to the JSON reader,
> however, you can override this and have a JSON object parsed as a map.
>
> On Fri, Jul 17, 2015 at 11:02 AM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> I notice JSON objects are all parsed as Map[String,Any] in Jackson, but
>> for some reason the "inferSchema" tool in Spark SQL extracts the schema
>> of nested JSON objects as StructTypes.
>>
>> This makes it really confusing when trying to rectify the object
>> hierarchy when I have maps because the Catalyst conversion layer underneath
>> is expecting a Row or Product and not a Map.
>>
>> Why wasn't MapType used here? Is there any significant difference between
>> these two types that would cause me not to use a MapType when I'm
>> constructing my own schema representing a set of nested Map[String,_]'s?
>>
>
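Following Michael's suggestion, here is a minimal sketch of passing an explicit schema so a nested JSON object is read as a MapType instead of an inferred StructType. It assumes Spark 2.2 or later (SparkSession, and DataFrameReader.json accepting a Dataset[String]) with spark-sql on the classpath; on the 1.x releases current when this thread was written, the entry point was SQLContext rather than SparkSession. The object name, the field names `id` and `attrs`, and the sample records are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object JsonObjectAsMap {
  // Explicit schema: "attrs" is declared as a map from String keys to String values.
  val attrsAsMap: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("attrs", MapType(StringType, StringType))
  ))

  // Two illustrative records whose nested "attrs" objects have different keys.
  val sampleLines: Seq[String] = Seq(
    """{"id": 1, "attrs": {"color": "red", "size": "L"}}""",
    """{"id": 2, "attrs": {"weight": "3kg"}}"""
  )

  // Reads the sample twice: once with default inference, once with the explicit schema.
  def read(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val ds = sampleLines.toDS()
    val inferred = spark.read.json(ds)                 // "attrs" inferred as a StructType
    val asMap = spark.read.schema(attrsAsMap).json(ds) // "attrs" read as a MapType
    (inferred, asMap)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("json-as-map").getOrCreate()
    try {
      val (inferred, asMap) = read(spark)
      inferred.printSchema()
      asMap.printSchema()
      // A map column comes back as a scala.collection.Map, not as a nested Row.
      val first = asMap.orderBy("id").first()
      println(first.getMap[String, String](1))
    } finally spark.stop()
  }
}
```

With default inference, `attrs` becomes a struct with one nullable field for every key seen in any record (`color`, `size`, `weight`). With the explicit schema, each row holds only the keys it actually had, and `Row.getMap` hands them back as a Scala map, which is closer to the Map[String,Any] shape Corey sees from Jackson.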
