SPARK-2870: Thorough schema inference directly on RDDs of Python
dictionaries <https://issues.apache.org/jira/browse/SPARK-2870>


On Tue, Aug 5, 2014 at 5:07 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> Maybe; I’m not sure just yet. Basically, I’m looking for something
>> functionally equivalent to this:
>>
>> sqlContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
>>
>> In other words, given an RDD of JSON-serializable Python dictionaries, I
>> want to be able to infer a schema that is guaranteed to cover the entire
>> data set. With semi-structured data, it is rarely useful to infer schema by
>> inspecting just one element.
>>
>> Does that sound like something we want to support?
>>
> Yes, I think that would be good.  I'd open a JIRA.
>
>>  ​
>>
>
>

Reply via email to