SPARK-2870: Thorough schema inference directly on RDDs of Python dictionaries <https://issues.apache.org/jira/browse/SPARK-2870>
On Tue, Aug 5, 2014 at 5:07 PM, Michael Armbrust <mich...@databricks.com> wrote: > Maybe; I’m not sure just yet. Basically, I’m looking for something >> functionally equivalent to this: >> >> sqlContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x))) >> >> In other words, given an RDD of JSON-serializable Python dictionaries, I >> want to be able to infer a schema that is guaranteed to cover the entire >> data set. With semi-structured data, it is rarely useful to infer schema by >> inspecting just one element. >> >> Does that sound like something we want to support? >> > Yes, I think that would be good. I'd open a JIRA. > >> >> > >