Please take a look at http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Particularly the note about the required format: "Note that the file that is offered as *a json file* is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail."

On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> Hi folks,
>
> I am trying to parse JSON arrays and it’s getting a little crazy (for me
> at least)…
>
> 1)
> If my JSON is:
> {"vals":[100,500,600,700,800,200,900,300]}
>
> I get:
> +--------------------+
> |                vals|
> +--------------------+
> |[100, 500, 600, 7...|
> +--------------------+
>
> root
>  |-- vals: array (nullable = true)
>  |    |-- element: long (containsNull = true)
>
> and I am :)
>
> 2)
> If my JSON is:
> [100,500,600,700,800,200,900,300]
>
> I get:
> +--------------------+
> |     _corrupt_record|
> +--------------------+
> |[100,500,600,700,...|
> +--------------------+
>
> root
>  |-- _corrupt_record: string (nullable = true)
>
> Both are legit JSON structures… Do you think that #2 is a bug?
>
> jg

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
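To make the difference concrete, here is a minimal sketch in plain Python (not Spark itself) of why case #2 fails: a top-level JSON array has no field name, so Spark has nothing to map it to as a column and files the raw text under _corrupt_record. Wrapping the array in an object with a key (here "vals", matching case #1 — the key name is just an illustrative choice) turns each line into the kind of self-contained record Spark's JSON reader expects:

```python
import json

# Case #2: a bare top-level array -- valid JSON, but there is no field
# name for Spark to turn into a column.
bare_array = "[100,500,600,700,800,200,900,300]"

# Workaround: wrap the array in an object so each line of the file is a
# self-contained JSON object with a named field, like case #1.
wrapped_line = json.dumps({"vals": json.loads(bare_array)})

# Each such line now parses to a record with a "vals" field, which is
# the per-line shape the Spark JSON reader expects.
record = json.loads(wrapped_line)
print(record["vals"])
```

So #2 is arguably not a bug but a consequence of the line-delimited format the guide describes: every line must stand alone as an object.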