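Use org.apache.spark.sql.types.ArrayType, not a Scala Array, as the root type when you define the schema, and from_json will return every element of the array. One caveat: as far as I know, from_json only accepts an ArrayType root schema from Spark 2.2 onward; in 2.1 it takes a StructType only, so you may need to upgrade for this to work.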
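Here is a minimal sketch of what I mean, not a tested drop-in for your job. It assumes Spark 2.2 or later (where, to my knowledge, from_json has an overload taking a DataType), a local SparkSession, and the JSON inlined as a string instead of read from 1.txt. The object name ParseNewsArray and the helper parse are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// Hypothetical wrapper object; the names here are illustrative only.
object ParseNewsArray {

  // Root type is a Spark SQL ArrayType wrapping the struct,
  // not a Scala Array and not a bare StructType.
  val schema: ArrayType = ArrayType(
    new StructType()
      .add("source", StringType)
      .add("name", StringType)
  )

  // Sample payload inlined (assumption: same shape as the contents of 1.txt).
  val entry: String =
    """[{"source":"source1","name":"News site1"},
      |{"source":"source2","name":"News site2"}]""".stripMargin

  def parse(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val ds = Seq(entry).toDF("news")
    // Assumes the from_json(Column, DataType) overload (Spark 2.2+).
    ds.select(from_json($"news", schema) as "news_parsed")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parse-news-array")
      .getOrCreate()
    val parsed = parse(spark)
    parsed.printSchema()
    parsed.show(false)
    spark.stop()
  }
}
```

If you then want one row per news item, explode on news_parsed should do it.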
Regards,
Magnus

On Fri, Feb 22, 2019 at 11:15 PM Yeikel <em...@yeikel.com> wrote:

> I have an "unnamed" json array stored in a *column*.
>
> The format is the following :
>
> column name : news
>
> Data :
>
> [
>   {
>     "source": "source1",
>     "name": "News site1"
>   },
>   {
>     "source": "source2",
>     "name": "News site2"
>   }
> ]
>
>
> Ideally , I'd like to parse it as :
>
> news ARRAY<struct<source:string, name:string>>
>
> I've tried the following :
>
> import org.apache.spark.sql.Encoders
> import org.apache.spark.sql.types._;
>
> val entry = scala.io.Source.fromFile("1.txt").mkString
>
> val ds = Seq(entry).toDF("news")
>
> val schema = Array(new StructType().add("name", StringType).add("source",
> StringType))
>
> ds.select(from_json($"news", schema) as "news_parsed").show(false)
>
> But this is not allowed :
>
> found : Array[org.apache.spark.sql.types.StructType]
> required: org.apache.spark.sql.types.StructType
>
>
> I also tried passing the following schema :
>
> val schema = StructType(new StructType().add("name",
> StringType).add("source", StringType))
>
> But this only parsed the first record :
>
> +--------------------+
> |news_parsed         |
> +--------------------+
> |[News site1,source1]|
> +--------------------+
>
>
> I am aware that if I fix the JSON like this :
>
> {
>   "news": [
>     {
>       "source": "source1",
>       "name": "News site1"
>     },
>     {
>       "source": "source2",
>       "name": "News site2"
>     }
>   ]
> }
>
> The parsing works as expected , but I would like to avoid doing that if
> possible.
>
> Another approach that I can think of is to map on it and parse it using
> third party libraries like Gson , but I am not sure if this is any better
> than fixing the json beforehand.
>
> I am running Spark 2.1