Use spark.sql.types.ArrayType instead of a Scala Array as the root type
when you define the schema, and it will work.
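
A minimal sketch of that fix against the data in the question below (a couple of caveats: `from_json` only accepts an `ArrayType` as the root schema in Spark 2.2 and later, so this may require an upgrade from 2.1, and `ds` here is the DataFrame built in the quoted code):

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Wrap the struct in an ArrayType so the root of the schema
// matches the top-level JSON array stored in the column.
val schema = ArrayType(
  new StructType()
    .add("source", StringType)
    .add("name", StringType))

// Each element of the JSON array becomes a struct in the result,
// giving the desired array<struct<source:string,name:string>> column.
val parsed = ds.select(from_json($"news", schema) as "news_parsed")
```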

Regards,

Magnus

On Fri, Feb 22, 2019 at 11:15 PM Yeikel <em...@yeikel.com> wrote:

> I have an "unnamed" JSON array stored in a *column*.
>
> The format is the following:
>
> column name : news
>
> Data :
>
> [
>   {
>     "source": "source1",
>     "name": "News site1"
>   },
>    {
>     "source": "source2",
>     "name": "News site2"
>   }
> ]
>
>
> Ideally, I'd like to parse it as:
>
> news ARRAY<STRUCT<source:string, name:string>>
>
> I've tried the following:
>
> import org.apache.spark.sql.Encoders
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
>
> val entry = scala.io.Source.fromFile("1.txt").mkString
>
> val ds = Seq(entry).toDF("news")
>
> val schema = Array(new StructType().add("name", StringType).add("source",
> StringType))
>
> ds.select(from_json($"news", schema) as "news_parsed").show(false)
>
> But this is not allowed:
>
> found   : Array[org.apache.spark.sql.types.StructType]
> required: org.apache.spark.sql.types.StructType
>
>
> I also tried passing the following schema:
>
> val schema = StructType(new StructType().add("name",
> StringType).add("source", StringType))
>
> But this only parsed the first record:
>
> +--------------------+
> |news_parsed         |
> +--------------------+
> |[News site1,source1]|
> +--------------------+
>
>
> I am aware that if I fix the JSON like this:
>
> {
>   "news": [
>     {
>       "source": "source1",
>       "name": "News site1"
>     },
>     {
>       "source": "source2",
>       "name": "News site2"
>     }
>   ]
> }
>
> The parsing works as expected, but I would like to avoid doing that if
> possible.
>
> Another approach I can think of is to map over it and parse it using a
> third-party library like Gson, but I am not sure whether this is any
> better than fixing the JSON beforehand.
>
> I am running Spark 2.1.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
>
