If you already know the schema, you can pass it to the reader with 
schema(...) so that Spark skips the inference step, like this:


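import org.apache.spark.sql.types._

val path = "examples/src/main/resources/jsonfile"

// detailsSchema is assumed to be a StructType defined elsewhere
// for the nested "details" object.
val jsonSchema = StructType(
  StructField("id", StringType, true) ::
  StructField("reference", LongType, true) ::
  StructField("details", detailsSchema, true) ::
  StructField("value", StringType, true) :: Nil)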

val people = sqlContext.read.schema(jsonSchema).json(path)
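
The snippet above refers to a detailsSchema that is not shown. A minimal 
sketch of what it might look like, assuming the nested "details" object has 
two fields (the names "name" and "count" are placeholders, not taken from the 
real data):

```scala
import org.apache.spark.sql.types._

// Hypothetical nested schema: the field names and types below are
// placeholders and must be replaced with the real layout of "details".
val detailsSchema = StructType(
  StructField("name", StringType, true) ::
  StructField("count", LongType, true) :: Nil)
```

It has to be defined before jsonSchema, since jsonSchema uses it as the type 
of the "details" field.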

If you have a small JSON file with the same structure as your data (a few 
sample records, for example), you can let Spark infer the schema from that 
file and reuse it:

val jsonSchema = sqlContext.read.json("path/to/schema").schema
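
Another option, if you would rather store the schema itself than a sample of 
the data: as far as I know, a StructType can be serialized to a JSON string 
with .json and rebuilt with DataType.fromJson. A rough sketch, where 
knownSchema is just a stand-in for whatever schema you already have:

```scala
import org.apache.spark.sql.types._

// Stand-in schema for illustration; in practice this would be the
// schema you already have, e.g. taken from an existing DataFrame.
val knownSchema = StructType(
  StructField("id", StringType, true) ::
  StructField("reference", LongType, true) :: Nil)

// Serialize the schema to a JSON string (small enough to keep in a file).
val schemaJson: String = knownSchema.json

// Later: rebuild the StructType from the string without reading any data.
val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
```

The restored schema can then be used in the same way, e.g. 
sqlContext.read.schema(restored).json(path).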

Thanks,
Ewan

From: Gavin Yue [mailto:yue.yuany...@gmail.com]
Sent: 06 January 2016 07:14
To: user <user@spark.apache.org>
Subject: How to accelerate reading json file?

I am trying to read json files following the example:

val path = "examples/src/main/resources/jsonfile"

val people = sqlContext.read.json(path)

I have about 1 TB of files in the path.  It took 1.2 hours just to read them 
and infer the schema.

But I already know the schema. Can I shorten this process?

Thanks a lot.


