Hi, I am new to SparkSQL.
I want to read only specific columns from a Parquet file, not all the columns defined in the file. For instance, the schema of the Parquet file looks like this:

    {
      "type": "record",
      "name": "ElectricPowerUsage",
      "namespace": "jcascalog.parquet.example",
      "fields": [
        { "name": "addressCode", "type": [ "null", "string" ] },
        { "name": "timestamp", "type": [ "null", "long" ] },
        { "name": "devicePowerEventList",
          "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "DevicePowerEvent",
              "fields": [
                { "name": "power", "type": [ "null", "double" ] },
                { "name": "deviceType", "type": [ "null", "int" ] },
                { "name": "deviceId", "type": [ "null", "int" ] },
                { "name": "status", "type": [ "null", "int" ] }
              ]
            }
          }
        }
      ]
    }

To read just the addressCode and devicePowerEventList columns from this Parquet file (and, within devicePowerEventList, only the power field), I would use the following schema, which defines only those columns:

    {
      "type": "record",
      "name": "ElectricPowerUsage",
      "namespace": "jcascalog.parquet.example",
      "fields": [
        { "name": "addressCode", "type": [ "null", "string" ] },
        { "name": "devicePowerEventList",
          "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "DevicePowerEvent",
              "fields": [
                { "name": "power", "type": [ "null", "double" ] }
              ]
            }
          }
        }
      ]
    }

I have not found anything in the Spark docs that explains how to do this.

- Kidong Lee.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-just-specified-columns-from-parquet-file-using-SparkSQL-tp15459.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
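
To make the request concrete, here is a minimal Python sketch of how the reduced schema relates to the full one. It only manipulates the Avro-style schema as a plain dictionary. The helper `project_schema` and the shape of its `wanted` argument are hypothetical names made up for this illustration; they are not part of Spark or Parquet, and the sketch does not show how to hand the result to SparkSQL, which is the open question.

```python
import copy
import json

# Full schema of the Parquet file, as given in the message.
FULL_SCHEMA = {
    "type": "record",
    "name": "ElectricPowerUsage",
    "namespace": "jcascalog.parquet.example",
    "fields": [
        {"name": "addressCode", "type": ["null", "string"]},
        {"name": "timestamp", "type": ["null", "long"]},
        {"name": "devicePowerEventList", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "DevicePowerEvent",
                "fields": [
                    {"name": "power", "type": ["null", "double"]},
                    {"name": "deviceType", "type": ["null", "int"]},
                    {"name": "deviceId", "type": ["null", "int"]},
                    {"name": "status", "type": ["null", "int"]},
                ],
            },
        }},
    ],
}


def project_schema(schema, wanted):
    """Return a pruned copy of a record schema (hypothetical helper).

    `wanted` maps a field name to None (keep the field as-is) or to a
    nested dict of the same shape, applied to the record type inside an
    array field. The input schema is not modified.
    """
    # Copy everything except the field list (type, name, namespace, ...).
    projected = {k: copy.deepcopy(v) for k, v in schema.items() if k != "fields"}
    projected["fields"] = []
    for field in schema["fields"]:
        if field["name"] not in wanted:
            continue  # column not requested: drop it
        nested = wanted[field["name"]]
        if nested is None:
            projected["fields"].append(copy.deepcopy(field))
        else:
            # Assumes the field is an array of records, as in this schema.
            items = project_schema(field["type"]["items"], nested)
            projected["fields"].append(
                {"name": field["name"], "type": {"type": "array", "items": items}}
            )
    return projected


# Keep addressCode whole, and only `power` inside devicePowerEventList.
PROJECTION = project_schema(
    FULL_SCHEMA,
    {"addressCode": None, "devicePowerEventList": {"power": None}},
)

if __name__ == "__main__":
    print(json.dumps(PROJECTION, indent=2))
```

The resulting `PROJECTION` has the same structure as the second schema above: the timestamp column is gone, and DevicePowerEvent keeps only its power field.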