Hi,

I am trying to read a Parquet file using:

    val parquetFile = sqlContext.parquetFile("people.parquet")
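Concretely, here is a sketch of what I am doing (column names and the temp table name are just illustrative, not my real schema):

    // Illustrative only: assumes a Spark 1.x SQLContext is in scope.
    val parquetFile = sqlContext.parquetFile("people.parquet")
    parquetFile.registerTempTable("people")

    // I would expect this query to read only the 3 referenced
    // columns from disk, not all 10 columns in the file.
    val subset = sqlContext.sql("SELECT name, age, city FROM people")
    subset.collect()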
There seems to be no way to specify that I am interested in reading only some of the columns from disk. For example, if the Parquet file has 10 columns, I want to read only 3 of them.

We ran an experiment:

    Table1 - a Parquet file containing all 10 columns
    Table2 - a Parquet file containing only the 3 columns used in the query

The query on Table1 takes more than twice as long as the same query on Table2, which makes me think that Spark is reading all 10 columns from disk for Table1 even though it needs only 3.

How can I make sure that Spark reads only 3 of the 10 columns from disk?

Regards,
Ajay