Re: parquet late column materialization

2018-03-18 Thread nguyen duc Tuan
You can use the EXPLAIN statement to see the optimized plan for each query (https://stackoverflow.com/questions/35883620/spark-how-can-get-the-logical-physical-query-execution-using-thirft-hive).
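The suggestion above can be sketched with PySpark. Note this is a minimal illustration, not from the thread: the table name "mytable", the local session, and the sample columns are assumptions, and the import is guarded so the snippet degrades gracefully when pyspark is not installed.

```python
# Hedged sketch: viewing Spark SQL's optimized plan via EXPLAIN / .explain().
# "mytable" and the sample data are hypothetical, chosen to mirror the thread.
try:
    from pyspark.sql import SparkSession
except ImportError:
    SparkSession = None  # pyspark not installed; snippet is illustrative only

if SparkSession is not None:
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("explain-demo")
             .getOrCreate())
    df = spark.createDataFrame([("k1", "t1")], ["businesskey", "transactionname"])
    df.createOrReplaceTempView("mytable")
    # Prints the parsed, analyzed, optimized, and physical plans:
    spark.sql("SELECT businesskey FROM mytable").explain(extended=True)
```

Reading the physical plan shows which columns the Parquet scan actually requests, which is what the question below is probing.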

Re: parquet late column materialization

2018-03-18 Thread CPC
Hi nguyen, Thank you for the quick response. But what I am trying to understand is that in both queries, predicate evaluation requires only one column. So Spark does not actually need to read all of the columns in the projection if they are not used in the filter predicate. Just to give an example, Amazon Redshift has this
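The "late materialization" behavior CPC is asking about can be sketched in plain Python over a hypothetical in-memory column store (the column names come from the thread's example; nothing here is Spark's actual implementation): evaluate the predicate against the small filter column first, and fetch the wide columns only for matching rows.

```python
# Minimal sketch of late column materialization over a toy columnar table.
# Columns are stored as independent lists; "request"/"response" stand in
# for the thread's huge 10-50 KB values.
table = {
    "businesskey":     ["a", "b", "c", "d"],
    "transactionname": ["t1", "t2", "t3", "t4"],
    "request":         ["<huge request blob>"] * 4,
    "response":        ["<huge response blob>"] * 4,
}

def late_materialize(table, pred_col, pred, proj_cols):
    # Step 1: scan only the (small) predicate column to find matching rows.
    matches = [i for i, v in enumerate(table[pred_col]) if pred(v)]
    # Step 2: materialize the projected (possibly wide) columns only for
    # those matching row positions, never touching non-matching rows.
    return [{c: table[c][i] for c in proj_cols} for i in matches]

rows = late_materialize(table, "businesskey", lambda v: v == "b",
                        ["businesskey", "request", "response"])
```

Here only one of four rows pays the cost of the wide columns, which is the saving CPC expects from the engine.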

Re: parquet late column materialization

2018-03-18 Thread nguyen duc Tuan
Hi @CPC, Parquet is a columnar storage format, so if you want to read data from only one column, you can do that without accessing all of your data. Spark SQL includes a query optimizer (see https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html), so it will
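The columnar-layout point above can be illustrated with a plain-Python model (not Parquet itself): in a row layout every row's bytes are interleaved, so reading one field still walks every row, while in a column layout each column is contiguous and can be read alone.

```python
# Toy contrast of row-oriented vs column-oriented layouts using the
# thread's four columns. Pure Python; a model, not the Parquet format.
rows = [("k1", "t1", "req1", "resp1"),
        ("k2", "t2", "req2", "resp2")]

# Row-oriented: extracting businesskey touches every row record.
keys_row_layout = [r[0] for r in rows]

# Column-oriented: transpose once so each column is its own array;
# reading businesskey then touches only that one list.
names = ["businesskey", "transactionname", "request", "response"]
columns = dict(zip(names, map(list, zip(*rows))))
keys_col_layout = columns["businesskey"]
```

Both reads return the same values, but only the columnar one can skip the wide request/response data entirely.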

parquet late column materialization

2018-03-18 Thread CPC
Hi everybody, I am trying to understand how Spark reads Parquet files, but I am a little confused. I have a table with 4 columns named businesskey, transactionname, request and response. The request and response columns are huge (10-50 KB per value). When I execute a query like "select * from mytable
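A back-of-envelope model makes the stakes of the question concrete. The per-value sizes below are assumptions loosely based on the poster's numbers, not measurements: with two ~30 KB columns, a `select *` scan pays for the blobs even when the filter only needs businesskey, while a projected scan over a columnar format avoids them.

```python
# Hypothetical bytes-scanned model for the poster's 4-column table.
col_bytes = {                 # assumed average bytes per value
    "businesskey":     16,
    "transactionname": 32,
    "request":         30_000,   # "huge" column from the thread
    "response":        30_000,   # "huge" column from the thread
}
n_rows = 1_000_000

def scan_cost(cols):
    """Bytes a columnar scan must read for the given projection."""
    return sum(col_bytes[c] for c in cols) * n_rows

full   = scan_cost(list(col_bytes))   # select *  -> reads every column
narrow = scan_cost(["businesskey"])   # projected -> reads one small column
```

Under these assumed sizes the projected scan reads thousands of times fewer bytes, which is why column pruning (and, ideally, late materialization) matters so much here.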