Usually Parquet is more efficient because of its columnar nature. Say your table has 10 columns but your join query only touches 3 of them: Parquet reads only those 3 columns from disk, while Avro must load every record in full.
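To make the column-pruning point concrete, here is a minimal self-contained sketch (plain Python, not Spark or the actual Parquet/Avro file formats; the data and column names are made up) contrasting a row-oriented layout, where every record must be scanned in full, with a column-oriented layout, where only the requested columns are touched:

```python
# Toy illustration: row-oriented (Avro-like) vs column-oriented
# (Parquet-like) storage. We count how many field values each layout
# must "read" to answer a query that needs only 3 of 10 columns.

# 4 records x 10 columns of dummy data.
rows = [{f"col{i}": f"r{r}c{i}" for i in range(10)} for r in range(4)]

def row_scan(rows, wanted):
    """Row layout: records are stored one after another, so every
    field of every record is read even if only a few are needed."""
    fields_read = 0
    out = []
    for rec in rows:
        fields_read += len(rec)            # whole record is touched
        out.append({k: rec[k] for k in wanted})
    return out, fields_read

# Column layout: each column's values are stored contiguously.
columns = {k: [rec[k] for rec in rows] for k in rows[0]}

def col_scan(columns, wanted, n_rows):
    """Column layout: only the requested columns are read."""
    fields_read = 0
    out = [{} for _ in range(n_rows)]
    for k in wanted:
        fields_read += len(columns[k])     # only wanted columns touched
        for i, v in enumerate(columns[k]):
            out[i][k] = v
    return out, fields_read

wanted = ["col0", "col3", "col7"]
row_result, row_fields = row_scan(rows, wanted)
col_result, col_fields = col_scan(columns, wanted, len(rows))

assert row_result == col_result            # same answer either way
print(row_fields)  # 40: 4 records x 10 fields each
print(col_fields)  # 12: 4 records x 3 requested columns
```

The same answer comes back either way, but the columnar layout touches 12 field values instead of 40; real Parquet readers get the same effect (plus compression benefits) at the I/O level.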

Cheng

On 6/5/15 3:00 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
We currently have data in avro format and we do joins between avro and sequence file data.
Will storing these datasets in Parquet make joins any faster ?

The dataset sizes are between 500 and 1000 GB.
--
Deepak



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
