Parquet is usually more efficient because of its columnar layout. If your
table has 10 columns but your join query touches only 3 of them, Parquet
reads just those 3 columns from disk, while Avro must deserialize every
full record.
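A toy sketch in plain Python (not real Parquet or Avro I/O; the table shape and column names col0..col9 are made up for illustration) showing why a columnar layout touches less data when a query needs only a few columns:

```python
# A table of 5 rows and 10 columns, as a list of records.
rows = [{f"col{i}": r * 10 + i for i in range(10)} for r in range(5)]

# Row-oriented (Avro-like): each record is read in full, so a query
# touching 3 columns still deserializes all 10 fields of every row.
row_values_read = sum(len(row) for row in rows)

# Column-oriented (Parquet-like): values are stored per column, so the
# reader can skip the 7 columns the query never references.
columns = {name: [row[name] for row in rows] for name in rows[0]}
needed = ["col0", "col3", "col7"]
col_values_read = sum(len(columns[name]) for name in needed)

print(row_values_read, col_values_read)  # 50 values vs 15 values read
```

The same ratio applies to disk I/O in a real Parquet scan, which is why column pruning helps most when queries reference a small subset of a wide schema.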
Cheng
On 6/5/15 3:00 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
We currently have data in avro format and we do joins between avro and
sequence file data.
Will storing these datasets in Parquet make joins any faster ?
The dataset sizes are between 500 and 1000 GB.
--
Deepak