How about ORC? I have experimented briefly with Parquet and ORC, and I liked the fact that ORC has its schema within the file, which makes it handy to work with any other tools.
Jong Wook

On 3 March 2016 at 23:29, Don Drake <dondr...@gmail.com> wrote:

> My tests show Parquet has better performance than Avro in just about every
> test. It really shines when you are querying a subset of columns in a wide
> table.
>
> -Don
>
> On Wed, Mar 2, 2016 at 3:49 PM, Timothy Spann <tim.sp...@airisdata.com>
> wrote:
>
>> Which format is the best format for SparkSQL adhoc queries and general
>> data storage?
>>
>> There are lots of specialized cases, but generally accessing some but not
>> all the available columns with a reasonable subset of the data.
>>
>> I am leaning towards Parquet as it has great support in Spark.
>>
>> I also have to consider any file on HDFS may be accessed from other tools
>> like Hive, Impala, HAWQ.
>>
>> Suggestions?
>> --
>> airis.DATA
>> Timothy Spann, Senior Solutions Architect
>> C: 609-250-5894
>> http://airisdata.com/
>> http://meetup.com/nj-datascience
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143