How about ORC? I have experimented briefly with Parquet and ORC, and I
liked the fact that ORC has its schema within the file, which makes it
handy to work with from other tools.

Jong Wook

On 3 March 2016 at 23:29, Don Drake <dondr...@gmail.com> wrote:

> In my tests, Parquet has outperformed Avro in just about every case. It
> really shines when you are querying a subset of columns in a wide
> table.
>
> -Don
>
> On Wed, Mar 2, 2016 at 3:49 PM, Timothy Spann <tim.sp...@airisdata.com>
> wrote:
>
>> Which format is the best format for SparkSQL adhoc queries and general
>> data storage?
>>
>> There are lots of specialized cases, but generally we access some, but
>> not all, of the available columns over a reasonable subset of the data.
>>
>> I am leaning towards Parquet as it has great support in Spark.
>>
>> I also have to consider that any file on HDFS may be accessed from other
>> tools such as Hive, Impala, and HAWQ.
>>
>> Suggestions?
>> —
>> airis.DATA
>> Timothy Spann, Senior Solutions Architect
>> C: 609-250-5894
>> http://airisdata.com/
>> http://meetup.com/nj-datascience
>>
>>
>>
>
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143
>
