Re: Question on Spark SQL for a directory

Michael Armbrust Tue, 21 Jul 2015 16:39:59 -0700

https://spark.apache.org/docs/latest/sql-programming-guide.html#loading-data-programmatically
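
For readers of the archive, here is a minimal sketch of the approach that section of the guide covers, assuming the Spark 1.4-era PySpark API (`sqlContext.read.parquet` and `DataFrame.registerTempTable`). The helper name `register_parquet_dir`, the directory path, and the table name are hypothetical and only for illustration. The Parquet reader accepts a directory path and reads the Parquet files inside it, so no per-file handling is needed.

```python
def register_parquet_dir(sql_context, directory, table_name):
    """Load a directory of Parquet files and expose it as a temporary table.

    Hypothetical helper: `sql_context` is assumed to be an existing
    SQLContext (or HiveContext) from a Spark 1.4-era installation.
    """
    # Pointing the reader at a directory reads the Parquet part files in it.
    df = sql_context.read.parquet(directory)
    # Register the DataFrame so it can be queried by name with SQL.
    # (Spark 2.0 and later prefer df.createOrReplaceTempView(table_name).)
    df.registerTempTable(table_name)
    return df
```

With an existing `sqlContext`, calling `register_parquet_dir(sqlContext, "hdfs:///path/to/parquet_dir", "events")` should make a query such as `sqlContext.sql("SELECT COUNT(*) FROM events")` possible. The path and table name are placeholders.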


On Tue, Jul 21, 2015 at 4:06 PM, Ron Gonzalez <zlgonza...@yahoo.com.invalid>
wrote:

> Hi,
>   Question on using Spark SQL.
>   Can someone give an example of creating a table from a directory
> containing Parquet files in HDFS, rather than from a single Parquet file?
>
> Thanks,
> Ron
>
> On 07/21/2015 01:59 PM, Brandon White wrote:
>
>> A few questions about caching a table in Spark SQL.
>>
>> 1) Is there any difference between caching the dataframe and the table?
>>
>> df.cache() vs sqlContext.cacheTable("tableName")
>>
>> 2) Do you need to "warm up" the cache before seeing the performance
>> benefits? Is the cache LRU? Do you need to run some queries on the table
>> before it is cached in memory?
>>
>> 3) Is caching the table much faster than .saveAsTable? I am only seeing a
>> 10%-20% performance increase.
>>
>
>
