You need to refresh the external table manually after updating the data source outside Spark SQL:

- via Scala API: sqlContext.refreshTable("table1")
- via SQL: REFRESH TABLE table1;
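
For example, a minimal spark-shell sketch (assuming the table was created as in the example below, and that refreshTable is available on your sqlContext):

// after adding the new folder outside Spark SQL, refresh the cached metadata
sqlContext.refreshTable("table1")
// or equivalently via SQL
sqlContext.sql("REFRESH TABLE table1")
// re-read the table and count; the new partition should now be picked up
sqlContext.table("table1").count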

Cheng

On 4/4/15 5:24 PM, Rex Xiong wrote:
Hi Spark Users,

I'm testing the new Parquet partition discovery feature in 1.3.
I have 2 subfolders, each with 800 rows:
/data/table1/key=1
/data/table1/key=2

In spark-shell, I run these commands:

val t = sqlContext.createExternalTable("table1", "hdfs://xxxx/data/table1", "parquet")

t.count


It shows 1600 successfully.

But after that, I add a new folder /data/table1/key=3 and run t.count again; it still gives me 1600, not 2400.


I restart spark-shell and then run:

val t = sqlContext.table("table1")

t.count


It's 2400 now.


I'm wondering whether there is a partition cache in the driver. I try setting spark.sql.parquet.cacheMetadata to false and testing again, but unfortunately it doesn't help.
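
For concreteness, setting that flag in spark-shell looks roughly like this (assuming the standard setConf API; it can also be set with SQL's SET command):

sqlContext.setConf("spark.sql.parquet.cacheMetadata", "false")
// or via SQL:
sqlContext.sql("SET spark.sql.parquet.cacheMetadata=false")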


How can I disable this partition cache or force it to refresh?


Thanks

