You need to refresh the external table manually after updating the data
source outside Spark SQL:
- via Scala API: sqlContext.refreshTable("table1")
- via SQL: REFRESH TABLE table1;
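For example, here is a sketch based on the steps in the message below, assuming the shell's sqlContext exposes refreshTable as mentioned above:

// create the external table once
val t = sqlContext.createExternalTable("table1",
  "hdfs://xxxx/data/table1", "parquet")
t.count  // 1600 with key=1 and key=2 present

// ... add /data/table1/key=3 outside Spark SQL ...

// refresh the cached partition metadata so the new folder is picked up
sqlContext.refreshTable("table1")
sqlContext.table("table1").count  // 2400 after the refresh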
Cheng
On 4/4/15 5:24 PM, Rex Xiong wrote:
Hi Spark Users,
I'm testing the new Parquet partition discovery feature in 1.3.
I have 2 sub-folders, each with 800 rows:
/data/table1/key=1
/data/table1/key=2
In spark-shell, I run these commands:
val t = sqlContext.createExternalTable("table1",
"hdfs://xxxx/data/table1", "parquet")
t.count
It shows 1600 successfully.
But after that, when I add a new folder /data/table1/key=3 and run
t.count again, it still gives me 1600, not 2400.
I restarted spark-shell and ran
val t = sqlContext.table("table1")
t.count
It's 2400 now.
I suspect there is a partition cache in the driver. I tried setting
spark.sql.parquet.cacheMetadata to false and testing again, but
unfortunately it doesn't help.
How can I disable this partition cache or force a refresh of the cache?
Thanks