Maciej Bryński created SPARK-11356:
--------------------------------------

             Summary: Option to refresh information about partitions
                 Key: SPARK-11356
                 URL: https://issues.apache.org/jira/browse/SPARK-11356
             Project: Spark
          Issue Type: Improvement
            Reporter: Maciej Bryński


I have two apps:
1) The first one periodically appends data to Parquet (each append can create a new partition)
2) The second one executes queries on that data

Right now I can't find any way to force Spark to re-run partition discovery, so every query is executed against the same, stale set of partitions.
I tried {{--conf spark.sql.parquet.cacheMetadata=false}}, but without success.

Is there any option to make this happen?


App 1 - runs periodically (e.g. every hour):
{code}
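# Each append can create a new day=<value> partition directory under some_location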
df.write.partitionBy("day").mode("append").parquet("some_location")
{code}


App 2 - queries the data:
{code}
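# Partition discovery runs once, at read time; the temp table keeps
# pointing at the partitions discovered here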
sqlContext.read.parquet("some_location").registerTempTable("t")
sqlContext.sql("select * from t where day = 20151027").count()
{code}
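
A sketch of a possible workaround (an assumption on my side, not a confirmed fix): re-reading the path should build a fresh relation and re-run partition discovery, so App 2 could rebuild the DataFrame before every query. The {{query_day}} helper below is hypothetical.
{code}
# Sketch of a workaround (assumption, not a confirmed fix): re-reading the
# path builds a new relation, so partition discovery should run again and
# pick up partitions appended since the last read.
def query_day(day):
    df = sqlContext.read.parquet("some_location")  # fresh partition discovery
    df.registerTempTable("t")                      # replaces the previous "t"
    return sqlContext.sql("select * from t where day = %d" % day).count()

query_day(20151027)
{code}
Even if this works, it re-scans the directory on every query; a first-class refresh option would still be useful (as far as I can tell, only HiveContext has {{refreshTable}}, and it targets catalog tables, not path-based temp tables).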



