Disable parquet metadata for count

2020-11-17 Thread Gary Li
Hi all, I am implementing a custom DataSource V1 and would like to enforce a pushdown filter for every query. But when I run a simple count query, df.count(), Spark will ignore the filter and use the metadata in the parquet footer to accumulate the count of each block directly, which will return …
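
For context, a minimal sketch of the kind of DataSource V1 relation being described (the class name, path handling, and the "tenantId" column are made up for illustration). The idea: df.count() typically plans a scan with no required columns, so the relation has to enforce its own predicate inside buildScan rather than relying on pushed-down filters or on parquet footer row counts.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    // Hypothetical relation that always applies a mandatory filter.
    class MandatoryFilterRelation(
        override val sqlContext: SQLContext,
        path: String,
        tenantId: String) extends BaseRelation with PrunedFilteredScan {

      override def schema: StructType = sqlContext.read.parquet(path).schema

      override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
        // Always apply the mandatory predicate, regardless of what Spark pushed down.
        val filtered = sqlContext.read.parquet(path).filter(col("tenantId") === tenantId)
        // Prune to the requested columns; for df.count() this is an empty projection,
        // so the filter above is the only thing standing between the query and a
        // metadata-only row count.
        filtered.selectExpr(requiredColumns: _*).rdd
      }
    }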

Can all the parameters of Hive be used in Spark SQL?

2020-11-17 Thread Gang Li
e.g.: SET hive.merge.smallfiles.avgsize=1600; SET hive.auto.convert.join=true; SET hive.exec.compress.intermediate=true; SET hive.exec.compress.output=true; SET hive.exec.parallel=true; Thank you very much!
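
As a rough illustration (a sketch, not from the thread): Hive keys can be passed to Spark SQL via SET statements or via hive-site.xml on the classpath, but only the keys that Spark's Hive integration actually reads change behavior; parameters that tune Hive's own execution engine (hive.exec.parallel, hive.merge.*) are generally ignored because Spark plans and runs the query itself.

    import org.apache.spark.sql.SparkSession

    // Assumes a SparkSession built with Hive support.
    val spark = SparkSession.builder()
      .appName("hive-settings-example")
      .enableHiveSupport()
      .getOrCreate()

    // SET accepts both Hive and Spark SQL keys; Spark's own configs (spark.sql.*)
    // are the ones guaranteed to take effect in Spark's engine.
    spark.sql("SET hive.exec.compress.output=true")
    spark.sql("SET spark.sql.autoBroadcastJoinThreshold=10485760")

    // Settings can also be supplied up front in hive-site.xml on the classpath.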

RE: Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

2020-11-17 Thread Rahul Potharaju
Both Terry and I will be at the upcoming Hyperspace talk at Spark+AI Europe Summit 2020 (in less than 7 hrs @ 3:35 AM PST!). Please say hi if you happen to drop by and/or ask us anything! 😊 Thank you! Rahul Potharaju

Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

2020-11-17 Thread Terry Kim
Hi, We are happy to announce that Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™ - has just been released! Here are some of the highlights: - Mutable dataset support: Hyperspace v0.3.0 supports mutable dataset w…
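
For readers new to the project, a short usage sketch based on the Hyperspace quick-start API (the path, index name, and column names below are made up): create a covering index over a DataFrame, opt the session in to index-based rewrites, and refresh the index after the underlying data changes, which is where the mutable-dataset support comes in.

    import org.apache.spark.sql.SparkSession
    import com.microsoft.hyperspace._
    import com.microsoft.hyperspace.index._

    val spark = SparkSession.builder().appName("hyperspace-example").getOrCreate()
    val hs = new Hyperspace(spark)

    // Hypothetical dataset and columns.
    val df = spark.read.parquet("/data/events")

    // Create a covering index on a join/filter column, including a payload column.
    hs.createIndex(df, IndexConfig("eventIdIndex",
      indexedColumns = Seq("eventId"), includedColumns = Seq("payload")))

    // Enable Hyperspace so the optimizer can rewrite queries to use the index.
    spark.enableHyperspace()

    // After appends/deletes on the underlying dataset, refresh the index
    // so it remains consistent and usable.
    hs.refreshIndex("eventIdIndex")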