Disable parquet metadata for count

2020-11-17 Thread Gary Li
Hi all, I am implementing a custom DataSource V1 and would like to enforce a pushdown filter for every query. But when I run a simple count query, df.count(), Spark will ignore the filter and use the metadata in the parquet footer to accumulate the count of each block directly, which will return …
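
For context, a minimal sketch of the kind of DataSource V1 relation being described (the class name, path handling, and the "tenantId" column are made up for illustration). The idea: df.count() typically plans a scan with no required columns, so the relation has to enforce its own predicate inside buildScan rather than relying on pushed-down filters or on parquet footer row counts.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    // Hypothetical relation that always applies a mandatory filter.
    class MandatoryFilterRelation(
        override val sqlContext: SQLContext,
        path: String,
        tenantId: String) extends BaseRelation with PrunedFilteredScan {

      override def schema: StructType = sqlContext.read.parquet(path).schema

      override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
        // Always apply the mandatory predicate, regardless of what Spark pushed down.
        val filtered = sqlContext.read.parquet(path).filter(col("tenantId") === tenantId)
        // Prune to the requested columns; for df.count() this is an empty projection,
        // so the filter above is the only thing standing between the query and a
        // metadata-only row count.
        filtered.selectExpr(requiredColumns: _*).rdd
      }
    }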

Can all the parameters of Hive be used in Spark SQL?

2020-11-17 Thread Gang Li
e.g.: SET hive.merge.smallfiles.avgsize=1600; SET hive.auto.convert.join=true; SET hive.exec.compress.intermediate=true; SET hive.exec.compress.output=true; SET hive.exec.parallel=true; Thank you very much!
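
As a rough illustration (a sketch, not from the thread): Hive keys can be passed to Spark SQL via SET statements or via hive-site.xml on the classpath, but only the keys that Spark's Hive integration actually reads change behavior; parameters that tune Hive's own execution engine (hive.exec.parallel, hive.merge.*) are generally ignored because Spark plans and runs the query itself.

    import org.apache.spark.sql.SparkSession

    // Assumes a SparkSession built with Hive support.
    val spark = SparkSession.builder()
      .appName("hive-settings-example")
      .enableHiveSupport()
      .getOrCreate()

    // SET accepts both Hive and Spark SQL keys; Spark's own configs (spark.sql.*)
    // are the ones guaranteed to take effect in Spark's engine.
    spark.sql("SET hive.exec.compress.output=true")
    spark.sql("SET spark.sql.autoBroadcastJoinThreshold=10485760")

    // Settings can also be supplied up front in hive-site.xml on the classpath.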

RE: Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

2020-11-17 Thread Rahul Potharaju
Both Terry and I will be at the upcoming Hyperspace talk at Spark+AI Europe Summit 2020 (in less than 7 hrs @ 3:35 AM PST!). Please say hi if you happen to drop by and/or ask us anything! 😊 Thank you! Rahul Potharaju

Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

2020-11-17 Thread Terry Kim
Hi, We are happy to announce that Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™ - has just been released! Here are some of the highlights: - Mutable dataset support: Hyperspace v0.3.0 supports mutable dataset w…
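
For readers new to the project, a short usage sketch based on the Hyperspace quick-start API (the path, index name, and column names below are made up): create a covering index over a DataFrame, opt the session in to index-based rewrites, and refresh the index after the underlying data changes, which is where the mutable-dataset support comes in.

    import org.apache.spark.sql.SparkSession
    import com.microsoft.hyperspace._
    import com.microsoft.hyperspace.index._

    val spark = SparkSession.builder().appName("hyperspace-example").getOrCreate()
    val hs = new Hyperspace(spark)

    // Hypothetical dataset and columns.
    val df = spark.read.parquet("/data/events")

    // Create a covering index on a join/filter column, including a payload column.
    hs.createIndex(df, IndexConfig("eventIdIndex",
      indexedColumns = Seq("eventId"), includedColumns = Seq("payload")))

    // Enable Hyperspace so the optimizer can rewrite queries to use the index.
    spark.enableHyperspace()

    // After appends/deletes on the underlying dataset, refresh the index
    // so it remains consistent and usable.
    hs.refreshIndex("eventIdIndex")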