Re: [DISCUSS] Incremental statistics collection

2023-08-28 Thread Jia Fan
For databases with automatic deduplication, such as HBase: suppose we insert 100 rows with the same rowkey, so HBase actually stores only one. Is the statistic we add 100 or 1? And if HBase already contains this rowkey, would the value be 0? How should we handle
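
To make the three candidate answers in the question above concrete, here is a minimal sketch in plain Python. It does not use HBase or Spark. The function name `candidate_row_deltas` and the set-based model of an upserting store are assumptions for illustration only, not part of any proposal in the thread.

```python
def candidate_row_deltas(existing_keys, batch_keys):
    """Return three possible row-count deltas for one write batch.

    existing_keys: keys already present in the (hypothetical) upserting store.
    batch_keys: keys written by the batch, duplicates included.
    """
    batch_distinct = set(batch_keys)
    return {
        # Count every row the writer emitted.
        "rows_written": len(batch_keys),
        # Count each key once, ignoring what the store already holds.
        "distinct_keys": len(batch_distinct),
        # Count only keys the store did not hold before the write.
        "net_new_keys": len(batch_distinct - set(existing_keys)),
    }


batch = ["row-1"] * 100  # 100 rows sharing one rowkey

print(candidate_row_deltas(set(), batch))
# {'rows_written': 100, 'distinct_keys': 1, 'net_new_keys': 1}

print(candidate_row_deltas({"row-1"}, batch))
# {'rows_written': 100, 'distinct_keys': 1, 'net_new_keys': 0}
```

Only `net_new_keys` matches the store's real row count after the write, but computing it requires knowing which keys already exist, which the writer generally does not.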

Re: [DISCUSS] Incremental statistics collection

2023-08-28 Thread Mich Talebzadeh
I have never been fond of the notion that measuring inserts, updates, and deletes (referred to as DML) is the sole criterion for signaling a necessity to update statistics for Spark's CBO. Nevertheless, in the absence of an alternative mechanism, it seems this is the only approach at our disposal
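
A DML-count criterion of the kind mentioned above could look roughly like the following. This is a sketch under stated assumptions: the function name `stats_refresh_needed`, its parameters, and the 10% default threshold are all hypothetical and do not come from Spark or from the thread.

```python
def stats_refresh_needed(rows_at_last_analyze, inserts, updates, deletes,
                         threshold=0.10):
    """Decide whether table statistics look stale, using DML counts only.

    threshold is a hypothetical fraction of changed rows, relative to the
    row count recorded when statistics were last collected.
    """
    changed = inserts + updates + deletes
    if rows_at_last_analyze == 0:
        # No baseline: any change at all makes the old statistics stale.
        return changed > 0
    return changed / rows_at_last_analyze >= threshold


print(stats_refresh_needed(1000, inserts=50, updates=30, deletes=20))  # True
print(stats_refresh_needed(1000, inserts=10, updates=0, deletes=0))    # False
```

The sketch also shows the weakness of the criterion: it counts how many rows changed, not whether the change shifted the data distribution that the optimizer relies on.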

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-28 Thread Mich Talebzadeh
Thanks Qian for your feedback. I will have a look. Regards, Mich

Re: Spark Connect: API mismatch in SparkSesession#execute

2023-08-28 Thread Martin Grund
Hi Stefan, There are some current limitations around how protobuf is embedded in Spark Connect. One of the challenges is that, for compatibility reasons, we currently shade protobuf, which in turn shades the `protobuf.GeneratedMessage` class. The way to work around this is to shade the protobuf

Spark Connect: API mismatch in SparkSesession#execute

2023-08-28 Thread Stefan Hagedorn
Hi everyone, Trying my luck here, after no success in the user mailing list :) I’m trying to use the "extension" feature of the Spark Connect CommandPlugin (Spark 3.4.1) [1]. I created a simple protobuf message `MyMessage` that I want to send from the connect client-side to the connect server