Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Szehon Ho
Sure, the PRs are https://github.com/apache/spark/pull/44119 (merge), https://github.com/apache/spark/pull/47233 (update), and delete is in progress. Thanks, Szehon On Tue, Jul 9, 2024 at 10:27 PM Wing Yew Poon wrote: > Hi Szehon, > Thanks for the update. > Can you please point me to the work on su

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Wing Yew Poon
Hi Szehon, Thanks for the update. Can you please point me to the work on supporting DELETE/UPDATE/MERGE in the DataFrame API? Thanks, Wing Yew On Tue, Jul 9, 2024 at 10:05 PM Szehon Ho wrote: > Hi, > > Just FYI, good news, this change is merged on the Spark side : > https://github.com/apache/sp

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Szehon Ho
Hi, Just FYI, good news, this change is merged on the Spark side: https://github.com/apache/spark/pull/46707 (it's the third effort!). In the next version of Spark, we will be able to pass read properties via SQL to a particular Iceberg table, such as SELECT * FROM iceberg.db.table1 WITH (`locality`

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-26 Thread Wing Yew Poon
We are talking about DELETE/UPDATE/MERGE operations. There is only SQL support for these operations. There is no DataFrame API support for them.* Therefore write options are not applicable. Thus SQLConf is the only available mechanism I can use to override the table property. For reference, we curr

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-26 Thread Ryan Blue
I think we should aim to have the same behavior across properties that are set in SQL conf, table config, and write options. Having SQL conf override table config for this doesn't make sense to me. If the need is to override table configuration, then write options are the right way to do it. On We

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-26 Thread Wing Yew Poon
I was on vacation. Currently, write modes (copy-on-write/merge-on-read) can only be set as table properties, and default to copy-on-write. We have a customer who wants to use copy-on-write for certain Spark jobs that write to some Iceberg table and merge-on-read for other Spark jobs writing to the

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-16 Thread Ryan Blue
Yes, I agree that there is value for administrators from having some things exposed as Spark SQL configuration. That gets much harder when you want to use the SQLConf for table-level settings, though. For example, the target split size is something that was an engine setting in the Hadoop world, ev

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-14 Thread Wing Yew Poon
Also, in the case of write mode (I mean write.delete.mode, write.update.mode, write.merge.mode), these cannot be set as options currently; they are only settable as table properties. On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon wrote: > I think that different use cases benefit from or even requ
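Editor's sketch of the table-property route described above, the only one the message says is available for write modes. It only builds the SQL text for setting the three properties named in the message. The helper name `write_mode_ddl` and the table name `db.tbl` are hypothetical, and the statement shape (`ALTER TABLE ... SET TBLPROPERTIES`) is the usual Spark SQL form for table properties, not something quoted from the thread.

```python
# Hypothetical helper (not part of Iceberg or Spark): builds the SQL text that
# sets the three write-mode table properties named in the message above.
WRITE_MODE_KEYS = ("write.delete.mode", "write.update.mode", "write.merge.mode")
VALID_MODES = ("copy-on-write", "merge-on-read")


def write_mode_ddl(table: str, mode: str) -> str:
    """Return an ALTER TABLE statement applying `mode` to all three properties."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown write mode: {mode!r}")
    props = ", ".join(f"'{key}'='{mode}'" for key in WRITE_MODE_KEYS)
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"


if __name__ == "__main__":
    # 'db.tbl' is a placeholder table name.
    print(write_mode_ddl("db.tbl", "merge-on-read"))
```

Because the setting lives on the table, every job writing to that table sees the same mode, which is the limitation the thread is about.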

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-14 Thread Wing Yew Poon
I think that different use cases benefit from or even require different solutions. I think enabling options in Spark SQL is helpful, but allowing some configurations to be done in SQLConf is also helpful. For Cheng Pan's use case (to disable locality), I think providing a conf (which can be added t

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-13 Thread Cheng Pan
Ryan, I understand that options should be job-specific, and that introducing an OPTIONS hint lets Spark SQL achieve capabilities similar to those of the DataFrame API. My point is that some of the Iceberg options should not be job-specific. For example, Iceberg has an option “locality” which only allows set

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-06 Thread Liwei Li
Also +1 for OPTIONS hints. It is useful to allow some options to be specified in SQLConf. On Thu, Jul 6, 2023 at 1:05 AM Ryan Blue wrote: > Cheng, that's true of certain options that are targeted at administrators. > But the DataFrameReader or DataFrameWriter options are job-specific, which > i

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-05 Thread Ryan Blue
Cheng, that's true of certain options that are targeted at administrators. But the DataFrameReader or DataFrameWriter options are job-specific, which is why a hint makes the most sense. On Wed, Jul 5, 2023 at 1:26 AM Cheng Pan wrote: > I would argue that the SQLConf way is more in line with Spar

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-05 Thread Cheng Pan
I would argue that the SQLConf way is more in line with Spark user/administrator habits. It’s common practice for Spark administrators to set configurations in spark-defaults.conf at the cluster level, and when the user has issues with their Spark SQL/Jobs, the first question they asked mostly

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-26 Thread Ryan Blue
+1 for adding OPTIONS hints. If we can do that in SQL extensions then that makes sense to me for the existing Spark versions that don't support it. On Mon, Jun 26, 2023 at 11:18 AM Szehon Ho wrote: > Hi, > > Yea that sounds good to me. > > Btw, that being said, I'm not opposed to configuring som

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-26 Thread Szehon Ho
Hi, Yea that sounds good to me. Btw, that being said, I'm not opposed to configuring some of the options in the thread, especially write options, as SQL conf either. (Not sure this mechanism can support write conf without some changes to the parser). And in any case, it could be cascading: sql_dynamic_
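Editor's sketch of what a "cascading" lookup could look like. The preview above is cut off before the order is spelled out, so the order used here (per-statement options, then session conf, then table properties) is an assumption, as are the function name `resolve_setting` and the use of one key across all three sources. Other messages in the thread dispute whether session conf should override table properties at all.

```python
from typing import Mapping, Optional


def resolve_setting(key: str, *sources: Mapping[str, str]) -> Optional[str]:
    """Return the value from the first source that defines `key`, else None.

    Sources are checked in the order given, so earlier ones take precedence.
    """
    for source in sources:
        if key in source:
            return source[key]
    return None


if __name__ == "__main__":
    # Assumed precedence, highest first; every value here is illustrative.
    statement_options = {}
    session_conf = {"write.delete.mode": "merge-on-read"}
    table_properties = {"write.delete.mode": "copy-on-write"}
    print(resolve_setting("write.delete.mode",
                          statement_options, session_conf, table_properties))
```

Changing the precedence is just a matter of reordering the arguments, so the same lookup covers whichever order is eventually agreed on.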

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-24 Thread Manu Zhang
If the Spark community doesn’t accept this solution, how about adding it as an extension in Iceberg? I’m also wondering what people here think about it. Thanks for reviving the effort. Manu On Thu, Jun 22, 2023 at 00:45, Szehon Ho wrote: > Hi, > > Yea, its definitely an issue. > > Fwiw, I was looking at revivi

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-21 Thread Szehon Ho
Hi, Yea, it's definitely an issue. Fwiw, I was looking at reviving the old effort in Spark to pass in configs dynamically in a Spark SQL statement, which is probably the cleanest solution. (https://github.com/apache/spark/pull/34072 was the old effort, and I made https://github.com/apache/spark/pul

allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-16 Thread Wing Yew Poon
Hi, I recently put up a PR, https://github.com/apache/iceberg/pull/7790, to allow the write mode (copy-on-write/merge-on-read) to be specified in SQLConf. The use case is explained in the PR. Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733, to allow locality to be specified in