Yes, I agree that administrators benefit from having some settings exposed
as Spark SQL configuration. That gets much harder when you want to use
SQLConf for table-level settings, though. For example, the target split
size was an engine setting in the Hadoop world, even though it makes no
sense to use the same setting across vastly different tables. Think about
joining a fact table with a dimension table.
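
To put rough numbers on the split size point, here is a small sketch. The
table sizes and split sizes are made up for illustration, and split_count
is a hypothetical helper, not an Iceberg or Spark API. It only shows how
one engine-wide value treats two very different tables.

```python
import math

MiB = 1024 ** 2
TiB = 1024 ** 4


def split_count(table_bytes: int, target_split_bytes: int) -> int:
    """Approximate number of splits for a table, at least one."""
    return max(1, math.ceil(table_bytes / target_split_bytes))


# Hypothetical table sizes, chosen only to show the contrast.
fact_bytes = 10 * TiB
dim_bytes = 64 * MiB

# One engine-wide split size applied to both tables.
engine_wide = 128 * MiB
print("fact, engine-wide:", split_count(fact_bytes, engine_wide))
print("dim,  engine-wide:", split_count(dim_bytes, engine_wide))

# Per-table split sizes (again hypothetical values).
print("fact, per-table:  ", split_count(fact_bytes, 512 * MiB))
print("dim,  per-table:  ", split_count(dim_bytes, 16 * MiB))
```

Whether the per-table values are actually better depends on the workload.
The point is only that a single global value cannot be tuned for both
sides of the join.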

Settings like write mode are table-level settings. It matters what is
downstream of the table. You may want to set a *default* write mode, but
the table-level setting should always win. Currently, there are limits to
overriding the write mode in SQL. That's why we should add hints. For
anything beyond that, I think we need to discuss what you're trying to do.
If it's to override a table-level setting with a SQL global, then we should
understand the use case better.
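
As a minimal sketch of the precedence described above, assuming a
per-statement hint is the only thing allowed to override the table
property: resolve_write_mode and the session key
"example.session.default-write-mode" are hypothetical names for
illustration, not existing Iceberg or Spark APIs. Only the table property
name write.delete.mode comes from this thread.

```python
from typing import Mapping, Optional

# Fallback used when nothing else is set (assumed for this sketch).
BUILT_IN_DEFAULT = "copy-on-write"

# Hypothetical session-level key; not a real Spark or Iceberg config.
SESSION_DEFAULT_KEY = "example.session.default-write-mode"


def resolve_write_mode(
    table_props: Mapping[str, str],
    session_conf: Mapping[str, str],
    hint: Optional[str] = None,
    prop_key: str = "write.delete.mode",
) -> str:
    """Resolve the write mode: hint, then table property, then defaults."""
    if hint is not None:
        # An explicit per-statement hint is the only override of the table.
        return hint
    if prop_key in table_props:
        # The table-level setting wins over any session-wide default.
        return table_props[prop_key]
    # The session value acts only as a default, never as an override.
    return session_conf.get(SESSION_DEFAULT_KEY, BUILT_IN_DEFAULT)


session = {SESSION_DEFAULT_KEY: "copy-on-write"}
table = {"write.delete.mode": "merge-on-read"}
print(resolve_write_mode(table, session))  # table property wins
print(resolve_write_mode({}, session))     # session default applies
```

The design choice here is that the session value is consulted last, so an
administrator can set a default without silently changing the behavior of
tables that already declare a mode.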

On Fri, Jul 14, 2023 at 6:09 PM Wing Yew Poon <wyp...@cloudera.com.invalid>
wrote:

> Also, in the case of write mode (I mean write.delete.mode,
> write.update.mode, write.merge.mode), these cannot be set as options
> currently; they are only settable as table properties.
>
> On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon <wyp...@cloudera.com> wrote:
>
>> I think that different use cases benefit from or even require different
>> solutions. I think enabling options in Spark SQL is helpful, but allowing
>> some configurations to be done in SQLConf is also helpful.
>> For Cheng Pan's use case (to disable locality), I think providing a conf
>> (which can be added to spark-defaults.conf by a cluster admin) is useful.
>> For my customer's use case (https://github.com/apache/iceberg/pull/7790),
>> being able to set the write mode per Spark job (where right now it can only
>> be set as a table property) is useful. Allowing this to be done in the SQL
>> with an option/hint could also work, but as I understand it, Szehon's PR (
>> https://github.com/apache/spark/pull/416830) is only applicable to
>> reads, not writes.
>>
>> - Wing Yew
>>
>>
>> On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:
>>
>>> Ryan, I understand that options should be job-specific, and introducing
>>> an OPTIONS hint can give Spark SQL capabilities similar to those of the
>>> DataFrame API.
>>>
>>> My point is, some of the Iceberg options should not be job-specific.
>>>
>>> For example, Iceberg has an option “locality” that can only be set at
>>> the job level, whereas Spark has a configuration
>>> “spark.shuffle.reduceLocality.enabled” that can be set at the cluster
>>> level. This gap blocks Spark administrators from migrating to Iceberg
>>> because they cannot disable locality at the cluster level.
>>>
>>> So, what is Iceberg’s principle for classifying a configuration as
>>> SQLConf or OPTION?
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>>
>>> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
>>> >
>>> > I would argue that the SQLConf way is more in line with Spark
>>> user/administrator habits.
>>> >
>>> > It’s common practice for Spark administrators to set configurations in
>>> spark-defaults.conf at the cluster level, and when users have issues
>>> with their Spark SQL queries or jobs, the first question they ask is
>>> usually: can it be fixed by adding a Spark configuration?
>>> >
>>> > The OPTIONS way adds learning effort for Spark users, and how can
>>> Spark administrators set options at the cluster level?
>>> >
>>> > Thanks,
>>> > Cheng Pan
>>> >
>>> >
>>> >
>>> >
>>> >> On Jun 17, 2023, at 04:01, Wing Yew Poon <wyp...@cloudera.com.INVALID>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
>>> to allow the write mode (copy-on-write/merge-on-read) to be specified in
>>> SQLConf. The use case is explained in the PR.
>>> >> Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733,
>>> to allow locality to be specified in SQLConf.
>>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/ was
>>> a PR to allow the write distribution mode to be specified in SQLConf. This
>>> was merged.
>>> >> Cheng Pan asks if there is any guidance on when we should allow
>>> configs to be specified in SQLConf.
>>> >> Thanks,
>>> >> Wing Yew
>>> >>
>>> >> ps. The above open PRs could use reviews by committers.
>>> >>
>>> >
>>>
>>>

-- 
Ryan Blue
Tabular
