Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Robert Stupp
UDFs are as engine specific and portable and "non-centralized" as views are. The same performance concerns apply to views as well. Iceberg should define a common base upon which engines can build, so the argument that UDFs aren't practical, because engines are different, is probably only a tem

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Jack Ye
> While it would be great to have a common set of functions across engines, I don't see how that is practical when those engines are implemented so differently. Plugging in code -- and especially custom user-supplied code -- seems inherently specialized to me and should be part of the engines' desi

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Jack Ye
Seems like this thread did not get much attention? > * (Naive) Iceberg REST servers may proxy requests received for '/v1/oauth/tokens' - and effectively become a “man-in-the-middle”, which is not fully compliant with the OAuth 2.0 specification. This seems like a concern to me, could any REST

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Fokko Driesprong
Hey Robert, Sorry for the late reply as I was out last week. I'm not an OAuth guru either, but some context from my end. * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server. In an earlier discussion

Re: GitHub issue labels

2024-05-28 Thread Jean-Baptiste Onofré
Hi Manu, You are right: I extracted logic from Apache Kibble, which is a suite of tools for collecting, aggregating and visualizing activity in projects. I'm starting to use it while reviewing project reports during board meeting preparation. I'm busy with the revapi/gradle update right now, but I can

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Jack Ye
Sounds like we should try to finalize a consensus around https://github.com/apache/iceberg/pull/9940, so that we make it very clear what APIs/features are optional. -Jack On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong wrote: > Hey Robert, > > Sorry for the late reply as I was out last week. I

Re: GitHub issue labels

2024-05-28 Thread Yufei Gu
It’s a good idea to send a weekly report. It increases visibility, engages the community, and helps track progress. Key considerations include keeping the report concise, and automating the process. We could categorize issues/PRs with labels. For example, putting the ones triaged and un-triaged i

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Ajantha Bhat
> I guess we'll know more when you post the proposal, but I think this would be a very difficult area to tackle across engines, languages, and memory models without having a huge performance penalty. Assuming Iceberg initially supports SQL representations of UDFs (similar to views as shared

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors: (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions. (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Yufei Gu
Not an expert on authentication, but reading from the context, I agree that it’s not a good practice to use a resource server as a token server. The resource server would need to securely handle and store credentials or tokens, increasing the risk of credential theft or leakage. Making the token en

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Amogh Jahagirdar
I disagree with removing "/v1/oauth/tokens" and I think I also disagree with the premise that implementing that endpoint is required, but I can understand how that's not clear in the spec. I think we can address the required vs non-required discussion with the capabilities PR.

Re: Addressing security questions in the Iceberg REST specification

2024-05-28 Thread Alex Dutra
Hi, > On point 4, isn't that possible today, Can't that be achieved with the current token exchange approach, and the internal implementation of the endpoint? Unfortunately, no. Token exchange is not widely adopted yet: for example, Keycloak has only partial support for it, and Authelia, or

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Benny Chow
It's interesting to note that a tabular SQL UDF can be used to build a *parameterized* view. So, there's definitely a lot in common between UDFs and views. Thanks On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa wrote: > I think there is a disconnect about what is perceived as a "UDF". The

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Jack Ye
> (2) Custom code written in imperative function according to a Java/Scala/Python API, etc. I think we could still explore some long term opportunities in this case. Consider you register a Spark temp view as some sort of data frame read, then it could still be resolved to a Spark plan that is rep

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
Thanks Jack. I actually meant scalar/aggregate/table user defined functions. Here are some examples of what I meant in (2): Hive GenericUDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java Trino user defined functions: https://trino.io/d

Proposal to support cherrypick static overwrite

2024-05-28 Thread Pucheng Yang
Hi community, My client is looking for support for cherrypick static partition overwrite. Based on my understanding, the reason we cannot do it is that we do not preserve static overwrite filters. I would like to make a proposal to support cherrypick static overwrite: 1. We will allow user