For my current requirement, I think Substrait is worth a try.

Thanks,
Surya

On Thu, Aug 29, 2024 at 11:59 AM Kevin Liu <[email protected]> wrote:

> If you're using open table formats, Delta Lake has the "generated column"
> feature which supports specifying a formula using other table columns.
>
> https://docs.databricks.com/en/delta/generated-columns.html
> https://delta.io/blog/2023-04-12-delta-lake-generated-columns/
>
> Cheers,
> Kevin
>
> On Thu, Aug 29, 2024 at 2:04 PM Jacek Pliszka <[email protected]>
> wrote:
>
>> Hi!
>>
>> Another option would be converting to an arrow-backed pandas table and
>> using a dataframe query method. Other libraries like DuckDB most
>> likely offer similar options.
>>
>> BR
>>
>> J
>>
>> On Thu, Aug 29, 2024 at 02:54 Felipe Oliveira Carvalho
>> <[email protected]> wrote:
>> >
>> > You can build `compute::Expression` instances [1] and use them in
>> > different contexts like scanning datasets [2] and producing Substrait
>> > plans [3] that you can execute.
>> >
>> > But you have to write your own parser and define the scope and
>> > semantics of the operations you would support.
>> >
>> > [1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/expression.h#L45
>> > [2] https://github.com/apache/arrow/blob/main/cpp/examples/arrow/dataset_documentation_example.cc#L266
>> > [3] https://github.com/apache/arrow/blob/main/cpp/src/arrow/engine/substrait/relation.h#L55
>> >
>> > --
>> > Felipe
>> >
>> > On Wed, Aug 28, 2024 at 1:11 AM Surya Kiran Gullapalli
>> > <[email protected]> wrote:
>> >>
>> >> Hello all,
>> >> Let's say I have a table containing 3 columns 'A', 'B', and 'C'. Is it
>> >> possible to create a 4th column 'D' using a formula (like (A+B)/C)?
>> >>
>> >> I know I can manually create them using compute functions, but is it
>> >> possible to parse a formula like the above and compute the column on
>> >> the fly at runtime?
>> >>
>> >> Any pointers are greatly appreciated.
>> >>
>> >> Thanks,
>> >> Surya
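
To illustrate the "write your own parser and define the scope" step from the thread, here is a minimal sketch in Python. It is not Arrow code: plain Python lists stand in for table columns, and `eval_formula` and `_BIN_OPS` are hypothetical names made up for this example. It relies on the standard-library `ast` module to parse the formula and only accepts column names, numeric constants, and the four basic arithmetic operators. With a real Arrow table, the same tree walk could presumably build compute expressions instead of evaluating values directly.

```python
import ast
import operator

# Whitelisted binary operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_formula(formula, columns):
    """Evaluate an arithmetic formula element-wise over equal-length columns.

    `columns` maps column names to lists of numbers (a stand-in for a table).
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop()

    def walk(node):
        # Binary operation: evaluate both sides, then combine row by row.
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            op = _BIN_OPS[type(node.op)]
            left, right = walk(node.left), walk(node.right)
            return [op(a, b) for a, b in zip(left, right)]
        # Bare name: look it up as a column.
        if isinstance(node, ast.Name):
            if node.id not in columns:
                raise KeyError(f"unknown column: {node.id}")
            return list(columns[node.id])
        # Numeric literal: broadcast it to the column length.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return [node.value] * n_rows
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    tree = ast.parse(formula, mode="eval")
    return walk(tree.body)


table = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0], "C": [2.0, 3.0, 4.0]}
table["D"] = eval_formula("(A + B) / C", table)
print(table["D"])  # [2.0, 2.0, 2.0]
```

Anything outside the whitelist (function calls, attribute access, comparisons) raises `ValueError`, which is one way to keep the "scope and semantics" explicit. A zero in the divisor column raises `ZeroDivisionError` in this sketch; a real implementation would need to decide how to treat nulls and division by zero.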
