Re: [PR] Support metadata on scalar values [datafusion]
timsaucer commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2105076604
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
I have a slightly deeper cut in this PR:
https://github.com/apache/datafusion/pull/16170
In addition to the work in the current PR, that above one adds the metadata
directly to the `Literal` variant of the logical plan rather than only the
physical plan. I think that one is a better version long term, but it is more
of a breaking change. If we're doing these other breaking changes for metadata
in user defined functions in DF48, then maybe we can add it in as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
paleolimbot commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2103559755
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
Keeping with the literial is OK as long as we are also prepared to make any
other breaking changes required to pipe metadata through to ensure that it is
not dropped/it is possible to implement extension logic in other places without
rewriting large chunks of DataFusion code. A few places I've noted in passing
are the Statistics, the physical and logical literals, casting logic, and some
conversions like protobuf and to/from Python via ffi. Perhaps those places
should be using a `Literal` instead of a `ScalarValue`?
This is possibly a good illustration of the difference between "metadata"
and extension types...an extension type scalar should not be blindly/implicitly
cast to its storage type; however, for an integer that happened to have some
field metadata attached to it, it would be surprising if that metadata caused
an error somewhere in the engine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
timsaucer commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2102451109
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
I think there are two compelling arguments for keeping the metadata on the
`Literal` rather than the `ScalarValue`
- Does not require a change to the protobuf definitions. I haven't tested
this thoroughly yet, but I think if we put the metadata on the scalar value we
would need more breaking changes on the conversion between protobuf and
internal types, basically needing to pass the schema around. This isn't a
deal-breaker but I could see pain points for some users.
- The match statements like @tobixdev points out. I've also seen a growing
usage of things like `let ScalarValue::Utf8(my_string) = val else { return
exec_error!("invalid scalar value") }` which would not survive a case where
they really do have a valid value.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
tobixdev commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2098428747
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
Thank you for working on that! That's an interesting idea! When i
experimented with logical scalars, I did not go down that path as I thought
that it is very easy to miss handling metadata that only adds little
information to a scalar.
To elaborate a bit on that: Let's assume I have a `DataType::Utf8` field and
a `len()` function that returns the length of a value. For strings, that could
be the number of unicode characters.
Now, given the following "implementation" of `len`:
```rust
match value {
ScalarValue::Utf8(value) => count_chars(value),
_ => panic!("bye bye")
}
```
Now I'd like to call this function with
`ScalarValue::WithMetadata(my_string, LANGUAGE_INFO)`. This call will panic, as
the actual scalar is hidden behind the metadata wrapper. Initially I though
this is a deal-breaker, as many functions just want to work on the physical
data. After all, that's how it is with arrays, as the Metadata can be simply
ignored.
However, in a scenario where the metadata indicates an extension type, it
may be a good thing that Metadata cannot be ignored that easily. For example,
the `len()` function should probably return the number of keys/entries in the
top-level JSON object/array. Carrying the metadata in a separate `ScalarValue`
variant then may be a good thing! Especially if there is a function that
unwraps the metadata so that some functions can operate on the scalar without
metadata.
Hope this does make some sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
tobixdev commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2098428747
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
Thanks! That's an interesting idea! When i experimented with logical
scalars, I did not go down that path as I thought that it is very easy to miss
handling metadata that only adds little information to a scalar.
To elaborate a bit on that: Let's assume I have a `DataType::Utf8` field and
a `len()` function that returns the length of a value. For strings, that could
be the number of unicode characters.
Now, given the following "implementation" of `len`:
```rust
match value {
ScalarValue::Utf8(value) => count_chars(value),
_ => panic!("bye bye")
}
```
Now I'd like to call this function with
`ScalarValue::WithMetadata(my_string, LANGUAGE_INFO)`. This call will panic, as
the actual scalar is hidden behind the metadata wrapper. Initially I though
this is a deal-breaker, as many functions just want to work on the physical
data. After all, that's how it is with arrays, as the Metadata can be simply
ignored.
However, in a scenario where the metadata indicates an extension type, it
may be a good thing that Metadata cannot be ignored that easily. For example,
the `len()` function should probably return the number of keys/entries in the
top-level JSON object/array. Carrying the metadata in a separate `ScalarValue`
variant then may be a good thing! Especially if there is a function that
unwraps the metadata so that some functions can operate on the scalar without
metadata.
Hope this does make some sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
timsaucer commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2094500696
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
Oh, I hadn't thought about that. That's a very interesting way to solve one
of the problems I was coming up against. I will try a proof of concept of that
on a separate branch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
paleolimbot commented on code in PR #16053:
URL: https://github.com/apache/datafusion/pull/16053#discussion_r2094213318
##
datafusion/physical-expr/src/expressions/literal.rs:
##
@@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval;
use datafusion_expr_common::sort_properties::{ExprProperties, SortProperties};
/// Represents a literal value
-#[derive(Debug, PartialEq, Eq, Hash)]
+#[derive(Debug, PartialEq, Eq)]
pub struct Literal {
value: ScalarValue,
+metadata: Option>,
Review Comment:
Interesting - this puts the responsibility for carrying metadata on the
`Literal` instead of the `ScalarValue`. I wonder if there are other places
(e.g., the `Statistics`) where we'll need a scalar paired with metadata (I was
wondering if `ScalarValue::WithMetadata(Box, HashMap>)` might work as well).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Support metadata on scalar values [datafusion]
timsaucer commented on PR #16053: URL: https://github.com/apache/datafusion/pull/16053#issuecomment-2888458903 @kylebarron @paleolimbot I tested this latest push against the `test_st_point` including the additional parts that were commented out. Do you have other examples of problems you've run into? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
