Re: [PR] functions: Add dict support for get field [datafusion]
alamb merged PR #21115: URL: https://github.com/apache/datafusion/pull/21115 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
alamb commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4374218180 Thank you @brancz -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
brancz commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4371471250 Done! Also applied @Jefffrey's last comment. I rebased, so all but the last commits are untouched, just the last one is new. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
alamb commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4366105199 @brancz can you please merge this PR up from main (it needs to get the CI run again) so we can merge it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
Jefffrey commented on code in PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#discussion_r3092876020
##
datafusion/functions/src/core/getfield.rs:
##
@@ -198,6 +202,52 @@ fn extract_single_field(base: ColumnarValue, name:
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+.downcast_ref::>()
+.ok_or_else(|| {
+internal_datafusion_err!(
+"Failed to downcast dictionary with key type
{key_type}"
+)
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+exec_datafusion_err!(
+"Field {field_name} not found in dictionary
struct"
+)
+})?;
+// Rebuild dictionary: same keys, extracted field as
values.
+let new_dict = DictionaryArray::<$key_ty>::try_new(
+dict.keys().clone(),
+Arc::clone(field_col),
+)?;
+Ok(ColumnarValue::Array(Arc::new(new_dict)))
+}};
+}
+
+match key_type.as_ref() {
+DataType::Int8 => extract_dict_field!(Int8Type),
+DataType::Int16 => extract_dict_field!(Int16Type),
+DataType::Int32 => extract_dict_field!(Int32Type),
+DataType::Int64 => extract_dict_field!(Int64Type),
+DataType::UInt8 => extract_dict_field!(UInt8Type),
+DataType::UInt16 => extract_dict_field!(UInt16Type),
+DataType::UInt32 => extract_dict_field!(UInt32Type),
+DataType::UInt64 => extract_dict_field!(UInt64Type),
+other => exec_err!("Unsupported dictionary key type: {other}"),
+}
+}
Review Comment:
```suggestion
let dict = array.as_any_dictionary();
let values_struct = dict.values().as_struct();
let field_col =
values_struct.column_by_name(&field_name).ok_or_else(|| {
exec_datafusion_err!(
"Field {field_name} not found in dictionary struct"
)
})?;
Ok(ColumnarValue::Array(
dict.with_values(Arc::clone(field_col)),
))
```
Utilizing
[`AnyDictionaryArray`](https://docs.rs/arrow/latest/arrow/array/trait.AnyDictionaryArray.html)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
brancz commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4229265876 Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
alamb commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4227037779 Looks like htere are some lint failures -- let's get them resolved and I'll merge this one in -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
brancz commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4224215048 Done @alamb. Apologies for the force push, had to resolve a conflict, but the change is entirely in the last commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
brancz commented on PR #21115: URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4141194672 Addressed all your comments @martin-g! I made them extra commits so it's easy to review the change, happy to squash at the end if that's the preference. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] functions: Add dict support for get field [datafusion]
martin-g commented on code in PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#discussion_r298756
##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name:
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()
+.ok_or_else(|| {
+
datafusion_common::DataFusionError::Internal(format!(
+"Failed to downcast dictionary with key type
{}",
+key_type
+))
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+
datafusion_common::DataFusionError::Execution(format!(
+"Field {field_name} not found in dictionary
struct"
+))
Review Comment:
```suggestion
exec_err!("Field {field_name} not found in
dictionary struct")
```
##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name:
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()
+.ok_or_else(|| {
+
datafusion_common::DataFusionError::Internal(format!(
+"Failed to downcast dictionary with key type
{}",
+key_type
+))
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+
datafusion_common::DataFusionError::Execution(format!(
+"Field {field_name} not found in dictionary
struct"
+))
+})?;
+// Rebuild dictionary: same keys, extracted field as
values.
+let new_dict =
arrow::array::DictionaryArray::<$key_ty>::try_new(
+dict.keys().clone(),
+Arc::clone(field_col),
+)?;
+Ok(ColumnarValue::Array(Arc::new(new_dict)))
+}};
+}
+
+match key_type.as_ref() {
+DataType::Int8 =>
extract_dict_field!(arrow::datatypes::Int8Type),
Review Comment:
```suggestion
DataType::Int8 => extract_dict_field!(Int8Type),
```
and import them with `use arrow::datatypes::...`
##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name:
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()
Review Comment:
nit: `use arrow::array::DictionaryArray` and
```suggestion
.downcast_ref::>()
```
##
datafusion/functions/src/core/getfield.rs:
##
@@ -338,6 +385,42 @@ impl ScalarUDFImpl for GetFieldFunc {
}
}
}
+// Dictionary-encoded struct: resolve the child field from
+// the underlying struct, then wrap the result back in the
+// same Dictionary type so the promised type matches execution.
+DataType::Dictionary(key_type, value_type)
+if matches!(value_type.as_ref(), DataType::Struct(_)) =>
+{
+let DataType::Struct(fields) = value_type.as_ref() else {
+unreachable!()
+};
+let field_name = sv
+.as_ref()
+.and_then(|sv| {
+sv.try_as_str().flatten().filter(|s| !s.is_empty())
+})
+
