Re: [PR] functions: Add dict support for get field [datafusion]

2026-05-04 Thread via GitHub


alamb merged PR #21115:
URL: https://github.com/apache/datafusion/pull/21115


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-05-04 Thread via GitHub


alamb commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4374218180

   Thank you @brancz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-05-04 Thread via GitHub


brancz commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4371471250

   Done! Also applied @Jefffrey's last comment. I rebased, so all but the last 
commits are untouched, just the last one is new.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-05-03 Thread via GitHub


alamb commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4366105199

   @brancz  can you please merge this PR up from main (it needs to get the CI 
run again) so we can merge it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-04-16 Thread via GitHub


Jefffrey commented on code in PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#discussion_r3092876020


##
datafusion/functions/src/core/getfield.rs:
##
@@ -198,6 +202,52 @@ fn extract_single_field(base: ColumnarValue, name: 
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+.downcast_ref::>()
+.ok_or_else(|| {
+internal_datafusion_err!(
+"Failed to downcast dictionary with key type 
{key_type}"
+)
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+exec_datafusion_err!(
+"Field {field_name} not found in dictionary 
struct"
+)
+})?;
+// Rebuild dictionary: same keys, extracted field as 
values.
+let new_dict = DictionaryArray::<$key_ty>::try_new(
+dict.keys().clone(),
+Arc::clone(field_col),
+)?;
+Ok(ColumnarValue::Array(Arc::new(new_dict)))
+}};
+}
+
+match key_type.as_ref() {
+DataType::Int8 => extract_dict_field!(Int8Type),
+DataType::Int16 => extract_dict_field!(Int16Type),
+DataType::Int32 => extract_dict_field!(Int32Type),
+DataType::Int64 => extract_dict_field!(Int64Type),
+DataType::UInt8 => extract_dict_field!(UInt8Type),
+DataType::UInt16 => extract_dict_field!(UInt16Type),
+DataType::UInt32 => extract_dict_field!(UInt32Type),
+DataType::UInt64 => extract_dict_field!(UInt64Type),
+other => exec_err!("Unsupported dictionary key type: {other}"),
+}
+}

Review Comment:
   ```suggestion
   let dict = array.as_any_dictionary();
   let values_struct = dict.values().as_struct();
   let field_col =
   values_struct.column_by_name(&field_name).ok_or_else(|| {
   exec_datafusion_err!(
   "Field {field_name} not found in dictionary struct"
   )
   })?;
   Ok(ColumnarValue::Array(
   dict.with_values(Arc::clone(field_col)),
   ))
   ```
   
   Utilizing 
[`AnyDictionaryArray`](https://docs.rs/arrow/latest/arrow/array/trait.AnyDictionaryArray.html)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-04-11 Thread via GitHub


brancz commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4229265876

   Done!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-04-10 Thread via GitHub


alamb commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4227037779

   Looks like htere are some lint failures -- let's get them resolved and I'll 
merge this one in


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-04-10 Thread via GitHub


brancz commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4224215048

   Done @alamb. Apologies for the force push, had to resolve a conflict, but 
the change is entirely in the last commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-03-27 Thread via GitHub


brancz commented on PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#issuecomment-4141194672

   Addressed all your comments @martin-g! I made them extra commits so it's 
easy to review the change, happy to squash at the end if that's the preference.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] functions: Add dict support for get field [datafusion]

2026-03-25 Thread via GitHub


martin-g commented on code in PR #21115:
URL: https://github.com/apache/datafusion/pull/21115#discussion_r298756


##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name: 
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()
+.ok_or_else(|| {
+
datafusion_common::DataFusionError::Internal(format!(
+"Failed to downcast dictionary with key type 
{}",
+key_type
+))
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+
datafusion_common::DataFusionError::Execution(format!(
+"Field {field_name} not found in dictionary 
struct"
+))

Review Comment:
   ```suggestion
   exec_err!("Field {field_name} not found in 
dictionary struct")
   ```



##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name: 
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()
+.ok_or_else(|| {
+
datafusion_common::DataFusionError::Internal(format!(
+"Failed to downcast dictionary with key type 
{}",
+key_type
+))
+})?;
+let values_struct = as_struct_array(dict.values())?;
+let field_col =
+
values_struct.column_by_name(&field_name).ok_or_else(|| {
+
datafusion_common::DataFusionError::Execution(format!(
+"Field {field_name} not found in dictionary 
struct"
+))
+})?;
+// Rebuild dictionary: same keys, extracted field as 
values.
+let new_dict = 
arrow::array::DictionaryArray::<$key_ty>::try_new(
+dict.keys().clone(),
+Arc::clone(field_col),
+)?;
+Ok(ColumnarValue::Array(Arc::new(new_dict)))
+}};
+}
+
+match key_type.as_ref() {
+DataType::Int8 => 
extract_dict_field!(arrow::datatypes::Int8Type),

Review Comment:
   ```suggestion
   DataType::Int8 => extract_dict_field!(Int8Type),
   ```
   and import them with `use arrow::datatypes::...`



##
datafusion/functions/src/core/getfield.rs:
##
@@ -199,6 +199,53 @@ fn extract_single_field(base: ColumnarValue, name: 
ScalarValue) -> Result
+{
+// Downcast to DictionaryArray to access keys and values without
+// materializing the dictionary.
+macro_rules! extract_dict_field {
+($key_ty:ty) => {{
+let dict = array
+.as_any()
+
.downcast_ref::>()

Review Comment:
   nit: `use arrow::array::DictionaryArray` and
   ```suggestion
   .downcast_ref::>()
   ```



##
datafusion/functions/src/core/getfield.rs:
##
@@ -338,6 +385,42 @@ impl ScalarUDFImpl for GetFieldFunc {
 }
 }
 }
+// Dictionary-encoded struct: resolve the child field from
+// the underlying struct, then wrap the result back in the
+// same Dictionary type so the promised type matches execution.
+DataType::Dictionary(key_type, value_type)
+if matches!(value_type.as_ref(), DataType::Struct(_)) =>
+{
+let DataType::Struct(fields) = value_type.as_ref() else {
+unreachable!()
+};
+let field_name = sv
+.as_ref()
+.and_then(|sv| {
+sv.try_as_str().flatten().filter(|s| !s.is_empty())
+})
+