Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
github-actions[bot] closed pull request #16136: Specialized `GroupValues` for `primitive` and `large_primitive` URL: https://github.com/apache/datafusion/pull/16136 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
github-actions[bot] commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-3130331822 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917647569 π€: Benchmark completed Details ``` Comparing HEAD and improve-primitive-group-values Benchmark clickbench_extended.json ββββ³β³β³βββ β Queryβ HEAD β improve-primitive-group-values β Change β β‘ββββββββ© β QQuery 0 β 1915.07ms β 1925.29ms βno change β β QQuery 1 β 696.08ms β 702.78ms βno change β β QQuery 2 β 1421.79ms β 1407.15ms βno change β β QQuery 3 β 693.44ms β 689.96ms βno change β β QQuery 4 β 1454.69ms β 1447.10ms βno change β β QQuery 5 β 15331.97ms β 15156.51ms βno change β β QQuery 6 β 2106.64ms β 2044.70ms βno change β β QQuery 7 β 2037.56ms β 2207.73ms β 1.08x slower β β QQuery 8 β 818.29ms β 826.34ms βno change β ββββ΄β΄β΄βββ βββββ³β β Benchmark Summary ββ β‘βββββ© β Total Time (HEAD) β 26475.53ms β β Total Time (improve-primitive-group-values) β 26407.55ms β β Average Time (HEAD) β 2941.73ms β β Average Time (improve-primitive-group-values) β 2934.17ms β β Queries Fasterβ 0 β β Queries Slowerβ 1 β β Queries with No Changeβ 8 β βββββ΄β Benchmark clickbench_partitioned.json ββββ³β³β³βββ β Queryβ HEAD β improve-primitive-group-values β Change β β‘ββββββββ© β QQuery 0 β15.44ms β15.45ms βno change β β QQuery 1 β32.31ms β32.50ms βno change β β QQuery 2 β79.82ms β81.50ms βno change β β QQuery 3 β96.11ms β97.19ms βno change β β QQuery 4 β 576.73ms β 582.80ms βno change β β QQuery 5 β 874.26ms β 897.85ms βno change β β QQuery 6 β22.21ms β23.67ms β 1.07x slower β β QQuery 7 β36.56ms β38.49ms β 1.05x slower β β QQuery 8 β 876.79ms β 901.41ms βno change β β QQuery 9 β 1180.79ms β 1184.28ms βno change β β QQuery 10β 260.73ms β 262.17ms βno change β β QQuery 11β 293.15ms β 294.16ms βno change β β QQuery 12β 905.30ms β 887.00ms βno change β β QQuery 13β 1375.72ms β 1387.10ms βno change β β QQuery 14β 855.54ms β 838.72ms βno change β β QQuery 15β 819.43ms β 824.34ms βno change β β QQuery 16β 1717.88ms β 1721.36ms βno change β β QQuery 17β 1576.03ms β 1618.82ms βno change β β QQuery 18β 3021.58ms β 3057.47ms βno change β β QQuery 19β81.36ms β85.15ms βno change β β QQuery 20β 1149.28ms β 1140.23ms βno change β β QQuery 21β 1335.99ms β 1327.81ms βno change β β QQuery 22β 2276.34ms β 2214.15ms βno change β β QQuery 23β 8187.11ms β 8094.83ms βno change β β QQuery 24β 469.30ms β 467.50ms βno change β β QQuery 25β 401.62ms β 387.40ms βno change β β QQuery 26β 542.07ms β 536.12ms βno change β β QQuery 27β 1624.08ms β 1595.77ms βno change β β QQuery 28β 12366.86ms β 12744.64ms βno change β β QQuery 29β 537.01ms β 524.08ms βno change β β QQuery 30β 814.22ms β 809.74ms βno change β β QQuery 31β 848.16ms β 845.50ms β
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917562287 π€ `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing improve-primitive-group-values (8e09c834544c692b5b446dcc017b01dd9d4a1c2d) to 17fe504b0b8ca4dfaaf81482557d3ff2e86df52f [diff](https://github.com/apache/datafusion/compare/17fe504b0b8ca4dfaaf81482557d3ff2e86df52f..8e09c834544c692b5b446dcc017b01dd9d4a1c2d) Benchmarks: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917542628 > I think this PR makes things better so approving. Nice work @Rachelint. Thanks @alamb , I think still two blocked things before merging it: - Maybe we should also compare the performance of this one with what @Dandandan mentioned https://github.com/apache/datafusion/pull/16136#discussion_r2100764581 - Run the benchmark again, and see if it always reproducable (I still have concern about no improvement in q8 of clickbench_extended, it is not as expected...). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2112691325
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -116,42 +122,60 @@ where
{
fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec) ->
Result<()> {
assert_eq!(cols.len(), 1);
+let col = cols[0].as_primitive::();
+
groups.clear();
+self.append_row_indices.clear();
-for v in cols[0].as_primitive::() {
+let mut num_total_groups = self.values.len();
+for (row_index, v) in col.iter().enumerate() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = num_total_groups;
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
-|&(_, h)| h,
+|&(_, v)| v.is_eq(key),
+|&(_, v)| v.hash(state),
);
match insert {
hashbrown::hash_table::Entry::Occupied(o) => o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, hash));
-self.values.push(key);
+let g = num_total_groups;
+v.insert((g, key));
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
g
}
}
}
};
groups.push(group_id)
}
+
+// If all are new groups, we just extend it
+if self.append_row_indices.len() == col.len() {
+self.values.extend_from_slice(col.values());
+} else {
+let col_values = col.values();
+for &row_index in self.append_row_indices.iter() {
Review Comment:
Actually... I found no obvious improvement when switching to `extend`...
The bottleneck still the `hashtable`, I think it is better to just keep the
original `push` logic because it may be simpler, and actually efficient enough.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2104358004
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -116,42 +122,60 @@ where
{
fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec) ->
Result<()> {
assert_eq!(cols.len(), 1);
+let col = cols[0].as_primitive::();
+
groups.clear();
+self.append_row_indices.clear();
-for v in cols[0].as_primitive::() {
+let mut num_total_groups = self.values.len();
+for (row_index, v) in col.iter().enumerate() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = num_total_groups;
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
-|&(_, h)| h,
+|&(_, v)| v.is_eq(key),
+|&(_, v)| v.hash(state),
);
match insert {
hashbrown::hash_table::Entry::Occupied(o) => o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, hash));
-self.values.push(key);
+let g = num_total_groups;
+v.insert((g, key));
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
g
}
}
}
};
groups.push(group_id)
Review Comment:
this can `extend` to `groups`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2104358998
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -116,42 +122,60 @@ where
{
fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec) ->
Result<()> {
assert_eq!(cols.len(), 1);
+let col = cols[0].as_primitive::();
+
groups.clear();
+self.append_row_indices.clear();
-for v in cols[0].as_primitive::() {
+let mut num_total_groups = self.values.len();
+for (row_index, v) in col.iter().enumerate() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = num_total_groups;
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
-|&(_, h)| h,
+|&(_, v)| v.is_eq(key),
+|&(_, v)| v.hash(state),
);
match insert {
hashbrown::hash_table::Entry::Occupied(o) => o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, hash));
-self.values.push(key);
+let g = num_total_groups;
+v.insert((g, key));
+self.append_row_indices.push(row_index as u32);
+num_total_groups += 1;
g
}
}
}
};
groups.push(group_id)
}
+
+// If all are new groups, we just extend it
+if self.append_row_indices.len() == col.len() {
+self.values.extend_from_slice(col.values());
+} else {
+let col_values = col.values();
+for &row_index in self.append_row_indices.iter() {
Review Comment:
can be written as
`self.values.extend(self.append_row_indices.iter().map(...)`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2104357071
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
+/// The data type of the output array
+data_type: DataType,
+/// Stores the `(group_index, hash)` based on the hash of its value
+///
+/// We also store `hash` is for reducing cost of rehashing. Such cost
+/// is obvious in high cardinality group by situation.
+/// More details can see:
+///
+///
+map: HashTable<(usize, u64)>,
+/// The group index of the null value if any
+null_group: Option,
+/// The values for each group index
+values: Vec,
+/// The random state used to generate hashes
+random_state: RandomState,
+}
+
+impl GroupValuesLargePrimitive {
+pub fn new(data_type: DataType) -> Self {
+assert!(PrimitiveArrayis_compatible(&data_type));
+Self {
+data_type,
+map: HashTable::with_capacity(128),
+values: Vec::with_capacity(128),
+null_group: None,
+random_state: Default::default(),
+}
+}
+}
+
+impl GroupValues for GroupValuesLargePrimitive
+where
+T::Native: HashValue,
+{
+fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec) ->
Result<()> {
+assert_eq!(cols.len(), 1);
+groups.clear();
+
+for v in cols[0].as_primitive::() {
+let group_id = match v {
+None => *self.null_group.get_or_insert_with(|| {
+let group_id = self.values.len();
+self.values.push(Default::default());
+group_id
+}),
+Some(key) => {
+let state = &self.random_state;
+let hash = key.hash(state);
+let insert = self.map.entry(
+hash,
+|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
+|&(_, h)| h,
+);
+
+match insert {
+hashbrown::hash_table::Entry::Occupied(o) => o.get().0,
+hashbrown::hash_table::Entry::Vacant(v) => {
+let g = self.values.len();
+v.insert((g, hash));
+self.values.push(key);
+g
+}
+}
+}
+};
+groups.push(group_id)
Review Comment:
It can `extend` to groups here by writing this as iterator.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
alamb commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2103313269
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
Review Comment:
I think it would help me a lot if you could document the rationale behind
this change and why it is different than `GroupValuesPrimitive`
```suggestion
/// This specialization is faster than [`GroupValuesPrimitive`] because it
does not store values directly
/// in the hash table, but instead stores an index into self.vales
```
Or something like that
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101922845
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Hm I think the storage of a value inside `HashTable` will not be very
different from `Vec` besides being inline?
I think for the `value <= 64bits`, storing it inside `HashTable`:
- Can decrease the size of `HashTable` comparing to store their hashes?
- And when probing and rehashing of `HashTable` totally in-place, and
eliminting the random read to a the outside `Vec`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101922845
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Hm I think the storage of a value inside `HashTable` will not be very
different from `Vec` besides being inline?
I think for the `value <= 64bits`, storing it inside `HashTable`:
- Can decrease the size of `HashTable` comparing to store their hashes?
- And when probing and rehashing of `HashTable` totally in-place, and
eliminting the random read to a the outside `Vec`?
But I am indeed not sure, for the large primitive, if doing this can get
benefit, because storing them inline `HashTable` will make the table larger
than storing their hashes...
Also comparing them directly is more expansive than comparing their hashes...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101922845
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Hm I think the storage of a value inside `HashTable` will not be very
different from `Vec` besides being inline?
I think for the `value <= 64bits`, storing it inside `HashTable`:
- Can decrease the size of `HashTable` comparing to store their hashes?
- And when probing and rehashing of `HashTable` totally in-place, and
eliminting the random read to a the outside `Vec`?
But I am indeed not sure, for the large primitive, if doing this can get
benefit, because the storing them inline `HashTable` will make the table larger
than storing their hashes...
Also comparing them directly is more expansive than comparing their hashes...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101880510
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
Hm I think the storage of a value inside `HashTable` will not be very
different from `Vec` besides being inline?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101732232
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Why use large primitive? Is 128 bits slower otherwise?
π€ I currently decide which type to use `GroupValuesLargePrimitive`,
according to if its bits > 64.
Because the `hash` is 64bits, when bits > 64, the `map` will get larger, and
I am not sure if it is still better to go the new way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101741744
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -130,15 +133,15 @@ where
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
Review Comment:
I think for the `large`(> 64bits), we can do that to improve.
But for `normal`(<= 64bits), the new way will be better, I am testing it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2900067762 > π€: Benchmark completed Thanks, q4 and q15 are the target, and it seems indeed get faster! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101735859
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
I want to try it, too.
But I think it will lead to regression, becasue many extra random writes
exist in `emit` if we do that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101735859
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
I want to try it, too.
But I think it will lead to regression, becasue many extra random writes
exist in `emit`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Rachelint commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101732218
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Why use large primitive? Is 128 bits slower otherwise?
π€ I currently decide which type to use `GroupValuesLargePrimitive`,
according to if its bits > 64.
Because the `hash` is 64bits, when bits > 64, the `map` will get larger, and
I am not sure if it is still better to go the new way.
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
> Why use large primitive? Is 128 bits slower otherwise?
π€ I currently decide which type to use `GroupValuesLargePrimitive`,
according to if its bits > 64.
Because the `hash` is 64bits, when bits > 64, the `map` will get larger, and
I am not sure if it is still better to go the new way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100975676
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Something like:
```diff
diff --git
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
index 693cc997fa..600287486d 100644
---
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
+++
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
@@ -25,7 +25,6 @@ use arrow::array::{
use arrow::datatypes::{i256, DataType};
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
-use datafusion_execution::memory_pool::proxy::VecAllocExt;
use datafusion_expr::EmitTo;
use half::f16;
use hashbrown::hash_table::HashTable;
@@ -33,7 +32,6 @@ use std::mem::size_of;
use std::sync::Arc;
mod large_primitive;
-pub use large_primitive::GroupValuesLargePrimitive;
/// A trait to allow hashing of floating point numbers
pub(crate) trait HashValue {
@@ -94,8 +92,6 @@ pub struct GroupValuesPrimitive {
map: HashTable<(usize, T::Native)>,
/// The group index of the null value if any
null_group: Option,
-/// The values for each group index
-values: Vec,
/// The random state used to generate hashes
random_state: RandomState,
}
@@ -106,7 +102,6 @@ impl GroupValuesPrimitive {
Self {
data_type,
map: HashTable::with_capacity(128),
-values: Vec::with_capacity(128),
null_group: None,
random_state: Default::default(),
}
@@ -124,13 +119,14 @@ where
for v in cols[0].as_primitive::() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = self.map.len();
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
+let group_id = self.map.len();
+
let insert = self.map.entry(
hash,
|&(_, v)| v.is_eq(key),
@@ -140,10 +136,8 @@ where
match insert {
hashbrown::hash_table::Entry::Occupied(o) =>
o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, key));
-self.values.push(key);
-g
+v.insert((group_id, key));
+group_id
}
}
}
@@ -155,21 +149,19 @@ where
fn size(&self) -> usize {
self.map.capacity() * size_of::<(usize, T::Native)>()
-+ self.values.allocated_size()
}
fn is_empty(&self) -> bool {
-self.values.is_empty()
+self.map.is_empty()
}
fn len(&self) -> usize {
-self.values.len()
+self.map.len()
}
fn emit(&mut self, emit_to: EmitTo) -> Result> {
-emit_internal::(
+emit_internal::(
emit_to,
-&mut self.values,
&mut self.null_group,
&mu
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2898865038 π€ `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing improve-primitive-group-values (8c05f694d6849a364bfb78400f06b5b4ca14c02c) to 07fe23f03dcb53b2715d3f3b3e0c476d54202c00 [diff](https://github.com/apache/datafusion/compare/07fe23f03dcb53b2715d3f3b3e0c476d54202c00..8c05f694d6849a364bfb78400f06b5b4ca14c02c) Benchmarks: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100975676
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Something like:
```diff
diff --git
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
index 693cc997fa..c166c16f8e 100644
---
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
+++
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
@@ -25,7 +25,6 @@ use arrow::array::{
use arrow::datatypes::{i256, DataType};
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
-use datafusion_execution::memory_pool::proxy::VecAllocExt;
use datafusion_expr::EmitTo;
use half::f16;
use hashbrown::hash_table::HashTable;
@@ -33,7 +32,6 @@ use std::mem::size_of;
use std::sync::Arc;
mod large_primitive;
-pub use large_primitive::GroupValuesLargePrimitive;
/// A trait to allow hashing of floating point numbers
pub(crate) trait HashValue {
@@ -94,8 +92,6 @@ pub struct GroupValuesPrimitive {
map: HashTable<(usize, T::Native)>,
/// The group index of the null value if any
null_group: Option,
-/// The values for each group index
-values: Vec,
/// The random state used to generate hashes
random_state: RandomState,
}
@@ -106,7 +102,6 @@ impl GroupValuesPrimitive {
Self {
data_type,
map: HashTable::with_capacity(128),
-values: Vec::with_capacity(128),
null_group: None,
random_state: Default::default(),
}
@@ -124,13 +119,14 @@ where
for v in cols[0].as_primitive::() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = self.map.len();
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
+let group_id = self.map.len();
+
let insert = self.map.entry(
hash,
|&(_, v)| v.is_eq(key),
@@ -140,10 +136,8 @@ where
match insert {
hashbrown::hash_table::Entry::Occupied(o) =>
o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, key));
-self.values.push(key);
-g
+v.insert((group_id, key));
+group_id
}
}
}
@@ -155,21 +149,19 @@ where
fn size(&self) -> usize {
self.map.capacity() * size_of::<(usize, T::Native)>()
-+ self.values.allocated_size()
}
fn is_empty(&self) -> bool {
-self.values.is_empty()
+self.map.is_empty()
}
fn len(&self) -> usize {
-self.values.len()
+self.map.len()
}
fn emit(&mut self, emit_to: EmitTo) -> Result> {
-emit_internal::(
+emit_internal::(
emit_to,
-&mut self.values,
&mut self.null_group,
&mu
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2898960798 π€: Benchmark completed Details ``` Comparing HEAD and improve-primitive-group-values Benchmark clickbench_extended.json ββββ³β³β³ββββ β Queryβ HEAD β improve-primitive-group-values βChange β β‘βββββββββ© β QQuery 0 β 2121.49ms β 1919.65ms β +1.11x faster β β QQuery 1 β 740.28ms β 713.71ms β no change β β QQuery 2 β 1498.69ms β 1414.88ms β +1.06x faster β β QQuery 3 β 695.23ms β 717.43ms β no change β β QQuery 4 β 1471.01ms β 1474.66ms β no change β β QQuery 5 β 15267.11ms β 15054.97ms β no change β β QQuery 6 β 2081.23ms β 2106.01ms β no change β β QQuery 7 β 2141.68ms β 2088.73ms β no change β β QQuery 8 β 2008.35ms β 2070.93ms β no change β ββββ΄β΄β΄ββββ βββββ³β β Benchmark Summary ββ β‘βββββ© β Total Time (HEAD) β 28025.07ms β β Total Time (improve-primitive-group-values) β 27560.97ms β β Average Time (HEAD) β 3113.90ms β β Average Time (improve-primitive-group-values) β 3062.33ms β β Queries Fasterβ 2 β β Queries Slowerβ 0 β β Queries with No Changeβ 7 β βββββ΄β Benchmark clickbench_partitioned.json ββββ³β³β³ββββ β Queryβ HEAD β improve-primitive-group-values βChange β β‘βββββββββ© β QQuery 0 β 2.69ms β 2.18ms β +1.23x faster β β QQuery 1 β37.02ms β37.08ms β no change β β QQuery 2 β88.75ms β93.11ms β no change β β QQuery 3 β98.86ms β 101.96ms β no change β β QQuery 4 β 578.77ms β 546.64ms β +1.06x faster β β QQuery 5 β 847.22ms β 889.08ms β no change β β QQuery 6 β 2.32ms β 2.25ms β no change β β QQuery 7 β40.97ms β41.98ms β no change β β QQuery 8 β 897.48ms β 914.82ms β no change β β QQuery 9 β 1197.23ms β 1208.45ms β no change β β QQuery 10β 271.49ms β 266.82ms β no change β β QQuery 11β 302.44ms β 300.03ms β no change β β QQuery 12β 890.12ms β 920.98ms β no change β β QQuery 13β 1356.21ms β 1390.50ms β no change β β QQuery 14β 829.07ms β 867.85ms β no change β β QQuery 15β 819.53ms β 745.34ms β +1.10x faster β β QQuery 16β 1703.44ms β 1735.34ms β no change β β QQuery 17β 1586.44ms β 1598.57ms β no change β β QQuery 18β 3046.16ms β 3127.48ms β no change β β QQuery 19β84.86ms β89.24ms β 1.05x slower β β QQuery 20β 1154.87ms β 1172.21ms β no change β β QQuery 21β 1348.35ms β 1369.51ms β no change β β QQuery 22β 2274.86ms β 2274.24ms β no change β β QQuery 23β 8334.33ms β 8396.84ms β no change β β QQuery 24β 468.77ms β 471.73ms β no change β β QQuery 25β 407.86ms β 400.50ms β no change β β QQuery 26β 531.96ms β 536.71ms β no change β β QQuery 27β 1594.67ms β 1618.85ms β no change β β QQuery 28β 12418.63ms β 12463.37ms β no change β β QQuery 29β 543.60ms β 533.05ms β no change β β QQuery 30β 820.78ms β 82
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100966267 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -222,3 +184,61 @@ where self.map.shrink_to(count, |_| 0); // hasher does not matter since the map is cleared } } + +pub(crate) fn emit_internal( Review Comment: K is unneeded here, it can take `emit_internal` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100975676
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Something like:
```diff
diff --git
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
index 693cc997fa..600287486d 100644
---
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
+++
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
@@ -25,7 +25,6 @@ use arrow::array::{
use arrow::datatypes::{i256, DataType};
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
-use datafusion_execution::memory_pool::proxy::VecAllocExt;
use datafusion_expr::EmitTo;
use half::f16;
use hashbrown::hash_table::HashTable;
@@ -33,7 +32,6 @@ use std::mem::size_of;
use std::sync::Arc;
mod large_primitive;
-pub use large_primitive::GroupValuesLargePrimitive;
/// A trait to allow hashing of floating point numbers
pub(crate) trait HashValue {
@@ -94,8 +92,6 @@ pub struct GroupValuesPrimitive {
map: HashTable<(usize, T::Native)>,
/// The group index of the null value if any
null_group: Option,
-/// The values for each group index
-values: Vec,
/// The random state used to generate hashes
random_state: RandomState,
}
@@ -106,7 +102,6 @@ impl GroupValuesPrimitive {
Self {
data_type,
map: HashTable::with_capacity(128),
-values: Vec::with_capacity(128),
null_group: None,
random_state: Default::default(),
}
@@ -124,13 +119,14 @@ where
for v in cols[0].as_primitive::() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = self.map.len();
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
+let group_id = self.map.len();
+
let insert = self.map.entry(
hash,
|&(_, v)| v.is_eq(key),
@@ -140,10 +136,8 @@ where
match insert {
hashbrown::hash_table::Entry::Occupied(o) =>
o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, key));
-self.values.push(key);
-g
+v.insert((group_id, key));
+group_id
}
}
}
@@ -155,21 +149,19 @@ where
fn size(&self) -> usize {
self.map.capacity() * size_of::<(usize, T::Native)>()
-+ self.values.allocated_size()
}
fn is_empty(&self) -> bool {
-self.values.is_empty()
+self.map.is_empty()
}
fn len(&self) -> usize {
-self.values.len()
+self.map.len()
}
fn emit(&mut self, emit_to: EmitTo) -> Result> {
-emit_internal::(
+emit_internal::(
emit_to,
-&mut self.values,
&mut self.null_group,
&mu
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100975676
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Something like:
```diff
diff --git
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
index 693cc997fa..600287486d 100644
---
i/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
+++
w/datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs
@@ -25,7 +25,6 @@ use arrow::array::{
use arrow::datatypes::{i256, DataType};
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
-use datafusion_execution::memory_pool::proxy::VecAllocExt;
use datafusion_expr::EmitTo;
use half::f16;
use hashbrown::hash_table::HashTable;
@@ -33,7 +32,6 @@ use std::mem::size_of;
use std::sync::Arc;
mod large_primitive;
-pub use large_primitive::GroupValuesLargePrimitive;
/// A trait to allow hashing of floating point numbers
pub(crate) trait HashValue {
@@ -94,8 +92,6 @@ pub struct GroupValuesPrimitive {
map: HashTable<(usize, T::Native)>,
/// The group index of the null value if any
null_group: Option,
-/// The values for each group index
-values: Vec,
/// The random state used to generate hashes
random_state: RandomState,
}
@@ -106,7 +102,6 @@ impl GroupValuesPrimitive {
Self {
data_type,
map: HashTable::with_capacity(128),
-values: Vec::with_capacity(128),
null_group: None,
random_state: Default::default(),
}
@@ -124,13 +119,14 @@ where
for v in cols[0].as_primitive::() {
let group_id = match v {
None => *self.null_group.get_or_insert_with(|| {
-let group_id = self.values.len();
-self.values.push(Default::default());
+let group_id = self.map.len();
group_id
}),
Some(key) => {
let state = &self.random_state;
let hash = key.hash(state);
+let group_id = self.map.len();
+
let insert = self.map.entry(
hash,
|&(_, v)| v.is_eq(key),
@@ -140,10 +136,8 @@ where
match insert {
hashbrown::hash_table::Entry::Occupied(o) =>
o.get().0,
hashbrown::hash_table::Entry::Vacant(v) => {
-let g = self.values.len();
-v.insert((g, key));
-self.values.push(key);
-g
+v.insert((group_id, key));
+group_id
}
}
}
@@ -155,21 +149,19 @@ where
fn size(&self) -> usize {
self.map.capacity() * size_of::<(usize, T::Native)>()
-+ self.values.allocated_size()
}
fn is_empty(&self) -> bool {
-self.values.is_empty()
+self.map.is_empty()
}
fn len(&self) -> usize {
-self.values.len()
+self.map.len()
}
fn emit(&mut self, emit_to: EmitTo) -> Result> {
-emit_internal::(
+emit_internal::(
emit_to,
-&mut self.values,
&mut self.null_group,
&mu
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100768086
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/large_primitive.rs:
##
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::aggregates::group_values::single_group_by::primitive::{
+emit_internal, HashValue,
+};
+use crate::aggregates::group_values::GroupValues;
+use ahash::RandomState;
+use arrow::array::{
+cast::AsArray, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType,
PrimitiveArray,
+};
+use arrow::datatypes::DataType;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::proxy::VecAllocExt;
+use datafusion_expr::EmitTo;
+use hashbrown::hash_table::HashTable;
+use std::mem::size_of;
+
+/// A [`GroupValues`] storing a single column of large primitive values (bits
> 64)
+///
+/// This specialization is significantly faster than using the more general
+/// purpose `Row`s format
+pub struct GroupValuesLargePrimitive {
Review Comment:
Why use large primitive? Is 128 bits slower otherwise?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100764581
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -130,15 +133,15 @@ where
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
Review Comment:
I wonder if the existing performance couldn't already be improved by doing
`h == hash && unsafe { self.values.get_unchecked(g).is_eq(key) }` (i.e.
avoiding the fetch if hash equality doesn't hold), like we do in other places.
This already filters out most of them as hashes are almost never the same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100764581
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -130,15 +133,15 @@ where
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
Review Comment:
I wonder if the existing performance couldn't already be improved by doing
`h == hash && unsafe { self.values.get_unchecked(g).is_eq(key) }` (i.e.
avoiding fetching the value if hash equality doesn't hold), like we do in other
places. This already filters out most of them as hashes are almost never the
same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100764581
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -130,15 +133,15 @@ where
let hash = key.hash(state);
let insert = self.map.entry(
hash,
-|&(g, _)| unsafe {
self.values.get_unchecked(g).is_eq(key) },
Review Comment:
I wonder if the existing performance couldn't already be improved by doing
`h == hash && self.values.get_unchecked(g).is_eq(key)` (i.e. avoiding the fetch
if hash equality doesn't hold), like we do in other places. This already
filters out most of them as hashes are almost never the same.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100719962
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Is storing `values: Vec` in this case necessary?
It could be rebuild in `emit_internal` by traversing the map items I think
(create a `Vec` and update by group index). This avoids doing that while
inserting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]
Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100719962
##
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##
@@ -74,21 +77,21 @@ macro_rules! hash_float {
hash_float!(f16, f32, f64);
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits
<= 64)
///
/// This specialization is significantly faster than using the more general
/// purpose `Row`s format
pub struct GroupValuesPrimitive {
/// The data type of the output array
data_type: DataType,
-/// Stores the `(group_index, hash)` based on the hash of its value
+/// Stores the `(group_index, group_value)`
///
-/// We also store `hash` is for reducing cost of rehashing. Such cost
-/// is obvious in high cardinality group by situation.
+/// We directly store copy of `group_value` for not only efficient
+/// rehashing, but also efficient probing.
/// More details can see:
-///
+///
///
-map: HashTable<(usize, u64)>,
+map: HashTable<(usize, T::Native)>,
Review Comment:
Is storing `values: Vec` in this case necessary?
It could be rebuild in `emit_internal` by traversing the map items I think
(create a `Vec` and update by group index). This avoids doing that while
inserting / updating.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
