Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2620036711

   And I broke the build 🤦 . Fix PR:
   - https://github.com/apache/datafusion/pull/14345


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619977704

   WFT let's do it and keep things moving
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


alamb merged PR #14074:
URL: https://github.com/apache/datafusion/pull/14074


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619977143

   > Any other blockers @alamb ? Thanks for hustling this through
   
   I am somewhat overwhelmed with 
   - https://github.com/apache/datafusion/issues/14008
   (and also 
   - https://github.com/apache/arrow-rs/issues/6929 / 
https://github.com/apache/arrow-rs/issues/7034
   
   And I haven't had a chance to fully think about downstream implications of 
this PR / have the bandwidth yet to pull the trigger and add potentially some 
other issues to the 45 release
   
   
   So no blockers from me yet, I was just hadn't gotten up the guts to merge it 
yet
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


ozankabak commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619923329

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub


gatesn commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619891822

   Any other blockers @alamb ? Thanks for hustling this through


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2611215362

   I merged this branch up from main and triggered the CI again. If there are 
no additional concerns I hope to merge this in a day or two


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609448004

   > Statistics can be helpful for optimizer rules, but they also allow 
short-circuiting computations. For example, min/max can be used to avoid 
evaluating a filter over a record batch and quickly throw away the whole thing.
   > 
   > Sum statistics help with short-circuiting aggregation functions. For 
example, `SELECT SUM(a) FROM foo` becomes a constant time operation. Similarly, 
`AVG(a)` can be computed with `sum / row count`.
   > 
   > > Why cannot you just use an AggregateExec having a sum accumulator?
   > 
   > Because our file format already stores a pre-computed sum.
   
   Thanks for the explanation. I see the reason now, and it makes sense. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


gatesn commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609426600

   Statistics can be helpful for optimizer rules, but they also allow 
short-circuiting computations. For example, min/max can be used to avoid 
evaluating a filter over a record batch and quickly throw away the whole thing.
   
   Sum statistics help with short-circuiting aggregation functions. For 
example, `SELECT SUM(a) FROM foo` becomes a constant time operation. Similarly, 
`AVG(a)` can be computed with `sum / row count`.
   
   > Why cannot you just use an AggregateExec having a sum accumulator?
   
   Because our file format already stores a pre-computed sum.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609417451

   > I can't think of any other statistical quantities that would immediately 
help operators, so from our perspective it's only "sum" (we may also use sum to 
mean true-count for booleans).
   > 
   > If this lands I can follow up with a PR to start using it in SUM, AVG 
operators. I guess the more contentious API change was adding 
`compute_statistics` to the `Expr` trait: 
https://github.com/apache/datafusion/pull/13736/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42R156-R158
   > 
   > @berkaysynnada is this something that would also remain compatible with 
the V2 API? I believe it is
   
   
   What I know is the whole statistics concept was created and used because of 
helping some optimization decisions, informing the optimizer rules about the 
data that comes to any execution plan node. What I couldn't understand is how 
"sum" information is helpful in any kind of optimization process.
   
   > to start using it in SUM, AVG operators
   
   Please correct me if I get wrongly your intention within this and 
https://github.com/apache/datafusion/pull/13736, you propose to add this "sum" 
info to get a result from it as a normal batch data?
   
   As I said, the V2 API does nothing to which kind of statistics will be 
preserved in Statistics{} struct, it is more about consolidating the Precision 
and Interval objects to represent and compute any kind of statistical quantity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


gatesn commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609166991

   I can't think of any other statistical quantities that would immediately 
help operators, so from our perspective it's only "sum" (we may also use sum to 
mean true-count for booleans).
   
   If this lands I can follow up with a PR to start using it in SUM, AVG 
operators. I guess the more contentious API change was adding 
`compute_statistics` to the `Expr` trait: 
https://github.com/apache/datafusion/pull/13736/files#diff-2b3f5563d9441d3303b57e58e804ab07a10d198973eed20e7751b5a20b955e42R156-R158
   
   @berkaysynnada is this something that would also remain compatible with the 
V2 API? I believe it is


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609145891

   > @berkaysynnada can we merge this PR in now? Or shall we wait for the 
statistics revamp that is underway?
   
   No need to wait for underway PR as it does not depend which statistics an 
operator has. It is about how these statistics are stored, computed and used. 
   
   But still, I wonder if we're planning to support a wide variety of 
statistical quantities -- like sum -- or is there a specific set of statistics 
that can be inferred from the sources or have practical applications in 
optimizer rules?
   
   If we agree that extending column statistics in this way is both useful and 
feasible for any user, we can move forward with merging this. We’ll also make 
sure it’s integrated into the new setup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-22 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2608482804

   @berkaysynnada  can we merge this PR in now? Or shall we wait for the 
statistics revamp that is underway?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-15 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2594728046

   > Looks like I got hit by some new ColumnStatistics tests on main. Should be 
fixed now 🤞
   > 
   > @berkaysynnada can you expand on the rationale for the V2 stats? I 
understand that it's more expressive, but I can't see in the PR or Notion how 
those distributions might actually be used? Is this for join planning?
   > 
   > My understanding is I would no longer define a "min" or a "max" for a 
column. But there doesn't seem to be a place to define null count or sum?
   
   You can still define min or max. We are not replacing Statistics with 
Statistics_v2; it is actually replacing the Precision and Interval objects. We 
plan to rename the API of the execution plan from `fn statistics(&self) -> 
Statistics` to `fn statistics(&self) -> TableStatistics`, which is still 
structured as:
   ```
   pub struct TableStatistics {
   pub num_rows: Statistics,
   pub total_byte_size: Statistics,
   pub column_statistics: Vec,
   }
   ```
   and
   ```
   pub struct ColumnStatistics {
   pub null_count: Statistics,
   pub max_value: Statistics,
   pub min_value: Statistics,
   pub distinct_count: Statistics,
   }
   ```
   
   What we are trying to address is how the current way of indeterminate 
quantities are handled in a target-dependent way. For example, if there’s a 
possibility of indeterminate statistics, it is stored as the mean value when 
the caller requires an estimate. However, if bounds are required, that 
indeterminism is stored as an interval.
   
   Our goal is to consolidate all forms of indeterminism and structure them 
with a strong mathematical foundation. This way, every user can utilize the 
statistics in their intended way. We aim to preserve and sustain all possible 
helpful quantities wherever feasible.
   
   We are also constructing a robust evaluation and back-propagation mechanism 
(similar to interval arithmetic, evaluate_bounds, and propagate_constraints). 
With this mechanism, any kind of expression—whether projection-based 
(evaluation only) or filter-based (evaluation followed by propagation)—will 
automatically resolve using the new statistics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-15 Thread via GitHub


gatesn commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2592991088

   Looks like I got hit by some new ColumnStatistics tests on main. Should be 
fixed now 🤞 
   
   @berkaysynnada can you expand on the rationale for the V2 stats? I 
understand that it's more expressive, but I can't see in the PR or Notion how 
those distributions might actually be used? Is this for join planning?
   
   My understanding is I would no longer define a "min" or a "max" for a 
column. But there doesn't seem to be a place to define null count or sum?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-13 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2588553063

   > > > We've started to refactor. The design is complete, and the 
implementation is in progress.
   > > 
   > > 
   > > Thanks! Is there anywhere I can follow along @berkaysynnada (I am 
particularly interested in what the final API / representation looks like)
   > 
   > I've reached you via discord
   
   For anyone else who is interested, the draft PR in the synnada fork is here: 
   - https://github.com/synnada-ai/datafusion-upstream/pull/57


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-13 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2588148550

   > > We've started to refactor. The design is complete, and the 
implementation is in progress.
   > 
   > Thanks! Is there anywhere I can follow along @berkaysynnada (I am 
particularly interested in what the final API / representation looks like)
   
   I've reached you via discord


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-13 Thread via GitHub


alamb commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2588096257

   > We've started to refactor. The design is complete, and the implementation 
is in progress.
   
   Thanks! Is there anywhere I can follow along @berkaysynnada  (I am 
particularly interested in what the final API / representation looks like)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-12 Thread via GitHub


berkaysynnada commented on PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2585899506

   > FYI @suremarc @berkaysynnada / @ozankabak as this changes statistics and I 
think you are already working on things related to that:
   
   We've started to refactor. The design is complete, and the implementation is 
in progress.
   
   I’ve taken a look at this and have some questions. For example, are we 
planning to add many types of functions to statistics, or is there a defined 
list of statistics that can be inferred from the sources or have meaningful 
applications in optimizer rules? If we agree that these kinds of extensions to 
column statistics are indeed useful and obtainable, then we can proceed with 
merging this. We would also ensure it is included in the new setup.
   
   FYI @ozankabak 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-12 Thread via GitHub


gatesn commented on code in PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912472076


##
datafusion/common/src/stats.rs:
##
@@ -170,24 +170,63 @@ impl Precision {
 pub fn add(&self, other: &Precision) -> 
Precision {
 match (self, other) {
 (Precision::Exact(a), Precision::Exact(b)) => {
-if let Ok(result) = a.add(b) {
-Precision::Exact(result)
-} else {
-Precision::Absent
-}
+a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
 }
 (Precision::Inexact(a), Precision::Exact(b))
 | (Precision::Exact(a), Precision::Inexact(b))
-| (Precision::Inexact(a), Precision::Inexact(b)) => {
-if let Ok(result) = a.add(b) {
-Precision::Inexact(result)
-} else {
-Precision::Absent
-}
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.add(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
+(_, _) => Precision::Absent,
+}
+}
+
+/// Calculates the difference of two (possibly inexact) [`ScalarValue`] 
values,
+/// conservatively propagating exactness information. If one of the input
+/// values is [`Precision::Absent`], the result is `Absent` too.
+pub fn sub(&self, other: &Precision) -> 
Precision {
+match (self, other) {
+(Precision::Exact(a), Precision::Exact(b)) => {
+a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
 }
+(Precision::Inexact(a), Precision::Exact(b))
+| (Precision::Exact(a), Precision::Inexact(b))
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.add(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
+(_, _) => Precision::Absent,
+}
+}
+
+/// Calculates the multiplication of two (possibly inexact) 
[`ScalarValue`] values,
+/// conservatively propagating exactness information. If one of the input
+/// values is [`Precision::Absent`], the result is `Absent` too.
+pub fn multiply(&self, other: &Precision) -> 
Precision {
+match (self, other) {
+(Precision::Exact(a), Precision::Exact(b)) => a
+.mul_checked(b)
+.map(Precision::Exact)
+.unwrap_or(Precision::Absent),
+(Precision::Inexact(a), Precision::Exact(b))
+| (Precision::Exact(a), Precision::Inexact(b))
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.mul_checked(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
 (_, _) => Precision::Absent,
 }
 }
+
+/// Casts the value to the given data type, propagating exactness 
information.
+pub fn cast_to(&self, data_type: &DataType) -> 
Result> {

Review Comment:
   @alamb one question I have is whether this should return a Result, or we 
should assume that a failed cast implies overflow and therefore return 
`Precision::Absent`?
   
   The caller (currently in cross-join) unwraps to `Absent`, I just didn't know 
whether to internalize that here.
   
   Edit: I decided it was better to propagate the error and allow the caller to 
decide. It was more useful in a couple of places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-12 Thread via GitHub


gatesn commented on code in PR #14074:
URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912472076


##
datafusion/common/src/stats.rs:
##
@@ -170,24 +170,63 @@ impl Precision {
 pub fn add(&self, other: &Precision) -> 
Precision {
 match (self, other) {
 (Precision::Exact(a), Precision::Exact(b)) => {
-if let Ok(result) = a.add(b) {
-Precision::Exact(result)
-} else {
-Precision::Absent
-}
+a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
 }
 (Precision::Inexact(a), Precision::Exact(b))
 | (Precision::Exact(a), Precision::Inexact(b))
-| (Precision::Inexact(a), Precision::Inexact(b)) => {
-if let Ok(result) = a.add(b) {
-Precision::Inexact(result)
-} else {
-Precision::Absent
-}
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.add(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
+(_, _) => Precision::Absent,
+}
+}
+
+/// Calculates the difference of two (possibly inexact) [`ScalarValue`] 
values,
+/// conservatively propagating exactness information. If one of the input
+/// values is [`Precision::Absent`], the result is `Absent` too.
+pub fn sub(&self, other: &Precision) -> 
Precision {
+match (self, other) {
+(Precision::Exact(a), Precision::Exact(b)) => {
+a.add(b).map(Precision::Exact).unwrap_or(Precision::Absent)
 }
+(Precision::Inexact(a), Precision::Exact(b))
+| (Precision::Exact(a), Precision::Inexact(b))
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.add(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
+(_, _) => Precision::Absent,
+}
+}
+
+/// Calculates the multiplication of two (possibly inexact) 
[`ScalarValue`] values,
+/// conservatively propagating exactness information. If one of the input
+/// values is [`Precision::Absent`], the result is `Absent` too.
+pub fn multiply(&self, other: &Precision) -> 
Precision {
+match (self, other) {
+(Precision::Exact(a), Precision::Exact(b)) => a
+.mul_checked(b)
+.map(Precision::Exact)
+.unwrap_or(Precision::Absent),
+(Precision::Inexact(a), Precision::Exact(b))
+| (Precision::Exact(a), Precision::Inexact(b))
+| (Precision::Inexact(a), Precision::Inexact(b)) => a
+.mul_checked(b)
+.map(Precision::Inexact)
+.unwrap_or(Precision::Absent),
 (_, _) => Precision::Absent,
 }
 }
+
+/// Casts the value to the given data type, propagating exactness 
information.
+pub fn cast_to(&self, data_type: &DataType) -> 
Result> {

Review Comment:
   @alamb one question I have is whether this should return a Result, or we 
should assume that a failed cast implies overflow and therefore return 
`Precision::Absent`?
   
   The caller (currently in cross-join) unwraps to `Absent`, I just didn't know 
whether to internalize that here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]