Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
andygrove closed issue #1389: AQE may materialize a non-supported Final-mode HashAggregate URL: https://github.com/apache/datafusion-comet/issues/1389 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
EmilyMatt commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3647036600 > [@EmilyMatt](https://github.com/EmilyMatt), now that I have worked on this and become more familiar with the issue, could you confirm that I understand it correctly? > > If we have a Comet final and Spark partial agg, or a Spark final and Comet partial agg, we can experience runtime errors if the intermediate buffer representation isn't compatible. In those cases, we should either a) fall back to Spark for both, or b) add additional projections or take other measures to make it still work. Am I understanding this correctly? Indeed^ The better solution imo is to just separate the result expressions into a Spark ProjectExec, and have the conversion after the aggreagte(assuming that our aggregate is better than Spark ;) ) The conversion will also have less rows because of the group-by, which seems more optimal all-around. I can give this a go again if you'd like -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
andygrove commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3646870951 @EmilyMatt, now that I have worked on this and become more familiar with the issue, could you confirm that I understand it correctly? If we have a Comet final and Spark partial agg, or a Spark final and Comet partial agg, we can experience runtime errors if the intermediate buffer representation isn't compatible. In those cases, we should either a) fall back to Spark for both, or b) add additional projections or take other measures to make it still work. Am I understanding this correctly? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
andygrove commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3634101239 I added a repro in https://github.com/apache/datafusion-comet/pull/2863 and I plan on implementing a fix in `CometExecRule`. My plan is to walk the plan top to bottom and when we see a final aggregate that is not supported then we walk down to the child partial and tag it with a fallback reason. This will need to happen before replacing any execs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3002064952 As per @EmilyMatt ' s comment on the related PR ,this seems to be no longer a bug but a performance boost. @andygrove we might have to remove `bug` label and remark it as `perf` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-324166 Working on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
EmilyMatt commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2661320829 The issue can be dissected as this: a. There is no reason to have a Partial aggregate and not a Final one, regardless of shuffle, if we support the aggregate expressions, why should we run the Final stage in Spark if it is supported? we should get the performance benefit of running the final aggregate in native as well, and if something can run in Spark and the conversion has to be made, it should be only the result expressions. This is the less critical aspect. b. Currently, all the supported aggregate expressions are "naive", as in, they result in some sort of Spark primitive, and they are performed using a primitive aggregate, even ones such as Avg are represented as "Sum" and "Count", but what happens when Spark uses an aggregate buffer that it passes between stages? such as in the case of CollectSet and CollectList. Despite the expected result probably being an ArrayType(StringType, false), for example, Spark will pass the serialized intermediate representation between stages, which can be observed as the DataType is BinaryType(the attribute is usually called "buf#" or something). When we reach something like this, using columnar shuffle, the shuffle will throw an error, because we will output an unexpected intermediate type(ArrayType of non-nullable string), while it expects a BinaryType. Let's say we fixed this by iterating over the outputs in CometHashAggregateExec, and correcting the DataType to what Comet outputs in the previous stage, now the shuffle will work correctly, as it will call the correct method on the ColumnarBatch and write it to the shuffle file. However, the Final stage HashAggregate, will still attempt to read the aggregate buffer it expects(binary), not an intermediate value of the result type. This will cause a crash then. This cannot be expected before stage materialization as the plan doesn't exist yet and we don't know if the result expressions are not supported, so I think allowing a Spark Final stage with a Comet partial stage is slightly shortsighted, unless in the future we write a conversion(maybe using codegen) between the Comet output and a Spark aggregate binary representation. I think a better solution is the separation of the result expressions in cases where they are not supported. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660587673 Thanks @viirya @EmilyMatt Did you mean that we should be able to run Comet aggregation even Comet shuffle is disabled by > I believe I've seen a few tests that are ignored because of this. > I don't think this is a valid situation, We should not crash based on previously Comet-ran operators if they were successful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
viirya commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660573325 Because this? ``` // When Comet shuffle is disabled, we don't want to transform the HashAggregate // to CometHashAggregate. Otherwise, we probably get partial Comet aggregation // and final Spark aggregation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660567872 cc @viirya I forgot why we did this in #991 https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala#L491-L494 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
