Re: [E] Re: Apache DataSketches integration

2021-08-27 Thread Alexander Saydakov
I submitted a pull request with some changes I tried to explain here. https://github.com/apache/impala/pull/30
There are still open questions for me regarding:
- better dependency mechanism
- updating dependency to the latest 3.1.0
- process flow in aggregate functions (avoiding overhead of pairwi

Re: [E] Re: Apache DataSketches integration

2021-08-24 Thread Alexander Saydakov
I am afraid that I was misunderstood regarding a few points. Let me try to clarify. Regarding serialization using bytes as opposed to a stream: this has nothing to do with the BINARY data type in Impala. Currently I see in the Impala code something like this (simplified): std::stringstream tmp; sketch
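
The contrast between stream-based and byte-based serialization can be sketched as follows. This is a minimal illustration under stated assumptions, not the Impala or DataSketches code: `ToySketch`, `serialize_via_stream`, and `serialize_via_bytes` are hypothetical names, and the serialized layout is simply the raw 64-bit values.

```cpp
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a sketch; not a DataSketches or Impala type.
struct ToySketch {
  std::vector<std::uint64_t> values;

  // Stream-based serialization: writes the raw bytes to any std::ostream.
  void serialize(std::ostream& os) const {
    os.write(reinterpret_cast<const char*>(values.data()),
             static_cast<std::streamsize>(values.size() * sizeof(std::uint64_t)));
  }

  // Byte-based serialization: fills one exactly sized buffer directly.
  std::vector<std::uint8_t> serialize() const {
    std::vector<std::uint8_t> bytes(values.size() * sizeof(std::uint64_t));
    if (!bytes.empty()) std::memcpy(bytes.data(), values.data(), bytes.size());
    return bytes;
  }
};

// Stream path: the stringstream buffers the bytes, then str() copies them out.
std::string serialize_via_stream(const ToySketch& sketch) {
  std::stringstream tmp;
  sketch.serialize(tmp);
  return tmp.str();
}

// Bytes path: one allocation of the final size, no intermediate stream.
std::vector<std::uint8_t> serialize_via_bytes(const ToySketch& sketch) {
  return sketch.serialize();
}
```

Both paths produce the same bytes; the difference is that the stream path goes through an intermediate buffer and an extra copy, which is independent of which column type eventually stores the result.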

Re: Apache DataSketches integration

2021-08-18 Thread Gabor Kaszab
Hey Quanlong, Initially I added the first library by copying all the required files into the same dir. That was ~1.5 years ago, so my memory is faint, but as I remember the library failed to compile if we kept its original file structure. As a quick workaround came the

Re: [E] Re: Apache DataSketches integration

2021-08-16 Thread Alexander Saydakov
I am away for a few days. I will have a look soon. Thank you.
On Mon, Aug 16, 2021 at 9:44 AM Quanlong Huang wrote:
> Thank Fucun for creating the JIRAs!
>
> Regarding the dependency. I see that the current approach is to copy all
>> files from Datasketches into a single pile. Is there a better

Re: Apache DataSketches integration

2021-08-16 Thread Quanlong Huang
Thank Fucun for creating the JIRAs!
> Regarding the dependency. I see that the current approach is to copy all
> files from Datasketches into a single pile. Is there a better way?
Is there a historical reason that we don't add DataSketches into the native-toolchain?
Regards,
Quanlong
On Mon, Aug

Re: Apache DataSketches integration

2021-08-16 Thread fucun chu
Hi,
1. Upgrade DataSketches to version 3.1.0
   Issue tracking: https://issues.apache.org/jira/browse/IMPALA-10863
2. Use bytes to serialize sketches
   Impala currently does not support the BINARY data type; we can write sketches as binary instead of strings once it's supported: https://issues.apach

Apache DataSketches integration

2021-08-12 Thread Alexander Saydakov
Hi Impala development community, I am a member of the Apache DataSketches team. I looked at the DataSketches functions in Impala and I have a few questions and suggestions. Regarding the dependency: I see that the current approach is to copy all files from DataSketches into a single pile. Is there a b