> Dmitry, Would you be interested in writing up more details about how you are using Kudu in a blog post or even a mailing list email? This sounds super interesting.

I wrote about our Kudu cluster and ETL in posts below - there are
unexplained results wrt resource utilization with multiple tservers:

https://lists.apache.org/thread.html/2b94912dc4a251312000dbd6df2d31c43029723e16d50ffd6e510c90@%3Cuser.kudu.apache.org%3E

~dmitry

On Thu, 12 Sep 2019 at 11:28, Grant Henke <ghe...@cloudera.com> wrote:
> Thanks for the information Dmitry and Mauricio!
>
>> An example from genomics.
>
> Dmitry, Would you be interested in writing up more details about how you
> are using Kudu in a blog post or even a mailing list email? This sounds
> super interesting.
>
>> Supporting serialized objects (e.g. java's hashtables with
>> capabilities to select only rows with hashtables containing some
>> specific keys) would make Kudu super-special ;)
>
> I agree supporting something like this would be very cool.
>
>> Would be good if Kudu supported the way Impala can store and query nested
>> data
>
> Supporting Impala's syntax on Kudu tables with complex types is absolutely
> a priority.
>
> Thanks,
> Grant
>
> On Wed, Sep 11, 2019 at 7:04 PM Mauricio Aristizabal <mauri...@impact.com>
> wrote:
>
>> Would be good if Kudu supported the way Impala can store and query nested
>> data in hdfs/parquet, so it would be (at least mostly) transparent to query
>> nested data in either storage engine. We recently had a use for this
>> (basically storing N order item details along with each order record) but
>> decided against it because we know we'll be moving that table from Parquet
>> to Kudu soon.
>>
>> On Wed, Sep 11, 2019 at 1:49 PM Dmitry Degrave <dmee...@gmail.com> wrote:
>>
>>> Hi Grant,
>>>
>>> An example from genomics. Current scheme is simple [1] (denormalized
>>> for performance), but requires N = S * V rows in genotype table (S is
>>> number of samples, V is average number of variants in a sample,
>>> typical value for WGS V=5*10^6 and we deal with tens of thousands of
>>> samples).
>>> More optimal scheme would keep all variants of a sample in a
>>> single row, which is impossible now.
>>>
>>> Supporting nested data structures, e.g. similar to implemented in
>>> ClickHouse [2], would be useful too.
>>>
>>> Supporting serialized objects (e.g. java's hashtables with
>>> capabilities to select only rows with hashtables containing some
>>> specific keys) would make Kudu super-special ;)
>>>
>>> ~dmitry
>>>
>>> [1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
>>> [2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html
>>>
>>> On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghe...@cloudera.com> wrote:
>>> >
>>> > Hi Boris,
>>> >
>>> > Can you describe in more detail what exactly you are looking for in a
>>> > long text type? Is there another database that has an equivalent type for
>>> > reference?
>>> >
>>> > I have started looking at complex type support and plan to put up a
>>> > design document soon. No estimates on when it would be complete or how much
>>> > work is required exists yet. Do you have any sample schemas with complex
>>> > types you could send me to help inform designs and trade offs?
>>> >
>>> > Thank you,
>>> > Grant
>>> >
>>> > On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <bo...@boristyukin.com>
>>> > wrote:
>>> >>
>>> >> Hi guys,
>>> >>
>>> >> Any plans to support long text type in Kudu? We would love to use
>>> >> Kudu with other projects but unfortunately long text data are pretty common
>>> >> in healthcare industry and we have to use hive/Impala/hdfs instead which is
>>> >> quite painful since we cannot do in place updates and deletes.
>>> >>
>>> >> Same question about complex types (arrays, maps etc.)
>>> >>
>>> >> Thanks
>>> >
>>> > --
>>> > Grant Henke
>>> > Software Engineer | Cloudera
>>> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>
>>
>> --
>> Mauricio Aristizabal
>> Architect - Data Pipeline
>> mauri...@impact.com | 323 309 4260
>> https://impact.com
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke