Re: Long text and complex data types support

Dmitry Degrave Thu, 12 Sep 2019 05:25:07 -0700

> Dmitry, Would you be interested in writing up more details about how you
> are using Kudu in a blog post or even a mailing list email? This sounds
> super interesting.


I wrote about our Kudu cluster and ETL in the thread below. It includes
results on resource utilization with multiple tservers that remain unexplained:

https://lists.apache.org/thread.html/2b94912dc4a251312000dbd6df2d31c43029723e16d50ffd6e510c90@%3Cuser.kudu.apache.org%3E

~dmitry

On Thu, 12 Sep 2019 at 11:28, Grant Henke <ghe...@cloudera.com> wrote:

> Thanks for the information Dmitry and Mauricio!
>
>> An example from genomics.
>
> Dmitry, Would you be interested in writing up more details about how you
> are using Kudu in a blog post or even a mailing list email? This sounds
> super interesting.
>
>> Supporting serialized objects (e.g. java's hashtables with
>> capabilities to select only rows with hashtables containing some
>> specific keys) would make Kudu super-special ;)
>
> I agree supporting something like this would be very cool.
>
>> Would be good if Kudu supported the way Impala can store and query nested
>> data
>
> Supporting Impala's syntax on Kudu tables with complex types is absolutely
> a priority.
>
> Thanks,
> Grant
>
> On Wed, Sep 11, 2019 at 7:04 PM Mauricio Aristizabal <mauri...@impact.com>
> wrote:
>
>> Would be good if Kudu supported the way Impala can store and query nested
>> data in hdfs/parquet, so it would be (at least mostly) transparent to query
>> nested data in either storage engine.  We recently had a use for this
>> (basically storing N order item details along with each order record) but
>> decided against it because we know we'll be moving that table from Parquet
>> to Kudu soon.
>>
>> On Wed, Sep 11, 2019 at 1:49 PM Dmitry Degrave <dmee...@gmail.com> wrote:
>>
>>> Hi Grant,
>>>
>>> An example from genomics. Current scheme is simple [1] (denormalized
>>> for performance), but requires N = S * V rows in genotype table (S is
>>> number of samples, V is average number of variants in a sample,
>>> typical value for WGS V=5*10^6 and we deal with tens of thousands of
>>> samples). More optimal scheme would keep all variants of a sample in a
>>> single row, which is impossible now.
>>>
>>> Supporting nested data structures, e.g. similar to those implemented in
>>> ClickHouse [2], would be useful too.
>>>
>>> Supporting serialized objects (e.g. java's hashtables with
>>> capabilities to select only rows with hashtables containing some
>>> specific keys) would make Kudu super-special ;)
>>>
>>> ~dmitry
>>>
>>> [1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
>>> [2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html
>>>
>>> On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghe...@cloudera.com> wrote:
>>> >
>>> > Hi Boris,
>>> >
>>> > Can you describe in more detail what exactly you are looking for in a
>>> > long text type? Is there another database that has an equivalent type for
>>> > reference?
>>> >
>>> > I have started looking at complex type support and plan to put up a
>>> > design document soon. No estimates exist yet on when it would be complete
>>> > or how much work is required. Do you have any sample schemas with complex
>>> > types you could send me to help inform designs and trade-offs?
>>> >
>>> > Thank you,
>>> > Grant
>>> >
>>> > On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <bo...@boristyukin.com>
>>> > wrote:
>>> >>
>>> >> Hi guys,
>>> >>
>>> >> Any plans to support a long text type in Kudu? We would love to use
>>> >> Kudu with other projects, but unfortunately long text data are pretty common
>>> >> in the healthcare industry and we have to use Hive/Impala/HDFS instead, which is
>>> >> quite painful since we cannot do in-place updates and deletes.
>>> >>
>>> >> Same question about complex types (arrays, maps etc.)
>>> >>
>>> >> Thanks
>>> >
>>> >
>>> >
>>> > --
>>> > Grant Henke
>>> > Software Engineer | Cloudera
>>> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>
>>
>>
>> --
>> Mauricio Aristizabal
>> Architect - Data Pipeline
>> mauri...@impact.com | 323 309 4260
>> https://impact.com
>> <https://www.linkedin.com/company/impact-martech/>
>> <https://www.facebook.com/ImpactParTech/>
>> <https://twitter.com/impactpartech>
>> <https://www.youtube.com/c/impactmartech>
>>
>>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
