Hi Edward,

We are currently working on integrating Apache Iceberg tables into Hive. In the latest release, Hive 4.0.0-alpha-1, it is possible to create tables backed by Iceberg, and those tables can be queried by Hive. You can define the partitioning using the Iceberg partition specification, like this:
CREATE EXTERNAL TABLE ice_table (id bigint, year_field date)
PARTITIONED BY SPEC (year(year_field))
STORED BY ICEBERG;

These partitions are handled by Iceberg, and no partitions are stored in the HMS. This removes a serious part of the load from the HMS and allows a higher number of partitions for a single table. There is also ongoing work in the Impala project to read/write these Hive-Iceberg tables.

https://blog.cloudera.com/introducing-apache-iceberg-in-cloudera-data-platform/

I hope this helps,
Peter

> On 2022. Apr 2., at 23:33, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> While not active in the development community as much, I have been using Hive
> in the field, as well as Spark and Impala, for some time.
>
> My anecdotal opinion is that the current metastore needs a significant
> rewrite to deal with "next generation" workloads. By next generation I actually
> mean last generation.
>
> Currently Cloudera's Impala advice is: no more than 1k rows in a table. And
> tables with lots of partitions are problematic.
>
> Thus it really "won't get it done" at the "new" web scale. Hive server can have
> memory problems with tables with 2k columns and 5k partitions.
>
> It feels like design ideas like "surely we can fetch all the columns of a
> table in one go" don't make sense universally.
>
> Amazon has Glue, which can scale to Amazon scale. The Hive metastore can't even
> really scale to a single organization. So what are the next steps? I don't
> think it's as simple as "move it to NoSQL"; I think it has to be reworked from
> the ground up.
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
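P.S. For a slightly fuller picture, here is a small sketch of what working with such a table can look like end to end. The table and column names below are made up for illustration, and the exact set of supported partition transforms (day, bucket, etc.) should be checked against the Hive 4 Iceberg documentation:

CREATE EXTERNAL TABLE ice_orders (order_id bigint, customer_id bigint, order_ts timestamp)
PARTITIONED BY SPEC (day(order_ts), bucket(16, customer_id))
STORED BY ICEBERG;

-- Inserts and queries look the same as for an unpartitioned table; no PARTITION
-- clause is needed, since the layout lives in the Iceberg metadata, not in the HMS.
INSERT INTO ice_orders VALUES (1, 42, cast('2022-04-02 10:00:00' as timestamp));
SELECT count(*) FROM ice_orders WHERE order_ts >= cast('2022-04-01' as timestamp);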