Hi Edward,

We are currently working on integrating Apache Iceberg tables into Hive.
In the latest released Hive 4.0.0-alpha-1 it is possible to create tables 
backed by Iceberg, and those can be queried by Hive. You can define the 
partitioning using the Iceberg partition specification, like this:

CREATE EXTERNAL TABLE ice_table (id bigint, year_field date)
PARTITIONED BY SPEC (year(year_field))
STORED BY ICEBERG;

These partitions are handled by Iceberg, and no partition metadata is 
stored in the HMS.
This removes a significant part of the load from the HMS and allows a 
higher number of partitions for a single table.
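For illustration, once such a table exists, inserting and querying it looks like plain Hive SQL; the year() transform is applied by Iceberg behind the scenes (the table and column names below just reuse the example above, and the exact pruning behavior depends on the engine version):

```sql
-- Insert rows; Iceberg derives the partition value year(year_field)
-- itself, no PARTITION clause and no HMS partition objects involved
INSERT INTO ice_table VALUES
  (1, DATE '2021-06-15'),
  (2, DATE '2022-01-03');

-- A filter on the partition source column lets Iceberg prune to the
-- matching partitions using its own metadata files
SELECT id FROM ice_table WHERE year_field >= DATE '2022-01-01';
```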

There is also ongoing work in the Impala project to read/write these 
Hive-Iceberg tables.

https://blog.cloudera.com/introducing-apache-iceberg-in-cloudera-data-platform/

I hope this helps,
Peter


> On 2022. Apr 2., at 23:33, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> 
> While not as active in the development community, I have been using Hive 
> in the field, as well as Spark and Impala, for some time.
> 
> My anecdotal opinion is that the current metastore needs a significant 
> rewrite to deal with "next generation" workloads. By next generation I 
> actually mean last generation.
> 
> Currently Cloudera's Impala advice is: no more than 1k rows in a table. And 
> tables with lots of partitions are problematic.
> 
> That really "won't get it done" at the "new" web scale. Hive server can 
> have memory problems with tables with 2k columns and 5k partitions.
> 
> It feels like design ideas like "surely we can fetch all the columns of a 
> table in one go" don't make sense universally.
> 
> Amazon has Glue, which can scale to Amazon scale. The Hive metastore can't 
> even really scale to a single organization. So what are the next steps? I 
> don't think it's as simple as "move it to NoSQL"; I think it has to be 
> reworked from the ground up.
> 
> 
> -- 
> Sorry this was sent from mobile. Will do less grammar and spell check than 
> usual.
