Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread Zoltán Borók-Nagy
also produced correct Parquet files, but that's beyond our control and there's, no doubt, a ton of data already in that format. This could also be part of our v3 work, where I think we intend to add binary to string type promotion to the format.

Re: Spark cannot read iceberg tables which were originally written by Impala

2023-12-26 Thread Zoltán Borók-Nagy
Hey Everyone, Thank you for raising this issue and reaching out to the Impala community. Let me clarify that the problem only happens when there is a legacy Hive table written by Impala, which is then converted to Iceberg. When Impala writes into an Iceberg table there is no problem with

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-27 Thread Zoltán Borók-Nagy
javac is not an optimizing compiler and there should not be much difference in the performance of the jars produced by different compilers; these changes might be worthwhile for the project to declare a newer compile-time JDK across all modules, and

Re: Support create table like for Iceberg table?

2023-04-26 Thread Zoltán Borók-Nagy
As a reference, Impala can also do Hive-style CREATE TABLE x LIKE y for Iceberg tables. You can see various examples at https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/iceberg-create-table-like-table.test - Zoltan On Wed, Apr 26, 2023 at 4:10 AM

Re: [DISCUSS] Switch to JDK 11 for releases?

2023-04-24 Thread Zoltán Borók-Nagy
Like Hive, Impala is not compatible with Java 11 right now. This work is in progress: https://issues.apache.org/jira/browse/IMPALA-11360 - Zoltan On Mon, Apr 24, 2023 at 11:07 AM Mass Dosage wrote: I agree with Ryan, unless you can change the source version there's not that much

Re: C++/Rust SDK sync

2023-04-12 Thread Zoltán Borók-Nagy
Hi, I am also interested in the discussion, all those times work for me. Cheers, Zoltan On Wed, Apr 12, 2023 at 4:17 AM Chao Sun wrote: > We are also interested in this discussion. Internally, we have been > working on something similar in Rust, so it'd be great if we can > combine the

Re: Temporal Iceberg Service

2022-09-01 Thread Zoltán Borók-Nagy
Hi Taher, I think most of your questions are answered in the Scan Planning section at the Iceberg spec page: https://iceberg.apache.org/spec/#scan-planning To give you some specific answers as well: Equality Deletes: data and delete files have sequence numbers from which readers can infer the
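The sequence-number rule the reply points at can be sketched briefly. This is a hedged illustration of the scan-planning section of the Iceberg spec, not code from any engine: equality deletes apply to data files with a strictly lower data sequence number (so they cannot delete rows written in the same commit), while position deletes also apply at an equal sequence number.

```python
# Hedged sketch of the delete-file applicability rule from the Iceberg
# spec's scan-planning section, based on data sequence numbers.

def equality_delete_applies(data_seq: int, delete_seq: int) -> bool:
    # Equality deletes apply only to strictly older data files.
    return data_seq < delete_seq

def position_delete_applies(data_seq: int, delete_seq: int) -> bool:
    # Position deletes also apply to data files from the same commit.
    return data_seq <= delete_seq

# A delete file committed at sequence number 5:
assert equality_delete_applies(4, 5)       # older data: applies
assert not equality_delete_applies(5, 5)   # same commit: does not apply
assert position_delete_applies(5, 5)       # same commit: applies
```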

Impala reading V2 tables design doc

2022-07-08 Thread Zoltán Borók-Nagy
Hi Iceberg/Impala Team, We've been working on adding read support for Iceberg V2 tables in Impala. In the first round we're focusing on position deletes. We are thinking about different approaches so I've written a design doc about it:

Re: Matching iceberg data types to Parquet data types

2021-08-27 Thread Zoltán Borók-Nagy
Hi, You can find information on type mappings here: https://iceberg.apache.org/spec/#parquet 1. Iceberg timestamps have microsecond precision. In Parquet they are stored as INT64s with the TIMESTAMP_MICROS annotation. 2. Iceberg limits decimal precision to 38:
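The two mappings mentioned can be illustrated with a short, hedged example (plain Python, no Iceberg library involved): the physical timestamp value is microseconds since the epoch stored as an INT64, and 38 significant digits is the widest decimal the format allows.

```python
# Hedged illustration of the mappings described above; the concrete
# timestamp is an arbitrary example value.
from datetime import datetime, timezone
from decimal import Decimal

ts = datetime(2021, 8, 27, 12, 0, 0, 123456, tzinfo=timezone.utc)
# Physical representation: microseconds since epoch, as an INT64
# (TIMESTAMP_MICROS annotation in Parquet).
micros = int(ts.timestamp()) * 1_000_000 + ts.microsecond
assert micros == 1_630_065_600_123_456

# 38 significant digits is the maximum decimal precision in Iceberg.
max_precision_value = Decimal("9" * 38)
assert len(max_precision_value.as_tuple().digits) == 38
```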

Re: question about the iceberg manifest/manifest list/metadata api

2021-06-08 Thread Zoltán Borók-Nagy
On 05/27/2021 16:54, Zoltán Borók-Nagy wrote: Hi Yong Yang, It is supported by Iceberg, and this is

Re: question about the iceberg manifest/manifest list/metadata api

2021-05-27 Thread Zoltán Borók-Nagy
Hi Yong Yang, It is supported by Iceberg, and this is exactly how Impala is working. I.e. Impala's Parquet writer writes the data files, then we use Iceberg's API to append them to the table. You can find the relevant code here:
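The workflow described (write Parquet files externally, then register them through the Iceberg API) can be sketched with the Iceberg Java API. This is a hedged sketch, not the actual Impala code; the table location, file path, size, and record count are illustrative placeholders.

```java
// Sketch only: register an externally written Parquet file with an
// Iceberg table. All paths and metrics below are made-up placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class AppendExample {
    public static void main(String[] args) {
        Table table = new HadoopTables(new Configuration())
            .load("hdfs://nn/warehouse/db/tbl");          // hypothetical location

        DataFile dataFile = DataFiles.builder(PartitionSpec.unpartitioned())
            .withPath("hdfs://nn/warehouse/db/tbl/data/part-00000.parquet")
            .withFormat("PARQUET")
            .withFileSizeInBytes(1024L)                   // placeholder
            .withRecordCount(100L)                        // placeholder
            .build();

        // Appending the file commits a new table snapshot.
        table.newAppend().appendFile(dataFile).commit();
    }
}
```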

Re: Dynamic INSERT OVERWRITE

2021-01-30 Thread Zoltán Borók-Nagy
you want to overwrite a day, you pass a filter for that day. Another way around this problem is to support MERGE INTO, which will detect the files that need to be changed and correctly rewrite them, wherever they are in the table. rb On Fri, Jan 29, 2021 at 10:14 AM

Dynamic INSERT OVERWRITE

2021-01-29 Thread Zoltán Borók-Nagy
Hey everyone, I'm currently working on the INSERT OVERWRITE statement for Iceberg tables in Impala. Seems like ReplacePartitions is the perfect interface for this job: https://github.infra.cloudera.com/CDH/iceberg/blob/cdpd-master/api/src/main/java/org/apache/iceberg/ReplacePartitions.java IIUC
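For reference, a minimal sketch of how the ReplacePartitions interface is used via the Iceberg Java API. This is not Impala's actual code; the `table` and `newFiles` variables are assumed to exist.

```java
// Sketch only: dynamic overwrite of exactly the partitions touched by
// the new data files; untouched partitions are left as-is.
ReplacePartitions overwrite = table.newReplacePartitions();
for (DataFile dataFile : newFiles) {
    overwrite.addFile(dataFile);   // replaces this file's partition
}
overwrite.commit();                // single atomic snapshot commit
```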

Re: Welcoming Peter Vary as a new committer!

2021-01-26 Thread Zoltán Borók-Nagy
Congrats, Peter! On Tue, Jan 26, 2021 at 5:47 AM ForwardXu wrote: Congratulations Peter! -- Original message -- From: "dev"; Sent: Tuesday, January 26, 2021, 4:25 AM; To: "dev"; Subject: Re: Welcoming Peter Vary as a new committer! Congratulations! On Mon, 25

Re: Iceberg/Hive properties handling

2020-12-01 Thread Zoltán Borók-Nagy
pass table properties from Hive or Impala. If we exclude a prefix or specific properties, then everything but the properties reserved for locating the table is passed as the user would expect. I don't have a strong opinion about this, but yeah, maybe this behavior would cause t

Re: Iceberg/Hive properties handling

2020-11-30 Thread Zoltán Borók-Nagy
Thanks, Peter. I answered inline. On Mon, Nov 30, 2020 at 3:13 PM Peter Vary wrote: Hi Zoltan, Answers below: On Nov 30, 2020, at 14:19, Zoltán Borók-Nagy wrote: Hi, Thanks for the replies. My take fo

Re: Iceberg/Hive properties handling

2020-11-30 Thread Zoltán Borók-Nagy
properties to SERDEPROPERTIES? - Shall we define a prefix for setting Iceberg table properties from Hive queries and omitting other engine-specific properties? Thanks, Peter On Nov 27, 2020, at 17:45, Mass Dosage wrote: I like

Re: Iceberg/Hive properties handling

2020-11-26 Thread Zoltán Borók-Nagy
Hi, The above aligns with what we did in Impala, i.e. we store information about table loading in HMS table properties. We are just a bit more explicit about which catalog to use. We have table property 'iceberg.catalog' to determine the catalog type, right now the supported values are
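The 'iceberg.catalog' property described can be shown with a short DDL fragment. This is a hedged, made-up example in current Impala syntax (the table name and property value are illustrative, and the exact set of supported values is truncated in the message above):

```sql
-- Hypothetical example: tell the engine which catalog implementation
-- to use for loading this Iceberg table.
CREATE TABLE db.ice_tbl (i INT)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog' = 'hadoop.tables');
```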

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Zoltán Borók-Nagy
Hi Everyone, In Impala we face the same challenges. I think a strict 1-to-1 type mapping would be beneficial because that way we could derive the Iceberg schema from the Hive schema, not just the other way around. So we could just naturally create Iceberg tables via DDL. We should use the same

INSERT to Iceberg tables from Impala

2020-09-11 Thread Zoltán Borók-Nagy
Hi, I'm going to add INSERT support for Iceberg tables in Impala. To start, I created the following design doc: https://docs.google.com/document/d/1_KL0YptDKwhiXvJyx4Vb-yZjggrPQAW2yjeGV4C0vMU/edit?usp=sharing All comments are welcome. Thanks, Zoltan