Re: [ANNOUNCE] New committer: Honah J.

2024-01-14 Thread OpenInx
Congrats, Honah! On Sun, Jan 14, 2024 at 1:25 AM Jun H. wrote: > Congratulations! > > On Jan 12, 2024, at 10:12 PM, Péter Váry > wrote: > > > Congratulations! > > On Sat, Jan 13, 2024, 06:26 Jean-Baptiste Onofré wrote: > >> Congrats ! >> >> Regards >> JB >> >> On Fri, Jan 12, 2024 at

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
PARQUET_ANNOTATE_STRINGS_UTF8 > for themselves. > > Approach C: Yeah, if Approach A goes through then we don't really need to > bother with this. > > Cheers, > Zoltan > > > On Wed, Jan 3, 2024 at 2:02 PM OpenInx wrote: > >> Thanks Zoltan and Ryan for your f

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
G >> type to actual string data. This approach does not fix already written >> files, as you already pointed out. >> >> Approach C: Migration job could copy data files but rewrite file >> metadata, if needed. This makes migration slower, but it's probably still >> faster th

Spark cannot read iceberg tables which were originally written by Impala

2023-12-25 Thread OpenInx
Hi dev, Sensordata [1] encountered an interesting Apache Impala & Iceberg bug in their real customer production environment. Their customers use Apache Impala to create a large number of Apache Hive tables in HMS, and ingested PB-level datasets into their Hive tables (which were originally written

Re: RFC: Control flink upsert sink’s memory usage of insertedRowMap

2023-12-10 Thread OpenInx
https://github.com/apache/iceberg/pull/2680/files On Mon, Dec 11, 2023 at 11:15 AM OpenInx wrote: > Just to provide a little context: there was a stale PR that tried to > maintain the insertedRowMap in RocksDB. > > On Sat, Dec 9, 2023 at 1:52 AM Ryan Blue wrote: >

Re: RFC: Control flink upsert sink’s memory usage of insertedRowMap

2023-12-10 Thread OpenInx
Just to provide a little context: there was a stale PR that tried to maintain the insertedRowMap in RocksDB. On Sat, Dec 9, 2023 at 1:52 AM Ryan Blue wrote: > Thanks, Renjie! > > The option to use Flink's state tracking system seems like a good idea to > me. > > On Thu, Dec 7, 2023 at

Re: Welcome new PMC members!

2023-04-11 Thread OpenInx
Congrats! On Wed, Apr 12, 2023 at 10:25 AM Junjie Chen wrote: > Congratulations to all of you! > > On Wed, Apr 12, 2023 at 10:07 AM Reo Lei wrote: > >> Congratulations!!! >> >> On Wed, Apr 12, 2023 at 09:19, yuxia wrote: >> >>> Congratulations to all! >>> >>> Best regards, >>> Yuxia >>> >>>

Re: In Remembrance of Kyle

2022-12-07 Thread OpenInx
So sad to get this news...I lost such a great, kind, passionate friend. On Thu, Dec 8, 2022 at 1:36 AM Ryan Blue wrote: > I'm going to miss Kyle and I'm sad to lose him. > > He was amazing at making everyone feel welcome here. I think he commented > on nearly every pull request for the last

Re: [VOTE] Release Apache Iceberg 1.1.0 RC4

2022-11-27 Thread OpenInx
+1 (binding) 1. Download the source tarball, signature (.asc), and checksum (.sha512): OK 2. Import gpg keys: download KEYS and run gpg --import /path/to/downloaded/KEYS.txt (optional if this hasn’t changed) : OK 3. Verify the signature by running: gpg --verify apache-iceberg-1.1.0.tar.gz.asc :

Re: [Discuss]- Donate Iceberg Flink Connector

2022-11-08 Thread OpenInx
Hi, sorry for the late reply. I was one of the core Flink Iceberg connector maintainers in the early stage (Flink 1.12, Flink 1.13, Flink 1.14). For later Flink releases, my work shifted and I had fewer interactions with Apache Flink + Iceberg, thanks Ryan, Steven, Kyle,

Re: [VOTE] Release Apache Iceberg 0.14.1 RC3

2022-09-05 Thread OpenInx
+1 (binding). 1. Download the source tarball, signature (.asc), and checksum (.sha512): OK 2. Import gpg keys: download KEYS and run gpg --import /path/to/downloaded/KEYS (optional if this hasn’t changed): OK 3. Verify the signature by running: gpg --verify

Re: 【Feature】Request support for c++ sdk

2022-06-13 Thread OpenInx
n general I think this is an exciting opportunity, and results have > shown time and time again that native readers / writers are much more > performant. > > +1 to using Rust as well (which is a language I know more of than C++ > these days - though both I'd have to brush off my sk

Re: 【Feature】Request support for c++ sdk

2022-06-12 Thread OpenInx
e taking charge of data maintenance. We don't have to > rewrite every corner of Iceberg in Rust. That means less engineering work. > > On 2022/06/08 10:16:05 OpenInx wrote: > > As a cloud-native table format standard for the big-data ecosystem, I > > believe supporting multiple

Re: 【Feature】Request support for c++ sdk

2022-06-08 Thread OpenInx
As a cloud-native table format standard for the big-data ecosystem, I believe supporting multiple languages is the correct direction, so that different languages can connect to the Apache Iceberg table format. But I also understand Kyle's point about lacking enough resources (developers and reviewers

Re: Iceberg Delete Compaction Interface Design

2022-04-20 Thread OpenInx
Hi Yufei, there is a proposed PR for this: https://github.com/apache/iceberg/pull/4522 On Thu, Apr 21, 2022 at 5:42 AM Yufei Gu wrote: > Hi team, > > Do we have a PR for this type of delete compaction? > >> Merge: the changes specified in delete files are applied to data files >> and then

Re: Welcome Szehon Ho as a committer!

2022-03-11 Thread OpenInx
Congrats Szehon! On Sat, Mar 12, 2022 at 7:55 AM Steve Zhang wrote: > Congratulations Szehon, Well done! > > Thanks, > Steve Zhang > > > > On Mar 11, 2022, at 3:51 PM, Jack Ye wrote: > > Congratulations Szehon!! > > -Jack > > On Fri, Mar 11, 2022 at 3:45 PM Wing Yew Poon > wrote: > >>

Review Request

2022-03-09 Thread OpenInx
Hi Iceberg dev, I've recently revisited the Flink write path to use the newly introduced writers (the partition-specific writers). All future performance and stability optimizations will be made on top of the revisited Flink write path. I've just published the PR here:

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-07 Thread OpenInx
> ORC-1123 Add `estimationMemory` method for writer >> >> According to the Apache ORC milestone, it will be released on May 15th. >> >> https://github.com/apache/orc/milestones >> >> Bests, >> Dongjoon. >> >> On 2022/03/04 13:1

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
te: > Thanks to openinx for opening this discussion. > > One thing to note, the current approach faces a problem, because of some > optimization mechanisms, when writing a large amount of duplicate data, > there will be some deviation between the estimated and the actual size.

[DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
Hi Iceberg dev, as we all know, in the current Apache Iceberg write path the ORC file writer cannot roll over to a new file once its byte size reaches the expected threshold. The core reason we haven't supported this before is the lack of a correct approach to estimate the byte size
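To make the rollover problem concrete: a writer can only roll to a new file when it can estimate the size of a file that is still open. The following is a minimal sketch under stated assumptions; `HypotheticalOrcWriter`, its fixed bytes-per-row cost, and `write_rolling` are illustrative stand-ins, not the ORC or Iceberg API.

```python
class HypotheticalOrcWriter:
    """Stand-in for an unclosed ORC writer; NOT the real ORC API."""

    def __init__(self, bytes_per_row: int, stripe_rows: int):
        self.bytes_per_row = bytes_per_row  # assumed fixed encoded size per row
        self.stripe_rows = stripe_rows      # rows buffered before a stripe is flushed
        self.flushed_bytes = 0              # bytes already written to the file
        self.buffered_rows = 0              # rows still held in memory

    def add(self, row) -> None:
        self.buffered_rows += 1
        if self.buffered_rows == self.stripe_rows:
            # Flushing a stripe turns buffered rows into written bytes.
            self.flushed_bytes += self.buffered_rows * self.bytes_per_row
            self.buffered_rows = 0

    def estimated_length(self) -> int:
        # In a real writer flushed bytes are exact; buffered rows can only be estimated.
        return self.flushed_bytes + self.buffered_rows * self.bytes_per_row


def write_rolling(rows, target_file_size: int, bytes_per_row: int = 10, stripe_rows: int = 4):
    """Return the estimated size of each file, rolling once the target is reached."""
    sizes = []
    writer = HypotheticalOrcWriter(bytes_per_row, stripe_rows)
    for row in rows:
        writer.add(row)
        if writer.estimated_length() >= target_file_size:
            sizes.append(writer.estimated_length())
            writer = HypotheticalOrcWriter(bytes_per_row, stripe_rows)
    if writer.estimated_length() > 0:
        sizes.append(writer.estimated_length())
    return sizes


print(write_rolling(range(25), target_file_size=100))
```

The estimate for buffered rows is where things can go wrong in practice: if the real encoding compresses better or worse than assumed (for example with highly duplicated data), the estimated and actual sizes drift apart.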

Re: Review request

2022-03-02 Thread OpenInx
Thanks Peter for the great work. Just added my comments. On Wed, Mar 2, 2022 at 4:20 PM Peter Vary wrote: > Hi Team, > > I have a PR (https://github.com/apache/iceberg/pull/4218) waiting for > review where with basically a 1 liner change we can improve the performance > of the GenericReader

Re: [DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

2022-02-24 Thread OpenInx
ention. >>>> >>>> On Mon, Feb 21, 2022 at 12:36 PM Jack Ye wrote: >>>> >>>>> I think option 2 is ideal, but I don't know if there is any hard >>>>> requirement from ASF/Maven Central side for us to keep backwards >>>>> compatib

[DISCUSS] Align the spark runtime artifact names among spark2.4, spark3.0, spark3.1 and spark3.2

2022-02-20 Thread OpenInx
Hi everyone, the Spark runtime artifacts currently have the following unaligned names: # Spark 2.4 iceberg-spark-runtime-0.13.1.jar # Spark 3.0 iceberg-spark3-runtime-0.13.1.jar # Spark 3.1 iceberg-spark-runtime-3.1_2.12-0.13.1.jar # Spark 3.2 iceberg-spark-runtime-3.2_2.12-0.13.1.jar
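The two newest names already share one pattern, `iceberg-spark-runtime-<spark>_<scala>-<iceberg>.jar`. A small sketch, assuming that pattern were applied uniformly. The helper name is hypothetical, and this shows only one possible alignment, not necessarily the option the thread settled on.

```python
def runtime_artifact(spark_version: str, scala_version: str, iceberg_version: str) -> str:
    """Build a runtime jar name using the Spark 3.1/3.2 naming pattern.

    Hypothetical helper: applying this to Spark 3.0 (or 2.4) is the proposed
    alignment, not a name that was published at the time.
    """
    return f"iceberg-spark-runtime-{spark_version}_{scala_version}-{iceberg_version}.jar"


# Spark 3.x builds use Scala 2.12, matching the 3.1/3.2 names listed above.
for spark in ("3.0", "3.1", "3.2"):
    print(runtime_artifact(spark, "2.12", "0.13.1"))
```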

Re: [DISCUSS] Iceberg roadmap

2022-02-17 Thread OpenInx
ntirely sure what that collaboration would look like just yet >>>> though. For most processing engines, it is people joining the Apache >>>> Iceberg community. No matter what the license of the downstream project, we >>>> always welcome more people contributing

Re: New Versioned Iceberg Documentation Site

2022-02-06 Thread OpenInx
The new site looks great to me, thanks all for the work! One unrelated thing: I remember we had a discussion about adding a page to the doc site that collects all the design docs (such as Google Docs, GitHub issues, etc.); is there any progress on this? Someone who contacted me has

Re: Vendor integration strategy

2021-12-13 Thread OpenInx
accessing Aliyun oss services. Thanks. On Tue, Dec 14, 2021 at 9:13 AM Jack Ye wrote: > Thank you Openinx for preparing all these PRs and the vote options! > > In the community sync, we also talked about not including any new vendor > integration modules in engine runtimes. In this appr

Re: Vendor integration strategy

2021-12-12 Thread OpenInx
://github.com/apache/iceberg/pull/3725 The usage example is here: https://github.com/apache/iceberg/pull/3725#issue-800973927 We can vote for option#1 or option#2. Any feedback is welcome, thanks in advance. On Thu, Dec 9, 2021 at 8:29 PM OpenInx wrote: > Thanks Jack for bringing this up, and tha

Re: Vendor integration strategy

2021-12-09 Thread OpenInx
>> future, and EMR will maintain their AWS SDK version upgrade independently. >> >> But the approach proposed by Aliyun seems to fit the use case of Aliyun >> users better. For more context, please read >> https://github.com/apache/iceberg/pull/3270 for the discussion betw

Re: Welcome new PMC members!

2021-11-17 Thread OpenInx
Congrats, Jack and Russell ! Well deserved ! On Thu, Nov 18, 2021 at 9:08 AM karuppayya wrote: > Congratulations Russell and Jack!! > > - Karuppayya > > On Wed, Nov 17, 2021 at 5:02 PM Yufei Gu wrote: > >> Congratulations, Jack and Russell! >> >> Best, >> >> Yufei >> >> `This is not a

Re: Upcoming Iceberg Community Sync (11/17 9:00am PT)

2021-11-16 Thread OpenInx
related PR to enhance the unit tests is: https://github.com/apache/iceberg/pull/3477 (Need someone to review & merge this). On Wed, Nov 17, 2021 at 10:03 AM OpenInx wrote: > Let me give more inputs from my perspective. > > 1. Fixed few critical flink v2 reader bugs: > a.

Re: Upcoming Iceberg Community Sync (11/17 9:00am PT)

2021-11-16 Thread OpenInx
Let me give more input from my perspective. 1. Fixed a few critical Flink v2 reader bugs: a. The Flink Avro reader bug: https://github.com/apache/iceberg/pull/3540 b. v2's extra metadata columns messed up the positions in Flink's RowData: *

Re: [DISCUSS] Iceberg roadmap

2021-11-07 Thread OpenInx
Any thoughts on adding StarRocks integration to the roadmap? I think the folks from the StarRocks community can provide more background and input. On Thu, Nov 4, 2021 at 5:59 PM OpenInx wrote: > Update: > > StarRocks[1] is a next-gen sub-second MPP database for full analysis

Re: [VOTE] Release Apache Iceberg 0.12.1 RC0

2021-11-05 Thread OpenInx
validate license headers: dev/check-license: OK 7. Build and test the project: ./gradlew build (use Java 8): OK 8. Check that Flink works with the following command: ./bin/sql-client.sh embedded -j /Users/openinx/Downloads/apache-iceberg-0.12.1/flink-runtime/build/libs/iceberg-flink

Re: [DISCUSS] Iceberg roadmap

2021-11-04 Thread OpenInx
ade project and marked the FLIP-27 project priority 1. > Thanks for all the work to get this done! > > On Sun, Oct 31, 2021 at 8:10 PM OpenInx wrote: > >> Update: >> >> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be >> closed now, because all of t

Re: [DISCUSS] Iceberg roadmap

2021-10-31 Thread OpenInx
Update: I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be closed now, because all of the issues have been addressed. [1]. https://github.com/apache/iceberg/projects/12 On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner wrote: > I created a Roadmap section in

Re: Iceberg 0.12.1 Patch Release - Call for Bug Fixes and Patches

2021-10-27 Thread OpenInx
> > What does everyone else think? Should we wait for this Hive fix? > > On Wed, Oct 27, 2021 at 3:17 AM OpenInx wrote: > >> I think we will need to fix this critical iceberg bug before we release >> the 0.12.1: https://github.com/apache/iceberg/issues/3393 . Let's mark

Re: Iceberg 0.12.1 Patch Release - Call for Bug Fixes and Patches

2021-10-27 Thread OpenInx
I think we will need to fix this critical iceberg bug before we release the 0.12.1: https://github.com/apache/iceberg/issues/3393 . Let's mark it as a blocker for the 0.12.1. On Fri, Oct 22, 2021 at 3:22 AM Kyle Bendickson wrote: > Thank you everybody for the additional PRs brought up so far. >

Re: Meeting Minutes from 10/20 Iceberg Sync

2021-10-22 Thread OpenInx
Thanks for the detailed report! One more thing: we have now made a lot of progress in integrating Alibaba Cloud (https://www.aliyun.com/); please see https://github.com/apache/iceberg/projects/21 (thanks @xingbowu - https://github.com/xingbowu). On Thu, Oct 21, 2021 at 11:30 PM Sam Redai

Re: Snapshot tagging, branching and retention

2021-10-13 Thread OpenInx
Is it possible to keep meeting notes for this and publish them to the mailing list? I don't think everybody can attend this meeting. Thanks. On Thu, Oct 14, 2021 at 2:00 AM Jack Ye wrote: > Hi everyone, > > Based on some offline discussions with different people around >

Re: Iceberg sync times

2021-10-09 Thread OpenInx
Thanks Ryan for bringing this up! I attended several Iceberg syncs at 5 PM Pacific time (9 AM CST) and only one at 9 AM Pacific time (1 AM CST), and have the following impressions: 1. We usually arrive at the office around 9:30 AM to 10 AM CST (5:30 PM ~ 6:00 PM Pacific time).

Re: [DISCUSS] Spark version support strategy

2021-10-07 Thread OpenInx
desire to do it >> please reach out and coordinate with us! >> >> Ryan >> >> On Wed, Sep 29, 2021 at 9:12 PM Steven Wu wrote: >> >>> Wing, sorry, my earlier message probably misled you. I was speaking my >>> personal opinion on Flink version sup

Re: [DISCUSS] Spark version support strategy

2021-09-28 Thread OpenInx
t;>>>>>>> picking up new Iceberg features. >>>>>>>> >>>>>>>> Another way of thinking about this is that if we went with option >>>>>>>> 1, then we could port bug fixes into 0.12.x. But there ar

Re: can not use iceberg as a sql source in flink sql according to iceberg 0.12.0

2021-09-22 Thread OpenInx
Hi Joshua, can you check which Parquet version you are using? It looks like line 112 in HadoopReadOptions is not the first line accessing the variables in ParquetInputFormat. On Wed, Sep 22, 2021 at 11:07 PM Joshua Fan wrote: > Hi > I am glad to use iceberg as table

Re: [DISCUSS] Iceberg roadmap

2021-09-18 Thread OpenInx
Thanks Steven & Kyle. Yes, the FLIP-27 source and Flink 1.13.2 are orthogonal, because Flink's FLIP-27 API was already introduced in the Flink 1.12 release ( https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface). The WIP FLIP-27 Iceberg source proposed from

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread OpenInx
Thanks for bringing this up, Anton. Everyone has great pros/cons to support their preferences. Before giving my preference, let me raise one question: what is the top priority for the Apache Iceberg project at this point in time? This question will help us to answer the following

Re: Iceberg community sync notes for 1 September 2021

2021-09-08 Thread OpenInx
, 2021 at 9:36 AM OpenInx wrote: > Thanks for the summary, Ryan ! > > I would like to add the following items to the roadmap for 0.13.0: > > *Flink Integration* > > 1. Upgrade the flink version from 1.12.1 to 1.13.2 ( > https://github.com/apache/iceberg/pull/2629).

Re: Iceberg community sync notes for 1 September 2021

2021-09-08 Thread OpenInx
Thanks for the summary, Ryan! I would like to add the following items to the roadmap for 0.13.0: *Flink Integration* 1. Upgrade the Flink version from 1.12.1 to 1.13.2 ( https://github.com/apache/iceberg/pull/2629), because there is a bug in Flink 1.12.1 when reading nested data types

Re: [VOTE] Adopt the v2 spec changes

2021-07-28 Thread OpenInx
> adopt the pending v2 spec changes as the supported v2 spec I assume this vote aims to reach consensus among community members that we won't introduce any breaking changes to the v2 spec, rather than to discuss exposing v2 to SQL tables like the following, right? CREATE TABLE prod.db.sample (

Re: Welcoming Jack Ye as a new committer!

2021-07-06 Thread OpenInx
Congrats, Jack! On Wed, Jul 7, 2021 at 7:40 AM Miao Wang wrote: > Congratulations! > > Miao > > Sent from my iPhone > > On Jul 5, 2021, at 4:14 PM, Daniel Weeks wrote: > > > Great work Jack, Congratulations! > > On Mon, Jul 5, 2021 at 1:21 PM karuppayya > wrote: > >> Congratulations Jack!

Re: Welcoming OpenInx as a new PMC member!

2021-06-29 Thread OpenInx
gcn.bj wrote: > >> Congrats! >> >> Original Message >> *From:* Dongjoon Hyun >> *To:* dev >> *Sent:* Wed, Jun 30, 2021 10:05 >> *Subject:* Re: Welcoming OpenInx as a new PMC member! >> >> Congratulations! >> >> Dongjoon. >> >>

Re: Stableness of V2 Spec/API

2021-05-17 Thread OpenInx
wrote: > Thanks. Compaction is https://github.com/apache/iceberg/pull/2303 and it > is currently blocked by https://github.com/apache/iceberg/issues/2308? > > On Mon, May 17, 2021 at 6:17 PM OpenInx wrote: > >> Hi Huadong >> >> From the perspective of iceberg develope

Re: Stableness of V2 Spec/API

2021-05-17 Thread OpenInx
Hi Huadong, from the perspective of Iceberg developers, we don't expose format v2 to end users yet because we think there is still other work that needs to be done. As you can see, there are still some unfinished issues from your link. As for whether v2 will cause data loss, from my perspective as

Re: how to test row level delete

2021-04-06 Thread OpenInx
w referenced in your > previous email. > > TableProperties.FORMAT_VERSION > > Can you suggest? I want to create a V2 table to test some row level > upserts/deletes. > > Chen > > On Sun, Dec 27, 2020 at 9:33 PM OpenInx wrote: > >> > you can apply this p

Re: When is the next release of Iceberg ?

2021-04-02 Thread OpenInx
Hi Himanshu, if you want to try Flink + Iceberg for syncing MySQL binlog into an Iceberg table, you might be interested in these PRs: 1. https://github.com/apache/iceberg/pull/2410 2. https://github.com/apache/iceberg/pull/2303 On Wed, Mar 24, 2021 at 10:34 AM OpenInx wrote: > Hi Himan

Re: Welcoming Ryan Murray as a new committer!

2021-03-29 Thread OpenInx
Congrats, Ryan ! Well-deserved ! On Tue, Mar 30, 2021 at 9:32 AM Junjie Chen wrote: > Congratulations. Ryan! > > On Tue, Mar 30, 2021 at 5:02 AM Daniel Weeks > wrote: > >> Congrats, Ryan and thanks for all the great work! >> >> On Mon, Mar 29, 2021 at 1:59 PM Ryan Blue >> wrote: >> >>>

Re: Welcoming Russell Spitzer as a new committer

2021-03-29 Thread OpenInx
Congrats, Russell ! Well-deserved ! On Tue, Mar 30, 2021 at 9:33 AM Junjie Chen wrote: > Congratulations, Russell! Nice work! > > On Tue, Mar 30, 2021 at 5:02 AM Daniel Weeks > wrote: > >> Congrats, Russell! >> >> On Mon, Mar 29, 2021 at 1:59 PM Ryan Blue >> wrote: >> >>> Congratulations,

Re: Welcoming Yan Yan as a new committer!

2021-03-23 Thread OpenInx
Congrats Yan ! You deserve it. On Wed, Mar 24, 2021 at 7:18 AM Miao Wang wrote: > Congrats @Yan Yan ! > > > > Miao > > > > *From: *Ryan Blue > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Tuesday, March 23, 2021 at 3:43 PM > *To: *Iceberg Dev List > *Subject: *Welcoming Yan Yan as a new

Re: When is the next release of Iceberg ?

2021-03-23 Thread OpenInx
Hi Himanshu, thanks for the email. Currently Flink + Iceberg supports writing CDC events into an Apache Iceberg table through the Flink DataStream API, and Spark/Presto/Hive can read those events in batch jobs. But there are still some issues we have not finished yet: 1. Expose the iceberg v2 to

Sync: the progress of row-level delete

2021-03-14 Thread OpenInx
Hi Iceberg dev: Junjie Chen and I have made some progress on the Rewrite Action for format v2. We will have two kinds of Rewrite Action: 1. The first one rewrites equality delete rows into position delete rows. The PoC PR is here:
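Conceptually, that first rewrite turns each equality delete (a key value) into the concrete positions of the matching rows in data files. A simplified sketch, assuming in-memory rows and a single key column, and ignoring the sequence-number rules that decide which data files a delete applies to; `equality_to_position_deletes` is a hypothetical name, not the Rewrite Action's API.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def equality_to_position_deletes(
    data_files: Dict[str, List[Row]],
    equality_field: str,
    deleted_keys: Iterable[object],
) -> List[Tuple[str, int]]:
    """Translate equality deletes into (file_path, row_position) pairs.

    Simplified: ignores sequence numbers, so every delete is applied to
    every data file passed in.
    """
    keys = set(deleted_keys)
    position_deletes = []
    for path in sorted(data_files):  # deterministic output order
        for pos, row in enumerate(data_files[path]):
            if row[equality_field] in keys:
                position_deletes.append((path, pos))
    return position_deletes


files = {
    "a.parquet": [{"id": 1}, {"id": 2}, {"id": 3}],
    "b.parquet": [{"id": 2}, {"id": 4}],
}
print(equality_to_position_deletes(files, "id", [2, 4]))
```

Readers then only need cheap position lookups instead of key comparisons, which is the motivation for doing this rewrite offline.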

Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-03 Thread OpenInx
It will be 1:00 AM (China Standard Time) on 18 March, which works for those of us in Asia. I'd love to attend this discussion, thanks. On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue wrote: > Thanks for putting this together, Guy! I just did a pass over the doc and > it looks like a really reasonable

Re: Sync to discuss secondary index proposal

2021-01-28 Thread OpenInx
Sorry, I sent the wrong link; the secondary index document link is: https://docs.google.com/document/d/1E1ofBQoKRnX04bWT3utgyHQGaHZoelgXosk_UNsTUuQ/edit On Fri, Jan 29, 2021 at 10:31 AM OpenInx wrote: > Hi > > @Miao Wang Would you mind sharing your current PoC >

Re: Sync to discuss secondary index proposal

2021-01-28 Thread OpenInx
=601316b0# On Fri, Jan 29, 2021 at 10:16 AM 李响 wrote: > +1, my colleagues and I is at UTC+8 > > On Fri, Jan 29, 2021 at 9:50 AM OpenInx wrote: > >> +1, my time zone is CST. >> >> On Fri, Jan 29, 2021 at 6:57 AM Xinli shang >> wrote: >> >>> I had so

Re: [VOTE] Release Apache Iceberg 0.11.0 RC0

2021-01-25 Thread OpenInx
Hi dev, I'd like to include this patch in release 0.11.0 because it documents the new Flink features. I'm sorry that I did not update the Flink documentation in time when the feature code was merged, but I think it's worth merging this documentation PR for the Iceberg 0.11.0 release, as that helps a

Re: Welcoming Peter Vary as a new committer!

2021-01-25 Thread OpenInx
Congratulations and welcome Peter ! On Tue, Jan 26, 2021 at 9:41 AM Junjie Chen wrote: > Congratulations! > > On Tue, Jan 26, 2021 at 8:26 AM Jun H. wrote: > >> Congratulations >> >> On Mon, Jan 25, 2021 at 4:18 PM Yan Yan wrote: >> > >> > Congratulations! >> > >> > On Mon, Jan 25, 2021 at

Re: test flakiness with SocketException of broken pipe in HiveMetaStoreClient

2021-01-08 Thread OpenInx
any extra usage of the table loader that you forgot to close in your FLIP-27 dev branch? [1]. https://github.com/apache/iceberg/blob/7645ceba65044184be192a7194a38729133b2e50/flink/src/main/java/org/apache/iceberg/flink/source/FlinkInputFormat.java#L77 On Fri, Jan 8, 2021 at 3:36 PM OpenInx wrote

Re: test flakiness with SocketException of broken pipe in HiveMetaStoreClient

2021-01-07 Thread OpenInx
apache/iceberg/pull/2051/files On Fri, Jan 8, 2021 at 5:48 AM Steven Wu wrote: > Ryan/OpenInx, thanks a lot for the pointers. > > I was able to almost 100% reproduce the HiveMetaStoreClient aborted > connection problem locally with Flink tests after adding > another DeleteReadTest

Re: test flakiness with SocketException of broken pipe in HiveMetaStoreClient

2021-01-06 Thread OpenInx
I encountered a similar issue when supporting hive-site.xml for the Flink Hive catalog. Here are the earlier discussion and solution: https://github.com/apache/iceberg/pull/1586#discussion_r509453461 It's a connection leak issue. On Thu, Jan 7, 2021 at 10:06 AM Ryan Blue wrote: > I've noticed this

Re: how to generate a new .v1.metadata.json.crc for v1.metadata.json

2020-12-27 Thread OpenInx
You edited v1.metadata.json to support Iceberg format v2? That's not the correct way to use Iceberg format v2. Let's discuss this issue in the latest email. On Sat, Dec 26, 2020 at 7:01 PM 1 wrote: > Hi, all: > >I vim the v1.metadata.json, so old .v1.metadata.json.crc is not >

Re: how to test row level delete

2020-12-27 Thread OpenInx
> you can apply this patch in your own repository The patch is : https://github.com/apache/iceberg/pull/1978 On Mon, Dec 28, 2020 at 10:32 AM OpenInx wrote: > Hi liubo07199 > > Thanks for testing the iceberg row-level delete, I skimmed the code, it > seems you were trying the

Re: how to test row level delete

2020-12-27 Thread OpenInx
Hi liubo07199, thanks for testing the Iceberg row-level delete. I skimmed the code; it seems you were trying the equality-delete feature. Iceberg users shouldn't have to write this internal Iceberg code to get it working; that isn't user-friendly. Instead, we usually use the

Re: What's the time to expose iceberg format v2 to end users ?

2020-12-18 Thread OpenInx
Thanks Yan for the document, I will take a look at it, and see what I can do. On Fri, Dec 18, 2020 at 3:38 AM Yan Yan wrote: > Hi OpenInx, > > Thanks for bringing this up. I am currently working on Format v2 blocking > tasks, and am maintaining a full list of bl

What's the time to expose iceberg format v2 to end users ?

2020-12-16 Thread OpenInx
Hi, I wrote this email to align with the community on when to expose format v2 to end users. In Iceberg format v2, we've accomplished row-level deletes. They are designed for two use cases: 1. Execute a single query to update or delete lots of rows. It's a typical batch update/delete

Re: [VOTE] Release Apache Iceberg 0.10.0 RC4

2020-11-03 Thread OpenInx
+1 for 0.10.0 RC4 1. Download the source tarball, signature (.asc), and checksum (.sha512): OK 2. Import gpg keys: download KEYS and run gpg --import /path/to/downloaded/KEYS (optional if this hasn’t changed) : OK 3. Verify the signature by running: gpg --verify apache-iceberg-xx.tar.gz.asc:

Re: [VOTE] Release Apache Iceberg 0.10.0 RC2

2020-11-03 Thread OpenInx
. On Wed, Nov 4, 2020 at 1:31 AM Ryan Blue wrote: > OpenInx, is that a general question or is it related to the release? It > doesn't look related, but I want to make sure. > > On Tue, Nov 3, 2020 at 5:41 AM OpenInx wrote: > >> Hi >> >> I will suggest taking a l

Re: [VOTE] Release Apache Iceberg 0.10.0 RC2

2020-11-03 Thread OpenInx
On Mon, Nov 2, 2020 at 2:28 PM Mass Dosage >>>> wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> I ran the RC against a set of integration tests I have for a subset of >>>>> the Hive2 read functionality on

Re: Plans for the future iceberg 0.11.0 release

2020-11-01 Thread OpenInx
Thanks for the context about FLIP-27, Steven! I will take a look at the patches under issue 1626. On Sat, Oct 31, 2020 at 2:03 AM Steven Wu wrote: > OpenInx, thanks a lot for kicking off the discussion. Looks like my > previous reply didn't reach the mailing list. > > > fli

Plans for the future iceberg 0.11.0 release

2020-10-28 Thread OpenInx
Hi dev, as we know, we will be cutting the Iceberg 0.10.0 release candidate this week. I think it's time to plan for the future Iceberg 0.11.0 now, so I created a Java 0.11.0 Release milestone here [1]. I put the following issues into the newly created milestone: 1. Apache Flink

Re: Several flink pull requests need to get merged before the next release 0.10.0

2020-10-27 Thread OpenInx
> On Mon, Oct 19, 2020 at 7:15 PM OpenInx wrote: > >> Hi >> >> As we know, our next release 0.10.0 is coming, and in my mind there are >> several issues which should be merged as soon as possible: >> >> 1. https://github.com/apache/iceberg/pull/1477

Several flink pull requests need to get merged before the next release 0.10.0

2020-10-19 Thread OpenInx
Hi, as we know, our next release 0.10.0 is coming, and in my mind there are several issues which should be merged as soon as possible: 1. https://github.com/apache/iceberg/pull/1477 It changes the Flink state design to write the complete data files into a manifest before the checkpoint finishes,

Re: Incremental reads for Upsert!

2020-10-19 Thread OpenInx
t now, > all of the readers produce records from the current tables state. I think > @OpenInx and @Jingsong Li have > some plans to expose such a reader for Flink, though. Maybe they can work > with you to on some milestones and a roadmap. > > rb > > On Fri, Oct 16, 20

Re: Iceberg V2 Spec

2020-09-20 Thread OpenInx
a few update emails related, but it only covers one part. >> >> Chen >> >> On Thu, Jul 2, 2020 at 9:53 PM OpenInx wrote: >> >>> Sounds good to me. >>> >>> Thanks. >>> >>> On Fri, Jul 3, 2020 at 12:58 AM Ryan Blue wrote:

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread OpenInx
I agree that it's helpful to allow users to read the incremental delta based on a timestamp; as Jingsong said, a timestamp is more user-friendly. My question is how to implement this. If we just attach the client's timestamp to the Iceberg table when committing, then different clients may have different
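The clock concern can be shown with a naive timestamp-range selection. In this sketch the snapshot history and the skewed clock are made up for illustration, and `snapshots_between` is a hypothetical helper, not an Iceberg API.

```python
from typing import List, Tuple

# (snapshot_id, commit_timestamp_ms), listed in the order the commits succeeded.
Snapshot = Tuple[int, int]


def snapshots_between(history: List[Snapshot], start_ms: int, end_ms: int) -> List[int]:
    """Naively select snapshots whose client-supplied timestamp is in (start_ms, end_ms]."""
    return [sid for sid, ts in history if start_ms < ts <= end_ms]


# Hypothetical history: the client that committed snapshot 3 has a clock running
# a few seconds behind, so the last commit carries the earliest timestamp.
history = [(1, 1_000_000), (2, 1_004_000), (3, 999_500)]

print(snapshots_between(history, 1_000_000, 2_000_000))
```

A reader that already consumed everything up to timestamp 1_000_000 and then asks for the range (1_000_000, 2_000_000] gets only snapshot 2, so snapshot 3 is silently skipped even though it was committed last.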

Re: [DISCUSS] August board report

2020-08-13 Thread OpenInx
ecordings > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > > On Wed, Aug 12, 2020 at 7:07 PM OpenInx wrote: > >> > Community members gave 2 Iceberg talks at Subsurface Conf, on enabling >> Hive >> queries against Iceberg tables and working with petabyte-sca

Re: [DISCUSS] August board report

2020-08-12 Thread OpenInx
> Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive queries against Iceberg tables and working with petabyte-scale Iceberg tables. Iceberg was also mentioned in the keynotes. Are there slides or videos about the two iceberg talks ? I'd like to read/watch slides or videos

Re: [DISCUSS] 0.9.1 release

2020-08-03 Thread OpenInx
> Does anyone know if we can recover existing data affected by it? In PR #1271, there are two data types with correctness bugs: decimal18 and timestampZone. For decimal18, we actually write the correct decimal value but read it incorrectly. Say the decimal(10,3) and value
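For reference, Parquet can store a decimal with precision up to 18 as a 64-bit integer holding the unscaled value, and the logical value is that integer scaled by the column's scale. A minimal sketch of correct decoding, assuming "decimal18" refers to this long-backed representation; it does not reproduce the specific bug fixed in PR #1271.

```python
from decimal import Decimal


def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Recover a decimal value from its unscaled integer and the column scale."""
    # scaleb(-scale) shifts the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


# A decimal(10,3) column storing the unscaled long 12345 holds 12.345.
print(decode_decimal(12345, 3))
```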

Re: Iceberg community sync notes - 29 July 2020

2020-08-02 Thread OpenInx
ed the flink DataStream iceberg sink, we will create PRs to make the flink table sql work. 3. Flink streaming reader / batch reader etc. > Kyle: I’ll be interested to review. Thanks for taking the time to review those PRs. > It seems like points raised by @openinx in the CDC pipelines doc must

Re: New committer: Shardul Mahadik

2020-07-22 Thread OpenInx
Congratulations ! On Thu, Jul 23, 2020 at 9:31 AM Jingsong Li wrote: > Congratulations Shardul! Well deserved! > > Best, > Jingsong > > On Thu, Jul 23, 2020 at 7:27 AM Anton Okolnychyi > wrote: > >> Congrats and welcome! Keep up the good work! >> >> - Anton >> >> On 22 Jul 2020, at 16:02, RD

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-09 Thread OpenInx
I followed the verify guide here ( https://lists.apache.org/thread.html/rd5e6b1656ac80252a9a7d473b36b6227da91d07d86d4ba4bee10df66%40%3Cdev.iceberg.apache.org%3E) : 1. Verify the signature: OK 2. Verify the checksum: OK 3. Untar the archive tarball: OK 4. Run RAT checks to validate license

Re: Iceberg V2 Spec

2020-07-02 Thread OpenInx
d do for Spark 3 support. > > Does that sound reasonable? > > On Wed, Jul 1, 2020 at 7:39 PM OpenInx wrote: > >> Hi Ryan: >> >> Just curious when do we plan to release 0.9.0 ? I expect that the flink >> connector could be included in release 0.9.0. >>

Re: Iceberg V2 Spec

2020-07-01 Thread OpenInx
Hi Ryan: Just curious, when do we plan to release 0.9.0? I expect that the Flink connector could be included in release 0.9.0. Thanks. On Thu, Jul 2, 2020 at 12:14 AM Ryan Blue wrote: > Hi Chen, > > Right now, the main parts of the v2 spec are the addition of sequence > numbers and delete

[Doc] Streaming CDC in Iceberg

2020-06-28 Thread OpenInx
Hi dev: We have a discussion about the equality-deletes here [1]. It seems more complex when considering the CDC events streaming to the iceberg table, so I prepared a document for further discussion here [2]. Any suggestions and feedback are welcome, thanks. [1].

Re: [DISCUSS] Changes for row-level deletes

2020-05-05 Thread OpenInx
1]. https://github.com/generic-datalake/iceberg [2]. https://github.com/generic-datalake/iceberg/tree/master/flink/src On Wed, May 6, 2020 at 11:44 AM OpenInx wrote: > The two-phase approach sounds good to me. The precondition is that we have > a limited number of delete files so that memory can hold

Re: [DISCUSS] Changes for row-level deletes

2020-05-05 Thread OpenInx
The two-phase approach sounds good to me. The precondition is that we have a limited number of delete files so that memory can hold all of them; since we will have a compaction service to reduce the number of delete files, that does not seem to be a problem.
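One reading of that precondition, sketched below: first load every deleted key into memory, then stream the data rows and drop the deleted ones. This assumes equality deletes on a single key column and ignores sequence numbers; `apply_equality_deletes` is a hypothetical name, not an Iceberg API.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def apply_equality_deletes(
    delete_files: Iterable[List[object]],
    data_rows: Iterable[Row],
    key: str,
) -> Iterator[Row]:
    # Phase 1: materialize every deleted key in memory. This only works while
    # the total number of delete rows stays small, hence the need for compaction.
    deleted = set()
    for delete_file in delete_files:
        deleted.update(delete_file)
    # Phase 2: stream the data rows, dropping those whose key was deleted.
    for row in data_rows:
        if row[key] not in deleted:
            yield row


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
print(list(apply_equality_deletes([[1], [3]], rows, "id")))
```

Memory grows with the number of delete keys, not with the size of the data, which is why keeping delete files compacted keeps the approach viable.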

Re: Iceberg community sync notes - 15 April 2020

2020-04-16 Thread OpenInx
Thanks for the write-up. Views from the Netflix branch are a great feature; is there any plan to port them to Apache Iceberg? On Fri, Apr 17, 2020 at 5:31 AM Ryan Blue wrote: > Here are my notes from yesterday’s sync. As usual, feel free to add to > this if I missed something. > > There were a

Re: Open a new branch for row-delete feature ?

2020-04-02 Thread OpenInx
Maybe we will choose to add a type column >like this, but I’d like to have a design in mind before we merge these PRs. >Thinking through this and coming up with a proposal here is the next >priority for this work, because it will unlock more tasks we can do in >parallel

Re: Open a new branch for row-delete feature ?

2020-03-31 Thread OpenInx
> > > *From: *Ryan Blue > *Reply-To: *"dev@iceberg.apache.org" , " > rb...@netflix.com" > *Date: *Tuesday, March 31, 2020 at 10:08 AM > *To: *OpenInx > *Cc: *Iceberg Dev List > *Subject: *Re: Open a new branch for row-delete feature ? > > > > I'

Re: Open a new branch for row-delete feature ?

2020-03-30 Thread OpenInx
ters for >> diff formats -- can be done in master. >> >> rb >> >> On Mon, Mar 30, 2020 at 9:00 AM Gautam wrote: >> >>> Thanks for bringing this up OpenInx. That's a great idea: to open a >>> separate branch for row-level deletes. >>> >

Re: Iceberg community sync - 2020-03-25

2020-03-28 Thread OpenInx
he has the bandwidth. There are some Flink committers and PMC members on our Flink team; we could also ping them. > Openinx brought up concerns about minimizing end-to-end latency Agreed that we could implement file/pos deletes and equality deletes first. The off-line optimization seems reasonable, we

Open a new branch for row-delete feature ?

2020-03-27 Thread OpenInx
Dear Dev: On Tuesday we had a sync meeting and discussed the following: 1. cut the 0.8.0 release; 2. flink connector; 3. iceberg row-level delete; 4. Map-Reduce Formats and Hive support. We'll release version 0.8.0 around April 15, the

Re: What have I learned from doing Merge-On-Read PoC

2020-03-23 Thread OpenInx
...chenjunjiedada:row-level-delete#diff-c168df8c9739650eab655b22b0b549acR407 [4]. https://github.com/apache/incubator-iceberg/compare/master...chenjunjiedada:row-level-delete#diff-fffa37e29d3736de086cbd23094865b7R63 On Sun, Mar 22, 2020 at 8:49 PM Junjie Chen wrote: > Great job and nice document
