Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-19 Thread Owen O'Malley
ager responsibility, since each > sub-project is released independently. It was not intended to be tied to > committer responsibilities. > > Best, > Jack Ye > > [1] https://community.apache.org/newcommitter.html > [2] https://www.apache.org/foundation/voting > > On Fri, Jul 19, 2024 a

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-19 Thread Owen O'Malley
Everyone is welcome to vote. The Iceberg PMC will have the only binding votes. .. Owen On Jul 19, 2024, at 10:19, Wing Yew Poon wrote: Hi Owen, Thanks for doing this. Once you have the questions and choices, who gets to vote on them? - Wing Yew On Fri, Jul 19, 2024 at 10:07 AM Owen O'Malley <o

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-19 Thread Owen O'Malley
formalize those boundaries adds unnecessary organizational complexity. .. Owen On Fri, Jul 19, 2024 at 10:06 AM Owen O'Malley wrote: > All, > Sorry for the long pause on bylaws discussion. It was a result of > wanting to avoid the long US holiday week (July 4th) and my > procras

[DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-19 Thread Owen O'Malley
All, Sorry for the long pause on bylaws discussion. It was a result of wanting to avoid the long US holiday week (July 4th) and my procrastination, which was furthered by a side conversation that asked me to consider how to move forward in an Apache way. I'd like to thank Jack for moving this

Re: [Discussion] Apache Iceberg Community Guideline - Initial Version

2024-07-01 Thread Owen O'Malley
Sorry for coming into this conversation late, but I have a lot of experience with writing the bylaws for Apache projects (Hadoop & ORC). As a neutral third party (not working for Databricks or a cloud provider) who has a lot of Apache experience, I'd like to offer my service as a moderator for the

Re: [DISCUSS] June board report

2024-06-15 Thread Owen O'Malley
Ryan, It looks good. Thanks for including the notice about Tabular/Databricks. .. Owen On Wed, Jun 12, 2024 at 9:52 PM Ryan Blue wrote: > Hi everyone, > > Here's my current draft board report for June. If you have anything to add > or update, please reply and I'll amend the report. > > Thank

Re: Call for Ryan Blue to Step Down as PMC Chair

2024-06-05 Thread Owen O'Malley
I strongly disagree with asking Ryan to step down. For those who don't know me, I'm an Iceberg PMC member, Apache member, and was a mentor and champion for Iceberg when it entered the Apache Incubator. I've never worked at eith

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-22 Thread Owen O'Malley
It is also important to consider who is on the program committee and their affiliations. It also helps if the PC discourages sales talks (especially with proprietary extensions!) They should encourage technical ones about development and usage of the Apache project. .. Owen On Sep 22, 2023, at 11:19,

ApacheCon Iceberg BOF

2022-10-05 Thread Owen O'Malley
All, There is an Iceberg Birds of a Feather meet up at ApacheCon in an hour (5:50pm CDT). Please come by and join us, if you are attending. Thanks, Owen

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-04 Thread Owen O'Malley
At the stripe boundaries, the bytes on disk statistics are accurate. A stripe that is in flight is going to be an estimate, because the dictionaries can't be compressed until the stripe is flushed. The memory usage will be a significant overestimate, because it includes buffers that are allocated

Re: Hive table compatibility for Iceberg readers

2022-01-31 Thread Owen O'Malley
On Thu, Jan 27, 2022 at 10:26 PM Walaa Eldin Moustafa wrote: > *2. Iceberg schema lower casing:* Before Iceberg, when users read Hive > tables from Spark, the returned schema is lowercase since Hive stores all > metadata in lowercase mode. If users move to Iceberg, such readers could > break once

Re: [CWS] Re: Subject: [VOTE] Release Apache Iceberg 0.12.0 RC3

2021-08-16 Thread Owen O'Malley
Ok, after the vote, but I did: * verified tag is same as the tarball * verified checksums and signatures * built and ran the tests My one complaint is that I get test failures that look like they are timezone related. ORC and Parquet tests failing with timestamps 7 or 8 hours off. .. Owen On Su

Re: rowGroup:File = 1:1

2021-07-08 Thread Owen O'Malley
As Ryan & Dan said, the trade-offs are roughly: bigger Parquet row groups & ORC stripes: * better compression * fewer read operations * lower file metadata overhead * fewer files to manage smaller row groups/stripes: * better parallelism * lower memory usage Some of the worst performing tables t

Re: Default TimeZone for unit tests

2021-03-01 Thread Owen O'Malley
In ORC, the timezone tests vary the default timezone through multiple values using the Java APIs. (They do restore the initial value when the test exits.) :) .. Owen On Mon, Mar 1, 2021 at 9:25 PM Edgar Rodriguez wrote: > Hi folks, > > Thanks Peter for the quick fix! > > I do think it'd be a go

Type attributes

2021-01-04 Thread Owen O'Malley
One of the challenges that we have at LinkedIn is that we have a *lot* of Avro schemas. I'd like to be able to represent those Avro schemas using Iceberg's types and there are a few challenges: - unions - enums - default values One way out of those problems without extending the Iceberg

Re: Iceberg - Hive schema synchronization

2020-11-24 Thread Owen O'Malley
You left the complex types off of your list (struct, map, array, uniontype). All of them have natural mappings in Iceberg, except for uniontype. Interval is supported on output, but not as a column type. Unfortunately, we have some tables with uniontype, so we'll need a solution for how to deal wit

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
sql use cases have fewer pushdown predicates, having a translation on that side seems less error-prone. .. Owen On Fri, Sep 18, 2020 at 10:54 PM Ryan Blue wrote: > Are you saying that we can't fix this by rewriting expressions to > translate from SQL to more natural semantics?

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
add tests for these cases and rewrite expressions > to account for the difference. Iceberg should push notEqual("col", "x") > to ORC as SQL (col != 'x' or col is null). Presto can similarly translate col > != 'x' to and(notEqual("col"
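
The rewrite quoted above can be illustrated with a small sketch of SQL's three-valued logic. This is not Iceberg, ORC, or Presto code: it is plain Python, and every function name in it is hypothetical. It assumes, as the quoted rewrite suggests, that the notEqual under discussion matches rows where the column is null, whereas SQL's `col != 'x'` evaluates to UNKNOWN for a null column and so filters the row out.

```python
from typing import Optional

# SQL booleans are three-valued: True, False, or None (standing in for UNKNOWN).
SqlBool = Optional[bool]


def sql_not_equal(value: Optional[str], literal: str) -> SqlBool:
    """SQL `col != literal`: comparing NULL with anything yields UNKNOWN."""
    if value is None:
        return None
    return value != literal


def sql_is_null(value: Optional[str]) -> SqlBool:
    """SQL `col IS NULL`: always True or False, never UNKNOWN."""
    return value is None


def sql_or(left: SqlBool, right: SqlBool) -> SqlBool:
    """Three-valued OR: True wins, then UNKNOWN, then False."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False


def null_matching_not_equal(value: Optional[str], literal: str) -> bool:
    """Hypothetical stand-in for a notEqual that treats a null column as a match."""
    return value is None or value != literal


def rewritten_for_sql(value: Optional[str], literal: str) -> SqlBool:
    """The rewrite from the thread: (col != 'x' OR col IS NULL)."""
    return sql_or(sql_not_equal(value, literal), sql_is_null(value))


def keeps_row(result: SqlBool) -> bool:
    """A SQL filter keeps a row only when the predicate is exactly True."""
    return result is True
```

Under these assumptions, `keeps_row(rewritten_for_sql(v, "x"))` agrees with `null_matching_not_equal(v, "x")` for null and non-null values of `v`, while the unrewritten `sql_not_equal(None, "x")` drops the null row.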

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
I think that we should follow the SQL semantics to prevent surprises when SQL engines integrate with Iceberg. .. Owen On Thu, Sep 17, 2020 at 9:08 PM Shardul Mahadik wrote: > Hi all, > > I noticed that Iceberg's predicates are not compatible with SQL predicates > when it comes to handling NULL

Re: Iceberg sync notes - 9 September 2020

2020-09-14 Thread Owen O'Malley
As I mentioned in the meetup, ORC 1.6.4 was pending and has been released. It should be available on Maven Central tomorrow. .. Owen On Mon, Sep 14, 2020 at 10:38 PM Ryan Blue wrote: > Hi everyone, > > I just updated the Iceberg sync doc >

Re: [DISCUSS] August board report

2020-08-12 Thread Owen O'Malley
+1 looks good. On Wed, Aug 12, 2020 at 4:41 PM Ryan Blue wrote: > Hi everyone, > > Here's a draft of the board report for this month. Please reply with > anything that you'd like to see added or that I've missed. Thanks! > > rb > > ## Description: > Apache Iceberg is a table format for huge anal

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread Owen O'Malley
On Mon, Jul 13, 2020 at 4:28 PM Anton Okolnychyi wrote: > I think the issue that was brought up by Dongjoon is valid and we should > document the current caching behavior. > The problem is also more generic and does not apply only to views as > operations that are happening through the source dir

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread Owen O'Malley
+1 (binding) - Verified signatures - Verified checksum - Built src from tarball and ran tests. - Looked at JMH dependency to make sure it wasn't leaking into the published artifacts. .. Owen On Mon, Jul 13, 2020 at 11:00 AM RD wrote: > +1 > - verified signatures and checksum > -

Re: [VOTE] Graduate to a top-level project

2020-05-12 Thread Owen O'Malley
ed to serve as the initial members of the Apache Iceberg Project: > > * Anton Okolnychyi > * Carl Steinbach > * Daniel C. Weeks > * James R. Taylor > * Julien Le Dem > * Owen O'Malley > * Parth Brahmbhatt > * Ratandeep Ratti > * Ryan Blue

Re: [DISCUSS] Graduating from the Apache Incubator

2020-05-11 Thread Owen O'Malley
pe of responsibility of the Apache Iceberg > Project; and be it further > > RESOLVED, that the persons listed immediately below be and hereby are > appointed to serve as the initial members of the Apache Iceberg Project: > > * Anton Okolnychyi > * Carl Steinbach >

Re: [VOTE] Release Apache Iceberg 0.8.0-incubating RC2

2020-04-30 Thread Owen O'Malley
+1 1. Checked signature and checksum 2. Built and ran unit tests. 3. Checked ORC version :) On Monday, ORC released 1.6.3, so we should grab those fixes soon. .. Owen On Thu, Apr 30, 2020 at 12:34 PM Dongjoon Hyun wrote: > +1. > > 1. Verified checksum, sig, and license > 3. Build fro

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Fri, Sep 6, 2019 at 12:19 AM Justin Mclean wrote: > So why does the project think it's ready to graduate? Mentors do you think > the project is ready to graduate? > It has to make a release or two, but I agree with Ryan that it is approaching graduation. The project entered Apache with five Apac

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Wed, Sep 4, 2019 at 4:55 PM Ryan Blue wrote: > Hi everyone, > > Here's a draft of this month's report to the IPMC. Please reply with > comments if you'd like to add anything! > > rb > > ## Iceberg > > Iceberg is a table format for large, slow-moving tabular data. > > Iceberg has been incubatin

Re: [DISCUSS] Implementation strategies for supporting Iceberg tables in Hive

2019-08-07 Thread Owen O'Malley
> On Jul 24, 2019, at 22:52, Adrien Guillo > wrote: > > Hi Iceberg folks, > > In the last few months, we (the data infrastructure team at Airbnb) have been > closely following the project. We are currently evaluating potential > strategies to migrate our data warehouse to Iceberg. However,

Re: Sort Spec

2019-07-18 Thread Owen O'Malley
I would say yes >>>> 2) Should Iceberg allow users to define a sort spec only if the table >>>> is bucketed? >>>> - I would say no, as it seems valid to have partitioned and sorted >>>> tables. >>>> 3) How should Iceberg encode sort sp

Re: Sort Spec

2019-07-18 Thread Owen O'Malley
ncode non-trivial sort specs and track sort >> spec evolution (if needed). >> - Option #2 is to extend PartitionSpec to cover sorting as well. This >> option will allow us to use transformations to encode non-trivial sorts and >> won't require many changes to the codebase.

Re: Updates/Deletes/Upserts in Iceberg

2019-07-03 Thread Owen O'Malley
>>> How about 9AM PDT on Friday, 5 July then? >>> >>>> On Wed, Jul 3, 2019 at 10:55 AM Owen O'Malley >>>> wrote: >>>> I'd like to call in, but I'm out Thursday. Friday would work except 11am >>>> to 1pm pdt. >>>>

Re: Updates/Deletes/Upserts in Iceberg

2019-07-03 Thread Owen O'Malley
I'd like to call in, but I'm out Thursday. Friday would work except 11am to 1pm pdt. .. Owen On Wed, Jul 3, 2019 at 10:42 AM Ryan Blue wrote: > I'm available Thursday and Friday this week as well, but it's a holiday in > the US so some people may be out. If there are no objections from anyone >

Re: IPMC report draft for July 2019

2019-07-03 Thread Owen O'Malley
None yet > > Have your mentors been helpful and responsive or are things falling > through the cracks? In the latter case, please list any open issues > that need to be addressed. > > Yes. > > Signed-off-by: > > [X](iceberg) Ryan Blue > Comments: I w

Re: Sort Spec

2019-07-01 Thread Owen O'Malley
My thought is just like Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Thread Owen O'Malley
> On May 21, 2019, at 1:31 PM, Jacques Nadeau wrote: > > The main thing I'm talking about is how you target a deletion across time. If > you have a file A, and you want to delete record X in A, you define delete > A.X. At the same time, another process may be compacting A into A'. In so > do

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Thread Owen O'Malley
> On May 15, 2019, at 12:54 PM, Ryan Blue wrote: > > 2. Iceberg diff files should use synthetic keys > > A lot of the discussion on the doc is about whether natural keys are > practical or what assumptions we can make or trade about them. In my opinion, > Iceberg tables will absolutely need

Re: Approaching Vectorized Reading in Iceberg ..

2019-05-28 Thread Owen O'Malley
On Fri, May 24, 2019 at 8:28 PM Ryan Blue wrote: > if Iceberg Reader was to wrap Arrow or ColumnarBatch behind an > Iterator[InternalRow] interface, it would still not work right? Coz it > seems to me there is a lot more going on upstream in the operator execution > path that would be needed to b

Re: IPMC report for March 2019

2019-03-08 Thread Owen O'Malley
things falling > through the cracks? In the latter case, please list any open issues > that need to be addressed. > > Yes. > > Signed-off-by: > > [X](iceberg) Ryan Blue > Comments: I wrote the first pass of the report. > [ ](iceberg) Julien Le Dem > Comments: > [ ](iceberg) Owen O'Malley > Comments: Approval from +1 on dev list. > [ ](iceberg) James Taylor > Comments: > [ ](iceberg) Carl Steinbach > Comments: > > > -- > Ryan Blue > Software Engineer > Netflix

Re: [VOTE] Community code reviews

2019-02-28 Thread Owen O'Malley
+1 > On Feb 28, 2019, at 8:03 AM, Romin Parekh wrote: > > +1 > > On Thu, Feb 28, 2019 at 1:17 AM Anton Okolnychyi > wrote: > +1 > >> On 28 Feb 2019, at 07:47, Renato Marroquín Mogrovejo >> mailto:renatoj.marroq...@gmail.com>> wrote: >> >> +1 >> >> El jue., 28 feb. 2019 a las 8:00, Dongjoo

Re: Iceberg and Hive

2019-01-07 Thread Owen O'Malley
The group has moved to the Apache infrastructure, so we should use dev@iceberg.apache.org. What is required, but not started, is for someone to implement Hive's RawStore API with an Iceberg backend. That would let you use Hive SQL commands to manipulate the Iceberg tables. .. Owen On Mon, Jan

Re: [DISCUSS] Draft report for January 2019

2019-01-07 Thread Owen O'Malley
he cracks? In the latter case, please list any open issues > that need to be addressed. > > Last month was December, so traffic has been low and both PPMC members > and > mentors were slow to respond. This is not abnormal, but the PPMC missed > the > deadline to file this report

Re: project report

2018-12-04 Thread Owen O'Malley
e whether to do a source-only first release or to go through the > pain of publishing convenience binaries with their own LICENSE and NOTICE > content. > > rb > > On Tue, Dec 4, 2018 at 3:37 PM Owen O'Malley > wrote: > > > I wrote a first pass of the report for

project report

2018-12-04 Thread Owen O'Malley
I wrote a first pass of the report for the Apache board. Iceberg > > Iceberg is a table format for large, slow-moving tabular data. > > Iceberg has been incubating since 2018-11-16. > > Three most important issues to address in the move towards graduation: > > 1. Get the SGA accepted. > 2. Fin

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
y the latest of those > takes effect). > > Obviously readers would need to be updated to correctly interpret this > data. And there is all kinds of supporting work that would be required in > order to maintain these (periodically collapsing diffs into the base, > etc.). > >

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
I’m not sure what use case Erik is looking for, but I’ve had users that want to do the equivalent of HBase’s column families. They want some of the columns to be stored separately and then merged together on read. The requirements would be that there is a 1:1 mapping between rows in the matching

Issue list?

2018-11-27 Thread Owen O'Malley
All, As we move over to Apache infrastructure, we need to decide what works for the community. The dev list is getting a lot of traffic and is probably intimidating to newcomers. Currently the notices are: Pull Requests and issue creation/comment/close -> dev@ Git commit -> commits@ One pat