Re: Iceberg Python library support

2021-04-14 Thread Chen Song
, Apr 14, 2021 at 10:28 AM Ryan Murray wrote:
> Hey Chen Song,
>
> Answers inline below
>
> On Wed, Apr 14, 2021 at 4:04 PM Chen Song wrote:
>
>> Is https://iceberg.apache.org/python-feature-support/ still up to date?
>> Are the following statements true

Iceberg Python library support

2021-04-14 Thread Chen Song
ystem/filesystem_tables.py . Best, -- Chen Song

question on range overwrite/delete within a partition

2021-04-08 Thread Chen Song
he data file to be deleted (I only know the key range). I know I can read the data file and then figure out the positions, but that is effectively the same as re-reading the data. My question is: when using the Iceberg core API, is there a way to compose a range delete like the above w/o overwriting the entire partition or reading back the data? Any thoughts? -- Chen Song
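A position delete identifies a row by data file path and row position, which is why turning a key range into position deletes requires knowing where those keys sit in the file. The pure-Java sketch below uses hypothetical names (it is not the Iceberg API) and assumes the file's keys are already available in row order; it shows that the conversion is a full pass over those keys:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a position delete: (data file path, row ordinal within that file).
final class PositionDeleteSketch {
    static final class PositionDelete {
        final String filePath;
        final long pos;

        PositionDelete(String filePath, long pos) {
            this.filePath = filePath;
            this.pos = pos;
        }
    }

    // Converts a key range [lo, hi) into position deletes. Every key in the file
    // has to be visited in row order, because positions are unknown without a scan.
    static List<PositionDelete> rangeToPositionDeletes(
            String filePath, long[] keysInRowOrder, long lo, long hi) {
        List<PositionDelete> deletes = new ArrayList<>();
        for (int pos = 0; pos < keysInRowOrder.length; pos++) {
            long key = keysInRowOrder[pos];
            if (key >= lo && key < hi) {
                deletes.add(new PositionDelete(filePath, pos));
            }
        }
        return deletes;
    }
}
```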

Re: how to test row level delete

2021-04-08 Thread Chen Song
Thanks for the clarification.

On Tue, Apr 6, 2021 at 10:25 PM OpenInx wrote:
> Hi Chen Song
>
> If you want to test format v2 in your environment, you could follow this
> comment https://github.com/apache/iceberg/pull/2410#issuecomment-812463051
> to upgrade your iceberg table to for

Re: how to test row level delete

2021-04-06 Thread Chen Song
>>> EqualityDeleteWriter writer = Parquet.writeDeletes(out)
>>>     .forTable(table)
>>>     .withPartition(Row.of("20201221"))
>>>     .rowSchema(deleteRowSchema)
>>>     .createWriterFunc(GenericParquetWriter::buildWriter)
>>>     .overwrite()
>>>     .equalityFieldIds(deleteRowSchema.columns().stream().mapToInt(Types.NestedField::fieldId).toArray())
>>>     .buildEqualityWriter();
>>>
>>> try (Closeable toClose = writer) {
>>>   writer.deleteAll(deletes);
>>> }
>>>
>>> return writer.toDeleteFile();
>>> }
>>>
>>> liubo07199
>>> liubo07...@hellobike.com
>>
-- Chen Song
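The quoted code builds an equality delete file: a data row counts as deleted when its values in the equality fields match those of a delete row. Here is a minimal in-memory sketch of that matching rule. It uses hypothetical names, models rows as `Map<String, Object>`, and ignores sequence numbers, partition scoping, and type conversion:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of equality-delete matching; not the Iceberg reader.
final class EqualityDeleteSketch {
    // Returns the rows that survive after applying equality deletes on the given fields.
    static List<Map<String, Object>> applyEqualityDeletes(
            List<Map<String, Object>> rows,
            List<Map<String, Object>> deletes,
            List<String> equalityFields) {
        // Project each delete row onto the equality fields and collect the keys.
        Set<List<Object>> deletedKeys = deletes.stream()
                .map(d -> key(d, equalityFields))
                .collect(Collectors.toSet());
        // Keep a data row only if its projected key does not appear in the delete set.
        return rows.stream()
                .filter(r -> !deletedKeys.contains(key(r, equalityFields)))
                .collect(Collectors.toList());
    }

    // Projects a row onto the equality fields, in field order.
    private static List<Object> key(Map<String, Object> row, List<String> fields) {
        return fields.stream().map(row::get).collect(Collectors.toList());
    }
}
```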

Re: Question on ordering on partitions when read

2021-03-25 Thread Chen Song
l also pack files into tasks in most
> cases (though not for `IcebergGenerics`) so files can be reordered
> depending on size as well.
>
> On Thu, Mar 25, 2021 at 8:06 AM Chen Song wrote:
>
>> Popping up the question.
>>
>> On Wed, Mar 24, 2021 at 2:01 P

Re: Question on ordering on partitions when read

2021-03-25 Thread Chen Song
Popping up the question.

On Wed, Mar 24, 2021 at 2:01 PM Chen Song wrote:
> I want to clarify the ordering semantics (if deterministic) on partitions
> returned when using the iceberg core data API to read.
>
> Say I define a table with a *time* column and partition by *day(time)*,
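With `day(time)`, the partition value is derived from the `time` column as the number of days since 1970-01-01. A small sketch of that derivation follows. It assumes timestamps are held as microseconds from the epoch (the unit Iceberg uses for timestamps) and uses hypothetical class and method names; floor division keeps pre-1970 values in the earlier day:

```java
import java.time.LocalDate;

// Hypothetical sketch of a day(...) partition transform over a timestamp column.
final class DayTransformSketch {
    static final long MICROS_PER_DAY = 86_400_000_000L;

    // Partition value = whole days since 1970-01-01 (UTC); floorDiv rounds toward
    // negative infinity, so a timestamp just before the epoch maps to day -1.
    static int dayFromEpochMicros(long epochMicros) {
        return (int) Math.floorDiv(epochMicros, MICROS_PER_DAY);
    }

    // Converts a day ordinal back to a calendar date, e.g. for display.
    static LocalDate toDate(int dayOrdinal) {
        return LocalDate.ofEpochDay(dayOrdinal);
    }
}
```

Because the day value is derived rather than stored in the rows, one option for a reader that needs results ordered by day is to sort on it (or on `time`) after reading, rather than relying on the order in which partitions come back.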

Question on ordering on partitions when read

2021-03-24 Thread Chen Song
partition, which I know has to be enforced by the writer. -- Chen Song

Re: Iceberg sync notes - 10 March 2021

2021-03-22 Thread Chen Song
e
> this support for now, which is mostly due to the feature still under
> development for the same reason mentioned above.
>
> Thank you,
> Yan
>
> On Tue, Mar 16, 2021 at 2:33 PM Chen Song wrote:
>
>> Thanks Yan. I have a question about sort order support. I saw
>

Re: Iceberg sync notes - 10 March 2021

2021-03-16 Thread Chen Song
pending PRs.
>
> Thank you!
> Yan
>
> On Tue, Mar 16, 2021 at 8:06 AM Chen Song wrote:
>
>> Thanks for the summary. On the V2 format, is there a Google doc to review, or
>> any sort of backlog of tickets to track?
>>
>> Chen
>>
>> On Mon, Mar

Re: Iceberg sync notes - 10 March 2021

2021-03-16 Thread Chen Song
ceberg/pull/1849> and is about to start
> working on an implementation.
> - Agreed to collaborate on the dev list. More eyes would be great.
>
> The link to the doc:
> https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg
>
> Thanks,
> Anton
>
-- Chen Song

Re: Shall we start a regular community sync up?

2020-12-01 Thread Chen Song
>>> On Fri, Mar 20, 2020 at 8:17 AM RD wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Same time works for me too!
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Thu, Mar 19, 2020 at 4:45 PM Xabriel Collazo Mojica wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> 5pm or 5:30pm PT any day next week would work for me.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Thanks for restoring the community sync up!
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Xabriel J Collazo Mojica | Sr Computer Scientist II | Adobe
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On 3/18/20, 6:45 PM, "justin_cof...@apple.com on behalf of Justin Q Coffey" <j...@apple.com.INVALID> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>Any chance we could actually do 5:30pm PST? I'm a bit of a lurker, but this roadmap is important to mine and I have a daily at 5pm :(.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>-Justin
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>>> On Mar 18, 2020, at 6:43 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> 5pm PST in any day works for me.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Looking forward to it.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> Thanks
>>>>>>>>> >>>>>>> Saisai
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> --
>>>>>>>>> >>>> 李响 Xiang Li
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Best Regards
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>> --
>> Edgar R
>
> --
> Thanks
> Vivek
>
-- Chen Song

Re: Iceberg V2 Spec

2020-09-14 Thread Chen Song
ly to all data files with the same or lower
>>>> sequence number.
>>>>
>>>> I'm planning on updating what's currently in the spec now that we have
>>>> sequence numbers and delete file metadata committed in master, but right
>>>>
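The quoted fragment refers to the V2 rule that uses sequence numbers to decide which data files a delete file applies to. In the V2 spec, position deletes apply to data files with the same or lower sequence number, while equality deletes apply only to data files with a strictly lower one. A sketch of just that comparison, with hypothetical names and the partition and file-path conditions omitted:

```java
// Hypothetical sketch of the V2 sequence-number check for delete files.
final class DeleteApplicabilitySketch {
    enum DeleteKind { POSITION, EQUALITY }

    // Returns true if a delete file with deleteSeq applies to a data file with dataSeq.
    static boolean applies(DeleteKind kind, long deleteSeq, long dataSeq) {
        switch (kind) {
            case POSITION:
                // Same or lower sequence number: rows added in the same commit can be deleted.
                return dataSeq <= deleteSeq;
            case EQUALITY:
                // Strictly lower: rows added in the same commit are not affected.
                return dataSeq < deleteSeq;
            default:
                throw new IllegalArgumentException("unknown delete kind: " + kind);
        }
    }
}
```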

Re: Iceberg's type system

2020-09-11 Thread Chen Song
great to hear proposals.
>
> rb
>
> On Fri, Sep 11, 2020 at 10:30 AM Chen Song wrote:
>
>> Any thoughts, or suggestions on this?
>>
>> On Tue, Sep 8, 2020 at 3:01 PM Chen Song wrote:
>>
>>> Hi
>>>
>>> I have a general question

Re: Iceberg's type system

2020-09-11 Thread Chen Song
Any thoughts, or suggestions on this?

On Tue, Sep 8, 2020 at 3:01 PM Chen Song wrote:
> Hi
>
> I have a general question on Iceberg's data type system. Iceberg has a
> well-defined type spec
> <https://iceberg.apache.org/spec/#schemas-and-data-types> which can be

Iceberg's type system

2020-09-08 Thread Chen Song
Hi, I have a general question on Iceberg's data type system. Iceberg has a well-defined type spec which can be mapped to types in Avro, Parquet, and ORC. If users want to use Iceberg and extend the universe of data types (e.g., adding custom typ
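For a sense of what such a mapping involves, here is a sketch of how a subset of Iceberg primitive types lands on Parquet physical types, following the Parquet appendix of the Iceberg spec. Logical annotations such as `DATE`, `UTF8`, or `TIMESTAMP_MICROS` are omitted, and the class and field names are hypothetical:

```java
import java.util.Map;

// Hypothetical helper: Iceberg primitive type name -> Parquet physical type (subset only).
final class ParquetTypeMappingSketch {
    static final Map<String, String> PHYSICAL_TYPES = Map.of(
            "boolean", "BOOLEAN",
            "int", "INT32",
            "long", "INT64",
            "float", "FLOAT",
            "double", "DOUBLE",
            "date", "INT32",
            "timestamp", "INT64",
            "string", "BINARY",
            "binary", "BINARY");

    // decimal(P, S) uses the narrowest physical type that can hold P digits.
    static String decimalPhysicalType(int precision) {
        if (precision <= 9) {
            return "INT32";
        }
        if (precision <= 18) {
            return "INT64";
        }
        return "FIXED_LEN_BYTE_ARRAY";
    }
}
```

A custom type would presumably need an equivalent rule for every supported file format (Avro, Parquet, ORC), not just one.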

Re: Arrow Support in Parquet Writers

2020-07-07 Thread Chen Song
> Sure, if you need an Arrow writer and want to work on it, we would be
> happy to include it in Iceberg.
>
> What is your use case? The main reason why we don't have one is that
> neither Presto nor Spark uses Arrow for writing.
>

Arrow Support in Parquet Writers

2020-07-06 Thread Chen Song
is not currently implemented but OK to enhance the data API to support this? -- Chen Song

Re: Question on partitioning using Java API

2020-07-06 Thread Chen Song
> feel free to open an issue or pull request.
>
> rb
>
> On Thu, Jul 2, 2020 at 9:19 AM Chen Song wrote:
>
>> I have a question on how hidden partitioning works in Iceberg using Java API.
>> The code is something like the following.
>>
>

Question on partitioning using Java API

2020-07-02 Thread Chen Song
right way to write partitioned data. Thanks, -- Chen Song
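In Iceberg each data file carries a single partition tuple, so with the core Java API the rows going into one file generally need to share a partition. One way to arrange that is to group rows by their derived partition value before handing each group to its own writer. The sketch below uses a hypothetical `Event` type and assumes `day(ts)` as the partition transform:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: cluster rows by their day(ts) partition before writing.
final class PartitionRouterSketch {
    // Hypothetical input row: an id plus an event timestamp in epoch microseconds.
    static final class Event {
        final long id;
        final long tsMicros;

        Event(long id, long tsMicros) {
            this.id = id;
            this.tsMicros = tsMicros;
        }
    }

    private static final long MICROS_PER_DAY = 86_400_000_000L;

    // Groups rows by derived day so that each group can go to a separate data file.
    static Map<Integer, List<Event>> groupByDay(List<Event> events) {
        Map<Integer, List<Event>> byDay = new TreeMap<>();
        for (Event e : events) {
            int day = (int) Math.floorDiv(e.tsMicros, MICROS_PER_DAY);
            byDay.computeIfAbsent(day, k -> new ArrayList<>()).add(e);
        }
        return byDay;
    }
}
```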

Iceberg V2 Spec

2020-07-01 Thread Chen Song
Thanks -- Chen Song

Re: Iceberg table compaction

2020-06-30 Thread Chen Song
them. Second, it allows you to go back and read the table at an older point
> in time -- time-travel queries.
>
> I hope that helps,
>
> rb
>
> On Fri, Jun 26, 2020 at 10:39 AM Chen Song wrote:
>
>> Hey
>>
>> In Iceberg documentation, it mentions to u

Iceberg table compaction

2020-06-26 Thread Chen Song
> ? If so, it looks like it only applies to the most recent snapshot of data? Is there a way to compact data belonging to old snapshots? For example, what if I want to rewrite older data with a newer partition spec? Thanks for the help in advance. -- Chen Song
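Compaction in this sense rewrites small files into larger ones and commits the result as a new snapshot; older snapshots keep referencing the original files, which is what makes the time travel mentioned in the reply possible. The grouping step can be sketched as generic first-fit-decreasing bin packing over file sizes. This is an illustration with hypothetical names, not the planner Iceberg itself uses:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical compaction planner: groups file sizes (bytes) into rewrite groups.
final class CompactionPlanSketch {
    // First-fit decreasing: largest files first, each into the first group with room.
    // A file larger than targetBytes ends up in a group of its own.
    static List<List<Long>> plan(List<Long> fileSizes, long targetBytes) {
        List<Long> sorted = new ArrayList<>(fileSizes);
        sorted.sort(Comparator.reverseOrder());
        List<List<Long>> bins = new ArrayList<>();
        List<Long> totals = new ArrayList<>();
        for (long size : sorted) {
            int chosen = -1;
            for (int i = 0; i < bins.size(); i++) {
                if (totals.get(i) + size <= targetBytes) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0) {
                bins.add(new ArrayList<>());
                totals.add(0L);
                chosen = bins.size() - 1;
            }
            bins.get(chosen).add(size);
            totals.set(chosen, totals.get(chosen) + size);
        }
        return bins;
    }
}
```

For example, sizes 60, 50, 40, 30, 20 with a target of 100 yield two groups: [60, 40] and [50, 30, 20].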

Re: S3 example in Java

2020-06-24 Thread Chen Song
y Iceberg. You can alternatively write your
>> custom Catalog implementation in which you set up your custom atomic commit
>> mechanism as shown in http://iceberg.apache.org/custom-catalog/.
>>
>> Cheers,
>>
>> On Wed, Jun 24, 2020 at 6:12 AM Chen Song wrote:
>>
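The atomic commit a catalog must provide amounts to a compare-and-swap on the table's current metadata location: a writer's commit succeeds only if the pointer still references the metadata it started from, and otherwise the writer must refresh and retry. A minimal in-memory sketch of that contract follows, with hypothetical names; a real custom catalog would back it with a store that offers an atomic conditional update:

```java
import java.util.Objects;

// Hypothetical in-memory stand-in for a catalog entry holding the current metadata location.
final class MetadataPointerSketch {
    private String location;

    MetadataPointerSketch(String initialLocation) {
        this.location = initialLocation;
    }

    synchronized String currentLocation() {
        return location;
    }

    // Optimistic commit: swap in newLocation only if the pointer still equals baseLocation.
    // Returns false when another writer committed first (the caller should refresh and retry).
    synchronized boolean commit(String baseLocation, String newLocation) {
        if (!Objects.equals(location, baseLocation)) {
            return false;
        }
        location = newLocation;
        return true;
    }
}
```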

S3 example in Java

2020-06-24 Thread Chen Song
Hi, are there any Java examples that create/write/read tables backed by S3? I searched the documentation and GitHub but did not find anything. Thanks, Chen