Re: Catalog API for Partition

2020-07-21 Thread JackyLee
The `partitioning` in `TableCatalog.createTable` is the partition schema for the table; it doesn't contain the partition metadata for an actual partition. Besides, the actual partition metadata may contain many partition schemas, as in a Hive partition. Thus I created a `TablePartition` to contain
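To illustrate the distinction drawn above between a table-level partition schema and the metadata of one concrete partition, here is a minimal sketch. It is not Spark code and not the `TablePartition` from the proposal, whose contents are cut off in this snippet; `TableSpec`, `PartitionEntry`, `validate`, and the sample location are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableSpec:
    """Table-level definition (hypothetical): columns plus the partition schema."""
    columns: Dict[str, str]             # column name -> type name
    partition_columns: Tuple[str, ...]  # partition schema: which columns partition the table


@dataclass
class PartitionEntry:
    """Metadata for one concrete partition (hypothetical, for illustration only)."""
    spec: Dict[str, str]                                      # partition column -> value
    properties: Dict[str, str] = field(default_factory=dict)  # assumed extras, e.g. a location


def validate(table: TableSpec, partition: PartitionEntry) -> bool:
    """Check that the partition gives a value for exactly the table's partition columns."""
    return set(partition.spec) == set(table.partition_columns)


# The schema says *how* the table is partitioned ...
table = TableSpec(
    columns={"id": "bigint", "dt": "string", "country": "string"},
    partition_columns=("dt", "country"),
)

# ... while an entry describes *one* partition that actually exists.
p = PartitionEntry(
    spec={"dt": "2020-07-21", "country": "US"},
    properties={"location": "/warehouse/t/dt=2020-07-21/country=US"},
)
```

The schema is fixed when the table is created, whereas entries like `p` come and go over the table's lifetime, which is presumably why a separate type is useful.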

Re: Catalog API for Partition

2020-07-20 Thread Wenchen Fan
Yeah, we don't want the partitions to be Hive-specific. My point is that we call it "Partition Catalog APIs", which makes me confused about the relationship between this and the "partitions" in `TableCatalog.createTable`. Are these two orthogonal? Or do you kind of unify them? On Sat, Jul 18, 2020 at 12:02

Re: Catalog API for Partition

2020-07-17 Thread JackyLee
Hi Wenchen, thanks for your attention and reply. Firstly, these Partition Catalog APIs are not specific to Hive; they can be used with Lakehouse, MySQL, or other sources that support partitions. Secondly, these Partition Catalog APIs are designed only for better data management, not to speed up

Re: Catalog API for Partition

2020-07-17 Thread Wenchen Fan
In Hive, a partition does two things: 1. It acts as an index to speed up data scans. 2. It acts as a way to manage the data; people can add/drop partitions. How do you unify these 2 things in your API design? On Fri, Jul 17, 2020 at 12:03 AM JackyLee wrote: > Hi devs, > > In order to support Partition Comm
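The two roles of a partition described above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions, not Hive's or Spark's implementation: `InMemoryPartitions`, `add_partition`, `drop_partition`, `files_to_scan`, and the file names are all hypothetical.

```python
from typing import Dict, List, Tuple

# A partition is identified by its sorted (column, value) pairs.
PartitionKey = Tuple[Tuple[str, str], ...]


def key(spec: Dict[str, str]) -> PartitionKey:
    """Turn a partition spec into a hashable, order-independent key."""
    return tuple(sorted(spec.items()))


class InMemoryPartitions:
    """Toy partition store (hypothetical) mapping each partition to its data files."""

    def __init__(self) -> None:
        self._files: Dict[PartitionKey, List[str]] = {}

    # Role 2: manage the data by adding and dropping partitions.
    def add_partition(self, spec: Dict[str, str], files: List[str]) -> None:
        k = key(spec)
        if k in self._files:
            raise ValueError(f"partition already exists: {spec}")
        self._files[k] = list(files)

    def drop_partition(self, spec: Dict[str, str]) -> bool:
        """Return True if the partition existed and was removed."""
        return self._files.pop(key(spec), None) is not None

    # Role 1: act as an index, so a scan reads only the matching partitions.
    def files_to_scan(self, predicate: Dict[str, str]) -> List[str]:
        result: List[str] = []
        for k, files in self._files.items():
            values = dict(k)
            if all(values.get(col) == val for col, val in predicate.items()):
                result.extend(files)
        return result


parts = InMemoryPartitions()
parts.add_partition({"dt": "2020-07-16"}, ["a.parquet"])
parts.add_partition({"dt": "2020-07-17"}, ["b.parquet", "c.parquet"])
```

In this sketch both roles share the same metadata: `files_to_scan({"dt": "2020-07-17"})` skips the first partition's file, while `drop_partition` changes what later scans can see.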

Catalog API for Partition

2020-07-16 Thread JackyLee
Hi devs, In order to support Partition Commands for DataSourceV2 and Lakehouse, I'm trying to add a Partition API for multiple catalogs. Such APIs are widely used in MySQL, Hive, and other data sources, and we can use them to manage partition metadata in Lakehouse. JIRA: https://issues.apache.org/ji