Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-28 Thread via GitHub
emkornfield commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1469066912 ## format/spec.md: ## @@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an object. See the table fo |**`month`**|`JSON string: "month"`

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-28 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1469025897 ## format/spec.md: ## @@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an object. See the table fo |**`month`**|`JSON string: "month"`|

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-28 Thread via GitHub
emkornfield commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1468916964 ## format/spec.md: ## @@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an object. See the table fo |**`month`**|`JSON string: "month"`

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-28 Thread via GitHub
rdblue commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1468915010 ## format/spec.md: ## @@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an object. See the table fo |**`month`**|`JSON string: "month"`|`"mo

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-27 Thread via GitHub
emkornfield commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1468680676 ## format/spec.md: ## @@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an object. See the table fo |**`month`**|`JSON string: "month"`

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-26 Thread via GitHub
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1912873339 > This change seems reasonable to me. @advancedxy, could you also post to the dev list that this was merged to get any input from folks who did not review before we release 1.5? I feel

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1912692894 This change seems reasonable to me. @advancedxy, could you also post to the dev list that this was merged to get any input from folks who did not review before we release 1.5? I feel

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-25 Thread via GitHub
szehon-ho commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1910769189 Merged, thanks @advancedxy! Feel free to work on `bucketV2`, and we can make any other follow-ups as well. -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-25 Thread via GitHub
szehon-ho merged PR #8579: URL: https://github.com/apache/iceberg/pull/8579

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-23 Thread via GitHub
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1907243141 @szehon-ho @aokolnychyi the `bucketV2` part is removed from this PR. Let me know if you have any more comments.

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-22 Thread via GitHub
szehon-ho commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1904441605 Hi @advancedxy, I'm OK to leave that for the next PR. How about we just keep the notes for PartitionField and SortOrder, like this? ``` 1. For partition fields with a transform wi

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-18 Thread via GitHub
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1899645844 > First of all, we should evaluate other hash functions apart from Murmur3. Parquet, for instance, uses xxHash that is supposed to be much faster > Second, Parquet avoids the modulo

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-18 Thread via GitHub
aokolnychyi commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1899444592 @rdblue recently pointed me to the Bloom filter [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md) in Parquet. I think it contains a few interesting ideas tha

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1897787045 I'll also take a look at this tomorrow morning, thanks @advancedxy!

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1456811454 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1456810557 ## format/spec.md: ## @@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(doub

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454357231 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454350089 ## format/spec.md: ## @@ -1145,9 +1191,14 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list i

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454344286 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1894729096 This seems to be in pretty good shape. I guess the open question is about `bucket` vs `bucketV2` naming. I'll also check the math behind bucketing on multiple values with fresh eyes on Th

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454245996 ## format/spec.md: ## @@ -1145,9 +1191,14 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454244711 ## format/spec.md: ## @@ -1145,9 +1191,14 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454244395 ## format/spec.md: ## @@ -1145,9 +1191,14 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454243815 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454241858 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454239993 ## format/spec.md: ## @@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(dou

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454239335 ## format/spec.md: ## @@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(dou

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1454236931 ## format/spec.md: ## @@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453487893 ## format/spec.md: ## @@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453308229 ## format/spec.md: ## @@ -1149,6 +1195,12 @@ Each sort field in the fields list is stored as an object with the following pro |--- |--- |--- | |**`Sort Field`**|`

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453286267 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-16 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1453283842 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-15 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1452983761 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list i

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-15 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1452981468 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list i

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-13 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1451452956 ## format/spec.md: ## @@ -329,19 +329,35 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buc

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-13 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1451447151 ## format/spec.md: ## @@ -1060,6 +1076,14 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(doub

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-12 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1450615797 ## format/spec.md: ## @@ -329,19 +329,35 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buck

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448606533 ## format/spec.md: ## @@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buc

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448596043 ## format/spec.md: ## @@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | Transform name| Description

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448582861 ## format/spec.md: ## @@ -986,6 +994,14 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(double

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448576223 ## format/spec.md: ## @@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buc

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448575877 ## format/spec.md: ## @@ -1073,6 +1097,12 @@ Each sort field in the fields list is stored as an object with the following pro |--- |--- |--- | |**`Sort Field`**|`

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-11 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448572967 ## format/spec.md: ## @@ -1043,21 +1059,29 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-10 Thread via GitHub
szehon-ho commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1884585967 > WOW, big congrats on the arrival of your newborn. Thank you so much! > I will resume this work support once I finished my internal project, which I'm leveraging bucketi

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-09 Thread via GitHub
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1884116253 > Hi, @advancedxy , thanks for the work. Sorry for the delay, I am just returning from paternity leave. Will love to see this get in to get work on zorder and geo-transforms. I left so

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-09 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1445563921 ## format/spec.md: ## @@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buck

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-08 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1445557050 ## format/spec.md: ## @@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buck

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-08 Thread via GitHub
szehon-ho commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1882496190 Hi, @advancedxy , thanks for the work. Sorry for the delay, I am just returning from paternity leave. Will love to see this get in to get work on zorder and geo-transforms. I left so

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-08 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1445563921 ## format/spec.md: ## @@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buck