Re: [PR] GH-44248: [Format] Add TimestampWithOffset canonical extension type [arrow]
conbench-apache-arrow[bot] commented on PR #48002: URL: https://github.com/apache/arrow/pull/48002#issuecomment-3675061181 After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b10386e6a0ea5f3bbe6c00dcd124819b7fbf3071. There were no benchmark performance regressions. 🎉 The [full Conbench report](https://github.com/apache/arrow/runs/58538009612) has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [Format] Add TimestampWithOffset canonical extension type [arrow]
felipecrv merged PR #48002: URL: https://github.com/apache/arrow/pull/48002 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
rok commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560709739 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: Offsets are assigned politically and we can't hope to regulate them here :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
felipecrv commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560504056 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: Computation will take the offset in minutes, multiply by 60*10^k with k in 0..9 and combine that with the timestamp. No need to normalize, but it's good to list the normally expected range. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
felipecrv commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560497438 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: ```suggestion * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets normally range from -779 (-12:59) to +780 (+13:00). ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560487362 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: We could reword this to say "offsets _typically_ range from -779 (-12:59) to +780 (+13:00)" to suggest this is the most usual case but it's not necessarily required. ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: We could reword this to say "offsets _typically_ range from -779 (-12:59) to +780 (+13:00)" to suggest this is the most usual case but it's not required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: Technically it is possible to have larger `offset_minutes`, it's just that the values might wrap around and won't be "normalized". For example, `2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`. So it's not necessarily a matter of correctness or "invalid offsets", but a matter of normalization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560419884 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: In the previous thread, we were discussing what to do when either inner field is nullable and defining a standard normalization behavior in that case as well. The discussion circled back to "let's just say it's non-nullable and defer to implementations what to do otherwise". Should we follow the same idea here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: Technically it is possible to have more larger `offset_minutes`, it's just that the values might wrap around and won't be "normalized". For example, `2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`. So it's not necessarily a matter of correctness or "invalid offsets", but a matter of normalization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560419884 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: In the previous thread, we were discussing what to do when either inner field is nullable and defining a standard normalization behavior in that case as well. The discussion circled back to "let's just say it's non-nullable and defer to implementations what to do". Should we follow the same idea here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: Technically it is possible to have more larger `offset_minutes`, it's just that the values might wrap around and won't be "normalized". For example, `2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`. So it's not a matter of correctness or "invalid offsets", but a matter of normalization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
wgtmac commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2560206848 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). Review Comment: > Offsets range from -779 (-12:59) to +780 (+13:00) Do we need to regulate what to do with invalid offsets? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
jorisvandenbossche commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2559406830 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += Review Comment: ```suggestion Timestamp With Offset = ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
alamb commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550596333 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + This type does not have any parameters. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. Review Comment: makes sense -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
felipecrv commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550422505 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + This type does not have any parameters. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. Review Comment: In the interest of brevity of the spec, I advised @serramatutu to not have recommendations here. I would assume anyone deciding to dictionary-encode an int16 array is considering the gains (or not) that they would have from that encoding. We are allowing but not recommending complicated encoding of an already compact 16-bit-valued array. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
alamb commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550120425 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,33 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + This type does not have any parameters. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. Review Comment: Do we want to say anything about what types of dictionary keys are allowed? Like is it ok to use Int8/Int16/Int32 keys -- while Int32 probably doesn't make sense as it would make the column larger, it might be necessary in some cases to encode the full range of Int16 values 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550040624 ## docs/source/format/CanonicalExtensions.rst: ## @@ -483,6 +483,28 @@ binary values look like. .. _variant_primitive_type_mapping: +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). Review Comment: Per @pitrou's thread, this will go back to being non-nullable. Closing this one in favor of the other thread. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550038306 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. + +.. note:: + + It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, but not recommended. + + If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct's validity buffer (V2) with the result of ``V1 AND V2``. + + If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were were zero. Review Comment: Latest change goes back to the original spec: https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. Review Comment: Resolved: https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2550037631 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. + +.. note:: + + It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, but not recommended. + + If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct's validity buffer (V2) with the result of ``V1 AND V2``. + + If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were were zero. + + It is *recommended* that implementations normalize this type's representation by dropping the inner validity buffers and applying the aforementioned transformations, only keeping the top-level struct's validity buffer. + +.. note:: Review Comment: Resolved: https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
felipecrv commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2547548706 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. + +.. note:: + + It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, but not recommended. + + If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct's validity buffer (V2) with the result of ``V1 AND V2``. + + If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were were zero. Review Comment: Good. We are actually back to the very original proposal by @serramatutu that demanded non-nullable fields. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
pitrou commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2546861524 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. + +.. note:: + + It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, but not recommended. + + If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct's validity buffer (V2) with the result of ``V1 AND V2``. + + If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were were zero. Review Comment: Well, if that's the intent, the spec needs to be clear that all 3 validity buffers have to be the identical. But that would be frankly a bizarre variation from normal Struct policies and I don't see any reason to constrain implementations in such a way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
paleolimbot commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2546854150 ## docs/source/format/CanonicalExtensions.rst: ## @@ -544,6 +544,49 @@ Primitive Type Mappings | UUID extension type | UUID | +--++ +.. _timestamp_with_offset_extension: + +Timestamp With Offset += +This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes. +This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines. + +* Extension name: ``arrow.timestamp_with_offset``. + +* The storage type of the extension is a ``Struct`` with 2 fields, in order: + + * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns). + + * ``offset_minutes``: a preferably non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00). + +* Extension type parameters: + + * ``time_unit``: the time-unit of each of the stored UTC timestamps. + +* Description of the serialization: + + Extension metadata is an empty string. + +.. note:: + + It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded. + +.. note:: + + It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be nullable, but not recommended. + + If ``timestamp`` is nullable and a value is found to be null, then the whole ``TimestampWithOffset`` value should be interpreted as null. One way of achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the top-level struct's validity buffer (V2) with the result of ``V1 AND V2``. + + If ``offset_minutes`` is nullable and a value is found to be null, then this value should be interpreted as if the offset value were were zero. Review Comment: > Why not mandate that struct fields are always non-null +1. I had the same worry when we decided to make all of GeoArrow's native memory layouts have non-nullable children...the language in GeoArrow is that children must not contain nulls. When producing, all the implementations explicitly mark fields as non-nullable. When consuming, they check for a zero null count. Specifying anything about the specific arrangement of buffers I think is out of scope for an extension type definition (if we want to do this, somebody has to make sure it can be done in places like DuckDB, JavaScript, and Julia that might not offer the same level of control over buffer creation). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]
github-actions[bot] commented on PR #48002: URL: https://github.com/apache/arrow/pull/48002#issuecomment-3558941955 :warning: GitHub issue #44248 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
