Re: [PR] GH-44248: [Format] Add TimestampWithOffset canonical extension type [arrow]

2025-12-19 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #48002:
URL: https://github.com/apache/arrow/pull/48002#issuecomment-3675061181

   After merging your PR, Conbench analyzed the 3 benchmarking runs that have 
been run so far on merge-commit b10386e6a0ea5f3bbe6c00dcd124819b7fbf3071.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/58538009612) 
has more details. It also includes information about 11 possible false 
positives for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [Format] Add TimestampWithOffset canonical extension type [arrow]

2025-12-05 Thread via GitHub


felipecrv merged PR #48002:
URL: https://github.com/apache/arrow/pull/48002


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


rok commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560709739


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   Offsets are assigned politically and we can't hope to regulate them here :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


felipecrv commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560504056


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   Computation will take the offset in minutes, multiply by 60*10^k with k in 
0..9 and combine that with the timestamp. No need to normalize, but it's good 
to list the normally expected range.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


felipecrv commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560497438


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   ```suggestion
 * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets normally range from -779 (-12:59) to +780 (+13:00).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560487362


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   We could reword this to say "offsets _typically_ range from -779 (-12:59) to 
+780 (+13:00)" to suggest this is the most usual case but it's not necessarily 
required.



##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   We could reword this to say "offsets _typically_ range from -779 (-12:59) to 
+780 (+13:00)" to suggest this is the most usual case but it's not required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   Technically it is possible to have larger `offset_minutes`, it's just that 
the values might wrap around and won't be "normalized". For example, 
`2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`.
   
   So it's not necessarily a matter of correctness or "invalid offsets", but a 
matter of normalization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560419884


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   In the previous thread, we were discussing what to do when either inner 
field is nullable and defining a standard normalization behavior in that case 
as well. The discussion circled back to "let's just say it's non-nullable and 
defer to implementations what to do otherwise".
   
   Should we follow the same idea here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   Technically it is possible to have more larger `offset_minutes`, it's just 
that the values might wrap around and won't be "normalized". For example, 
`2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`.
   
   So it's not necessarily a matter of correctness or "invalid offsets", but a 
matter of normalization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560419884


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   In the previous thread, we were discussing what to do when either inner 
field is nullable and defining a standard normalization behavior in that case 
as well. The discussion circled back to "let's just say it's non-nullable and 
defer to implementations what to do".
   
   Should we follow the same idea here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560405205


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   Technically it is possible to have more larger `offset_minutes`, it's just 
that the values might wrap around and won't be "normalized". For example, 
`2025-01-01 + 0min` is the same instant in time as `2025-01-02 - 1440min`.
   
   So it's not a matter of correctness or "invalid offsets", but a matter of 
normalization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


wgtmac commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2560206848


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   > Offsets range from -779 (-12:59) to +780 (+13:00)
   
   Do we need to regulate what to do with invalid offsets?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-25 Thread via GitHub


jorisvandenbossche commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2559406830


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=

Review Comment:
   ```suggestion
   Timestamp With Offset
   =
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


alamb commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550596333


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.

Review Comment:
   makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


felipecrv commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550422505


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.

Review Comment:
   In the interest of brevity of the spec, I advised @serramatutu to not have 
recommendations here. I would assume anyone deciding to dictionary-encode an 
int16 array is considering the gains (or not) that they would have from that 
encoding. We are allowing but not recommending complicated encoding of an 
already compact 16-bit-valued array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


alamb commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550120425


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,33 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.

Review Comment:
   Do we want to say anything about what types of dictionary keys are allowed? 
Like is it ok to use Int8/Int16/Int32 keys -- while Int32 probably doesn't make 
sense as it would make the column larger, it might be necessary in some cases 
to encode the full range of Int16 values 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550040624


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -483,6 +483,28 @@ binary values look like.
 
 .. _variant_primitive_type_mapping:
 
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

Review Comment:
   Per @pitrou's thread, this will go back to being non-nullable.
   
   Closing this one in favor of the other thread.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550038306


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.

Review Comment:
   Latest change goes back to the original spec: 
https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e



##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.

Review Comment:
   Resolved: 
https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-21 Thread via GitHub


serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2550037631


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.
+
+   It is *recommended* that implementations normalize this type's 
representation by dropping the inner validity buffers and applying the 
aforementioned transformations, only keeping the top-level struct's validity 
buffer.
+
+.. note::

Review Comment:
   Resolved: 
https://github.com/apache/arrow/pull/48002/commits/043b07c9ded4221d3d9e4a7c024c45e0a03b7b7e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-20 Thread via GitHub


felipecrv commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2547548706


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.

Review Comment:
   Good. We are actually back to the very original proposal by @serramatutu 
that demanded non-nullable fields.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-20 Thread via GitHub


pitrou commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2546861524


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.

Review Comment:
   Well, if that's the intent, the spec needs to be clear that all 3 validity 
buffers have to be the identical.
   
   But that would be frankly a bizarre variation from normal Struct policies 
and I don't see any reason to constrain implementations in such a way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-20 Thread via GitHub


paleolimbot commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2546854150


##
docs/source/format/CanonicalExtensions.rst:
##
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID   |
 +--++
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.

Review Comment:
   > Why not mandate that struct fields are always non-null
   
   +1. I had the same worry when we decided to make all of GeoArrow's native 
memory layouts have non-nullable children...the language in GeoArrow is that 
children must not contain nulls. When producing, all the implementations 
explicitly mark fields as non-nullable. When consuming, they check for a zero 
null count. Specifying anything about the specific arrangement of buffers I 
think is out of scope for an extension type definition (if we want to do this, 
somebody has to make sure it can be done in places like DuckDB, JavaScript, and 
Julia that might not offer the same level of control over buffer creation).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-44248: [FORMAT] Add TimestampWithOffset canonical extension type [arrow]

2025-11-20 Thread via GitHub


github-actions[bot] commented on PR #48002:
URL: https://github.com/apache/arrow/pull/48002#issuecomment-3558941955

   :warning: GitHub issue #44248 **has been automatically assigned in GitHub** 
to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]