I think there are a couple of embedded / entangled questions here:
* Should Arrow be able to be used to *transport* narrow decimals — for
the (now very abundant) use cases where Arrow is being used as an
internal wire protocol or client/server interface
* Should *compute engines*
Regarding TPC-H and widening, we can (and do currently for the one query we
have implemented) cast the decimal back down to the correct precision after
each multiplication, so I don’t think this is an issue. On the other hand,
there are definitely things we can do to dynamically detect if
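To make the cast-back-down step concrete, here is a small sketch using Python's stdlib `decimal` module (not Arrow itself): an exact multiply of two scale-2 values widens the scale to 4, and the engine can then quantize the result back to the column's declared precision/scale.

```python
from decimal import Decimal, ROUND_HALF_UP

# Two "decimal(12, 2)"-style values, as in TPC-H price/discount columns.
price = Decimal("12345.67")   # scale 2
factor = Decimal("0.95")      # scale 2

# Exact multiplication widens the scale: 2 + 2 = 4 fractional digits.
product = price * factor
assert product == Decimal("11728.3865")

# Cast back down to the original scale after the multiply, as the
# engine can do after each decimal operation to avoid unbounded widening.
narrowed = product.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
assert narrowed == Decimal("11728.39")
```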
I'm generally -0.01 against narrow decimals. My experience in practice has
been that widening happens so quickly that they are little used and add
unnecessary complexity. For reference, the original Arrow code actually
implemented Decimal9 [1] and Decimal18 [2], but we removed both because of
this.
Any update on this proposal? I think this will be a useful addition
too. I can potentially help with the Rust-side implementation.
Chao
On Tue, Mar 8, 2022 at 1:00 PM Jorge Cardoso Leitão wrote:
Agreed.
Also, I would like to revise my previous comment about the small risk.
While prototyping this I did hit some bumps. They came primarily from two
sources:
* I was unable to find arrow/json files in the arrow-testing generated
files with a non-default decimal bitwidth (I think we only have
I’d also like to chime in in favor of 32- and 64-bit decimals because it’ll
help achieve better performance on TPC-H (and maybe other benchmarks). The
decimal columns need only 12 digits of precision, for which a 64-bit decimal is
sufficient. It’s currently wasteful to use a 128-bit decimal.
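The capacity argument is easy to check: a signed two's-complement integer of width w bits can hold every value with p decimal digits exactly when 10^p - 1 <= 2^(w-1) - 1. A short sketch computing those maxima:

```python
# Maximum decimal precision representable by a signed two's-complement
# integer of each bit width: the largest p with 10**p - 1 <= 2**(w-1) - 1.
def max_precision(bit_width: int) -> int:
    limit = 2 ** (bit_width - 1) - 1
    p = 0
    while 10 ** (p + 1) - 1 <= limit:
        p += 1
    return p

# 32-bit holds 9 digits and 64-bit holds 18 -- comfortably more than the
# 12 digits TPC-H's decimal columns need -- while 128-bit holds 38.
assert [max_precision(w) for w in (32, 64, 128, 256)] == [9, 18, 38, 76]
```

The 38 and 76 figures match the documented maxima for Arrow's existing decimal128 and decimal256 types, so the same rule extends naturally to 32 and 64 bits.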
>
> Do we want to keep the historical "C++ and Java" requirement or
> do we want to make it a more flexible "two independent official
> implementations", which could be for example C++ and Rust, Rust and
> Java, etc.
I think flexibility here is a good idea; I'd like to hear other opinions.
For
On 07/03/2022 at 20:26, Micah Kornfield wrote:
> Relaxing from {128,256} to {32,64,128,256} seems a low risk
> from an integration perspective, as implementations already need to read
> the bitwidth to select the appropriate physical representation (if they
> support it).
I think there are two reasons for having implementations first.
1.
+1 adding 32 and 64 bit decimals.
+0 to release it without integration tests - both IPC and the C data
interface use a variable bit width to declare the appropriate size for
decimal types. Relaxing from {128,256} to {32,64,128,256} seems a low risk
from an integration perspective, as
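To illustrate the "variable bit width" point: the C data interface encodes decimals as a format string of the form `d:precision,scale`, with an optional trailing bit width for non-128-bit decimals. A minimal sketch of a consumer parsing that string (my own illustrative helper, not an official API):

```python
def parse_decimal_format(fmt: str):
    """Parse a C-data-interface-style decimal format string.

    'd:precision,scale' implies 128 bits; 'd:precision,scale,bitWidth'
    spells the width out explicitly.
    """
    assert fmt.startswith("d:")
    parts = fmt[2:].split(",")
    precision, scale = int(parts[0]), int(parts[1])
    bit_width = int(parts[2]) if len(parts) == 3 else 128
    return precision, scale, bit_width

assert parse_decimal_format("d:12,2") == (12, 2, 128)
# Relaxing {128,256} to {32,64,128,256} only widens the set of values the
# trailing bitWidth field may take; the parsing logic is unchanged.
assert parse_decimal_format("d:12,2,64") == (12, 2, 64)
```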
On 03/03/2022 at 18:05, Micah Kornfield wrote:
> I think this makes sense to add these. Typically when adding new types,
> we've waited on the official vote until there are two reference
> implementations demonstrating compatibility.
You are right, I had forgotten about that. Though in this
RAPIDS/libcudf would definitely support this.
For what it's worth, libcudf's fixed_point decimal type implementation is
standalone and could easily be extracted/reused:
https://github.com/rapidsai/cudf/blob/branch-22.04/cpp/include/cudf/fixed_point/fixed_point.hpp
We even had a recent blog that
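For readers unfamiliar with the fixed_point design: it stores an unscaled integer plus a (typically negative) power-of-radix scale, so arithmetic is plain integer arithmetic with scale bookkeeping. A minimal Python sketch of the idea (names and structure are illustrative, not libcudf's actual C++ API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FixedPoint:
    # Unscaled integer representation; the logical value is
    # value * 10**scale (scale is usually negative).
    value: int
    scale: int

    def __mul__(self, other: "FixedPoint") -> "FixedPoint":
        # Integer multiply; scales add, exactly as in decimal arithmetic.
        return FixedPoint(self.value * other.value, self.scale + other.scale)

# 1.23 * 4.5: unscaled 123 * 45 = 5535 at scale -3, i.e. 5.535.
a = FixedPoint(123, -2)   # 1.23
b = FixedPoint(45, -1)    # 4.5
c = a * b
assert (c.value, c.scale) == (5535, -3)
```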
Libcudf / cuDF have supported 32-bit and 64-bit decimals for a few releases
now (as well as 128-bit decimals in the past couple of releases) and
they've generally been received positively by the community. Being able
to roundtrip them through Arrow would definitely be nice as well!
On Thu, Mar
I think this makes sense to add these. Typically when adding new types,
we've waited on the official vote until there are two reference
implementations demonstrating compatibility.
On Thu, Mar 3, 2022 at 6:55 AM Antoine Pitrou wrote:
Hello,
Currently, the Arrow format specification restricts the bitwidth of
decimal numbers to either 128 or 256 bits.
However, there is interest in allowing other bitwidths, at least 32 and
64 bits for this proposal. A 64-bit (respectively 32-bit) decimal
datatype would allow for