Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-25 Thread Wes McKinney
I think there are a couple of entangled questions here: * Should Arrow be able to be used to *transport* narrow decimals, for the (now very abundant) use cases where Arrow is being used as an internal wire protocol or client/server interface * Should *compute engines*

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-25 Thread Sasha Krassovsky
Regarding TPC-H and widening, we can (and do currently for the one query we have implemented) cast the decimal back down to the correct precision after each multiplication, so I don’t think this is an issue. On the other hand, there are definitely things we can do to dynamically detect if
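To make the "cast back down after each multiplication" step concrete, here is a minimal sketch using plain Python integers as unscaled decimal values. The function names and the rounding mode (half away from zero) are assumptions for illustration only; they are not taken from the Arrow compute kernels.

```python
def multiply_and_rescale(a_unscaled, a_scale, b_unscaled, b_scale, out_scale):
    """Multiply two fixed-point values, then cast the result back to out_scale.

    The raw product carries scale a_scale + b_scale; dropping the extra
    digits keeps the result within the original, narrower precision.
    Rounding is half away from zero (an assumption of this sketch).
    """
    product = a_unscaled * b_unscaled
    drop = a_scale + b_scale - out_scale
    if drop < 0:
        raise ValueError("out_scale is larger than the product's scale")
    divisor = 10 ** drop
    quotient, remainder = divmod(abs(product), divisor)
    if 2 * remainder >= divisor:
        quotient += 1
    return quotient if product >= 0 else -quotient


def fits_precision(unscaled, precision):
    """True if the unscaled value has at most `precision` decimal digits."""
    return abs(unscaled) < 10 ** precision


# 1234.56 * 0.95, both stored at scale 2, result cast back to scale 2.
result = multiply_and_rescale(123456, 2, 95, 2, out_scale=2)
print(result, fits_precision(result, 12))
```

Without the cast, every multiplication adds the operands' scales together, which is how intermediate results outgrow a narrow type so quickly.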

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-23 Thread Jacques Nadeau
I'm generally -0.01 against narrow decimals. My experience in practice has been that widening happens so quickly that they are little used and add unnecessary complexity. For reference, the original Arrow code actually implemented Decimal9 [1] and Decimal18 [2] but we removed both because of this

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-21 Thread Chao Sun
Any update on this proposal? I think this will be a useful addition too. I can potentially help on the Rust side implementation.
Chao
On Tue, Mar 8, 2022 at 1:00 PM Jorge Cardoso Leitão wrote:
> Agreed.
>
> Also, I would like to revise my previous comment about the small risk.
> While

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-08 Thread Jorge Cardoso Leitão
Agreed. Also, I would like to revise my previous comment about the small risk. While prototyping this I did hit some bumps. They primarily came from two sources: * I was unable to find arrow/json files in the arrow-testing generated files with a non-default decimal bitwidth (I think we only have

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-08 Thread Micah Kornfield
> I’d also like to chime in in favor of 32- and 64-bit decimals because
> it’ll help achieve better performance on TPC-H (and maybe other
> benchmarks). The decimal columns need only 12 digits of precision, for
> which a 64-bit decimal is sufficient. It’s currently wasteful to use a
> 128-bit

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-08 Thread Sasha Krassovsky
I’d also like to chime in in favor of 32- and 64-bit decimals because it’ll help achieve better performance on TPC-H (and maybe other benchmarks). The decimal columns need only 12 digits of precision, for which a 64-bit decimal is sufficient. It’s currently wasteful to use a 128-bit decimal.
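As a quick check of the "12 digits fits in 64 bits" arithmetic, the sketch below computes the largest decimal precision that a signed two's-complement integer of a given bit width can always hold. The helper names are hypothetical, and the set of widths is the one discussed in this thread.

```python
def max_decimal_precision(bit_width):
    """Largest p such that every p-digit integer fits in a signed
    two's-complement value of bit_width bits."""
    max_value = (1 << (bit_width - 1)) - 1
    # max_value is never of the form 99...9, so one digit must be given up.
    return len(str(max_value)) - 1


def narrowest_bit_width(precision, widths=(32, 64, 128, 256)):
    """Smallest candidate width whose maximum precision covers `precision`."""
    for width in widths:
        if precision <= max_decimal_precision(width):
            return width
    raise ValueError("precision %d exceeds all candidate widths" % precision)


for width in (32, 64, 128, 256):
    print(width, max_decimal_precision(width))
print(narrowest_bit_width(12))
```

The 9- and 18-digit limits for 32 and 64 bits match the Decimal9 and Decimal18 names mentioned elsewhere in the thread.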

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-08 Thread Micah Kornfield
> Do we want to keep the historical "C++ and Java" requirement or
> do we want to make it a more flexible "two independent official
> implementations", which could be for example C++ and Rust, Rust and
> Java, etc.
I think flexibility here is a good idea; I'd like to hear other opinions. For

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-08 Thread Antoine Pitrou
On 07/03/2022 at 20:26, Micah Kornfield wrote:
>> Relaxing from {128,256} to {32,64,128,256} seems a low risk from an integration perspective, as implementations already need to read the bitwidth to select the appropriate physical representation (if they support it).
> I think there are two

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Thread Micah Kornfield
> Relaxing from {128,256} to {32,64,128,256} seems a low risk
> from an integration perspective, as implementations already need to read
> the bitwidth to select the appropriate physical representation (if they
> support it).
I think there are two reasons for having implementations first. 1.

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Thread Jorge Cardoso Leitão
+1 to adding 32- and 64-bit decimals. +0 to releasing it without integration tests: both IPC and the C data interface use a variable bit width to declare the appropriate size for decimal types. Relaxing from {128,256} to {32,64,128,256} seems a low risk from an integration perspective, as
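The point about reading the bit width to pick a physical representation can be sketched as follows. This is an illustration only, not the API of any Arrow implementation: the function name is hypothetical, and it assumes values are packed back to back as little-endian two's-complement integers with no validity bitmap.

```python
SUPPORTED_BIT_WIDTHS = (32, 64, 128, 256)  # the relaxed set discussed here


def decode_decimal_buffer(buffer, bit_width, byteorder="little"):
    """Decode a packed buffer of unscaled decimal values.

    The declared bit width alone determines how many bytes each value
    occupies, so a reader that already branches on 128 vs 256 only needs
    two more accepted widths.
    """
    if bit_width not in SUPPORTED_BIT_WIDTHS:
        raise ValueError("unsupported decimal bit width: %r" % bit_width)
    byte_width = bit_width // 8
    if len(buffer) % byte_width != 0:
        raise ValueError("buffer length is not a multiple of the value width")
    return [
        int.from_bytes(buffer[i:i + byte_width], byteorder, signed=True)
        for i in range(0, len(buffer), byte_width)
    ]


# Two 64-bit values: 12345 and -1.
raw = (12345).to_bytes(8, "little", signed=True) + (-1).to_bytes(8, "little", signed=True)
print(decode_decimal_buffer(raw, 64))
```

A reader that does not support a given width can reject it at the first check, which is the "if they support it" case from the message above.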

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Thread Antoine Pitrou
On 03/03/2022 at 18:05, Micah Kornfield wrote:
> I think it makes sense to add these. Typically when adding new types, we've waited on the official vote until there are two reference implementations demonstrating compatibility.
You are right, I had forgotten about that. Though in this

RE: Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-03 Thread Jake Hemstad
RAPIDS/libcudf would definitely support this. For what it's worth, libcudf's fixed_point decimal type implementation is standalone and could easily be extracted/reused: https://github.com/rapidsai/cudf/blob/branch-22.04/cpp/include/cudf/fixed_point/fixed_point.hpp We even had a recent blog that

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-03 Thread Keith Kraus
Libcudf / cuDF have supported 32-bit and 64-bit decimals for a few releases now (as well as 128-bit decimals in the past couple of releases) and they've generally been received positively by the community. Being able to roundtrip them through Arrow would definitely be nice as well! On Thu, Mar

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-03 Thread Micah Kornfield
I think it makes sense to add these. Typically when adding new types, we've waited on the official vote until there are two reference implementations demonstrating compatibility.
On Thu, Mar 3, 2022 at 6:55 AM Antoine Pitrou wrote:
> Hello,
>
> Currently, the Arrow format specification

[Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-03 Thread Antoine Pitrou
Hello, Currently, the Arrow format specification restricts the bitwidth of decimal numbers to either 128 or 256 bits. However, there is interest in allowing other bitwidths, at least 32 and 64 bits for this proposal. A 64-bit (respectively 32-bit) decimal datatype would allow for