Re: [DISCUSS][C++] Can we use "0E+1" not "0.E+1" for decimal for broader compatibility?

2024-10-01 Thread Jacek Pliszka
Hi! I am a bit puzzled why it is not 0e0. This seems to me the "natural" way to write 0 in scientific notation. Best Regards, Jacek Pliszka > On 01/10/2024 at 03:55, Sutou Kouhei wrote: > > Hi, > > > > The current decimal implementation omits
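As a point of comparison only, here is a minimal sketch of how Python's standard `decimal` module treats the two spellings from the subject line. It says nothing about Arrow's C++ implementation; it only shows that one widely used parser accepts both forms and emits the one without the bare dot.

```python
from decimal import Decimal

# Both spellings parse to the same value: coefficient 0, exponent +1.
with_dot = Decimal("0.E+1")
without_dot = Decimal("0E+1")

# Python's canonical string form has no dot after the coefficient.
canonical = str(with_dot)
print(canonical)
```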

Re: Arrow 15 parquet nanosecond change

2024-02-21 Thread Jacek Pliszka
Hi! pq.write_table(table, config.output_filename, coerce_timestamps="us", allow_truncated_timestamps=True) lets you write timestamps as microseconds (us) instead of nanoseconds (ns). BR J On Wed, 21 Feb 2024 at 21:44, Li Jin wrote: > Hi, > > My colleague has informed

Re: Is there anyway to resize record batches

2023-11-22 Thread Jacek Pliszka
cpp/compute.html#selections > > > > # -- > > # Aldrin > > > https://github.com/drin/ > > https://gitlab.com/octalene > > https://keybase.io/octalene > > > On Wednesday, November 22nd, 2023 at 10:58, Jacek Pliszka < > ja

Re: Is there anyway to resize record batches

2023-11-22 Thread Jacek Pliszka
Hi! I think some code is needed for clarity. You can concatenate tables (and call combine_chunks afterwards) or arrays, then pass the concatenated result. Regards, Jacek On Wed, 22 Nov 2023 at 19:54, Lee, David (PAG) wrote: > I've got 36 million rows of data which ends up as a record batch with 300

Re: Apache Arrow file format

2023-10-19 Thread Jacek Pliszka
There is a note there explaining what they understand by it but further down the line they do not make such a distinction. The fact that parquet can be a better in-memory format than arrow for certain common uses is something I had not thought of and is eye-opening for me, admittedly so because I am n

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-03 Thread Jacek Pliszka
the type. > > Do you want to start a new thread or open up some JIRAs to track this work? > > Thanks, > Micah > > On Mon, May 3, 2021 at 5:32 AM Jacek Pliszka > wrote: > > > Sorry, my mistake. > > > > You are right - I meant anchored intervals as in pand

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-03 Thread Jacek Pliszka
my limited understanding, that is more > > a concern of functions using interval types rather than the type itself. > > For instance a quick search of postgres [1] docs only talks about half-open > > in relation to the "Overlaps" operator > > > > Thank

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-02 Thread Jacek Pliszka
Hi! I wonder if it would be possible to have a generic interval type with integers of a specified size, just to have a common base for interval arithmetic. Then users can convert their periods to ordinals and use the arithmetic (joining, deoverlapping, common parts, explosion etc.). So YEAR_MONTH and DAY_TIME wou
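A rough sketch of what such integer-based interval arithmetic might look like. The half-open `[start, end)` convention, the month-ordinal encoding, and all function names are assumptions for illustration and are not part of any Arrow proposal.

```python
def month_ordinal(year, month):
    """Map a (year, month) period to a single integer ordinal."""
    return year * 12 + (month - 1)


def join_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def common_part(a, b):
    """Return the intersection of two half-open intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# January..March 2021 and March..June 2021, as half-open month ordinals.
q1 = (month_ordinal(2021, 1), month_ordinal(2021, 4))
spring = (month_ordinal(2021, 3), month_ordinal(2021, 7))
print(join_intervals([spring, q1]), common_part(q1, spring))
```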

Re: Decimal128 scale limits

2020-07-01 Thread Jacek Pliszka
Hi! I am aware of at least 2 different decimal128 things: a) the one we have - where we use 128 bits to store an integer which is later shifted by the scale - 38 is the number of digits of the significand, i.e. digits fitting in 128 bits (2**128/10**38) - IMHO it is completely unrelated to scale which we sto
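The arithmetic behind the number 38 can be checked directly. This sketch assumes a signed 128-bit integer (maximum `2**127 - 1`) as the unscaled value; the helper names are made up for illustration.

```python
from decimal import Decimal

INT128_MAX = 2**127 - 1  # assumption: signed 128-bit storage


def max_full_digits(limit):
    """Largest n such that every n-digit integer is <= limit."""
    digits = 0
    while 10 ** (digits + 1) - 1 <= limit:
        digits += 1
    return digits


def to_decimal(unscaled, scale):
    """Interpret an unscaled integer as unscaled * 10**-scale, exactly."""
    sign = 1 if unscaled < 0 else 0
    digit_tuple = tuple(int(c) for c in str(abs(unscaled)))
    return Decimal((sign, digit_tuple, -scale))


precision = max_full_digits(INT128_MAX)
print(precision, to_decimal(12345, 2))
```

The precision depends only on how many digits fit in the integer; the scale only says where the decimal point goes.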

Re: [Python] black vs. autopep8

2020-04-02 Thread Jacek Pliszka
Hi! I believe the amount of changes is not that important. In my opinion, what matters is which format will allow reviewers to be more efficient. The committer can always reformat as they like. It is harder for the reviewer. BR, Jacek On Thu, 2 Apr 2020 at 15:32, Antoine Pitrou wrote: > > > PS:

Re: Using GitHub Actions to automate style and other fixes

2020-02-20 Thread Jacek Pliszka
As a beginner contributor I believe I can vote for linting as part of the build. For me the best would be a BEGINNER/ALL_CHECKS option in the Makefile that runs all the linting and all the checks done in the build, with the instructions clearly recommending its use. BR, Jacek śr., 19 lu

Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
bandwidth/disk usage by a factor of 4. What would be the best approach to such a use case? Would a decimal_scale CastOption be OK or should it rather be a compute 'multiply' kernel? BR, Jacek On Wed, 12 Feb 2020 at 19:32, Jacek Pliszka wrote: > > OK, then what I proposed does not make

Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
OK, then what I proposed does not make sense and I can just copy the solution you pointed out. Thank you, Jacek On Wed, 12 Feb 2020 at 19:27, Wes McKinney wrote: > > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka > wrote: > > > > Hi! > > > > ARROW-3329 - w

[ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Hi! ARROW-3329 - we can discuss there. > It seems like it makes sense to implement both lossless safe casts > (when all zeros after the decimal point) and lossy casts (fractional > part discarded) from decimal to integer, do I have that right? Yes, though if I understood your examples are the sa
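To make the two cast flavours from the quoted question concrete, here is a minimal pure-Python sketch on an (unscaled integer, scale) pair. The function name, the `allow_truncate` flag, the non-negative scale, and truncation toward zero are all assumptions for illustration; this is not the Arrow kernel.

```python
def decimal_to_int(unscaled, scale, allow_truncate=False):
    """Cast unscaled * 10**-scale to an integer (scale >= 0 assumed).

    Safe mode raises if any fractional digit is non-zero; with
    allow_truncate=True the fractional part is discarded toward zero.
    """
    factor = 10**scale
    quotient, remainder = divmod(abs(unscaled), factor)
    if remainder and not allow_truncate:
        raise ValueError("cast would discard a non-zero fractional part")
    return quotient if unscaled >= 0 else -quotient


print(decimal_to_int(1200, 2))                        # 12.00, lossless
print(decimal_to_int(1250, 2, allow_truncate=True))   # 12.50, lossy
```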

Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Hi! I am interested in having a cast from Decimal to Int in pyarrow. I have a couple of ideas but I am a newbie so I might be wrong: Do I understand correctly that the problem lies in the fact that CastFunctor knows nothing about decimal scale? Were there any ideas how to handle this properly? My ide

Re: Help with decimal ArrayData needed

2020-02-11 Thread Jacek Pliszka
Issue solved, sorry for the noise - is_valid was false for the 2nd value. BR. Jacek On Tue, 11 Feb 2020 at 21:43, Jacek Pliszka wrote: > Hi! > > I am an arrow newbie trying to implement cast from Decimal128 to Int64 and > I need some help. > > In unit test I am passing >

Help with decimal ArrayData needed

2020-02-11 Thread Jacek Pliszka
Hi! I am an Arrow newbie trying to implement a cast from Decimal128 to Int64 and I need some help. In a unit test I am passing std::vector v to CheckCase(decimal(38, 10), v, is_valid, int64(), e, options); But when I try to read it in CastFunctor with auto ptr = input.GetValues(1) where in_type is D