On Tue, 1 Oct 2019 at 21:03, Maarten Ballintijn wrote:
>
> I ran cProfile to understand better what is going on in Pandas. Using your
> code below I find that
> Pandas loops over the generic datetime64 conversion when the
> datetime64 is not in ’ns’.
> The conversion unpacks the time
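For anyone wanting to try the obvious workaround, here is a minimal sketch that casts a non-’ns’ datetime64 array to ’ns’ with NumPy before handing it to pandas. The array contents and variable names are made up for illustration, and whether this actually avoids the slow path in a given pandas version is an assumption that should be checked with a profiler.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one timestamp per second for a single day,
# stored at second resolution (datetime64[s]).
start = np.datetime64("2019-01-01", "s")
arr_s = start + np.arange(86_400).astype("timedelta64[s]")

# Cast to nanosecond resolution in one vectorized NumPy operation,
# then build the index from the already-converted array.
arr_ns = arr_s.astype("datetime64[ns]")
idx = pd.DatetimeIndex(arr_ns)

print(arr_s.dtype, "->", idx.dtype)
```

The cast itself is lossless here because second resolution fits exactly into nanoseconds for dates in this range.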
Some answers to the other questions:
On Sat, 28 Sep 2019 at 22:16, Maarten Ballintijn wrote:
> ...
> This leaves me with the following questions:
>
> - Who should I talk to to get this resolved in Pandas?
>
You can open an issue on their tracker:
https://github.com/pandas-dev/pandas/issues/
On Sat, Sep 28, 2019 at 3:16 PM Maarten Ballintijn wrote:
>
> Hi Joris,
>
> Thanks for your detailed analysis!
>
> We can leave the impact of the large DatetimeIndex on the performance for
> another time.
> (Notes: my laptop has sufficient memory to support it, no error is thrown, the
>
Hi Joris,
Thanks for your detailed analysis!
We can leave the impact of the large DatetimeIndex on the performance for
another time.
(Notes: my laptop has sufficient memory to support it, no error is thrown, the
resulting DateTimeIndex from the expression is identical to your version or the
Hi Maarten,
Thanks for the reproducible script. I ran it on my laptop with pyarrow
master, and I am not seeing a difference between the two datetime indexes:
Versions:
Python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00)
[GCC 7.3.0] on linux
numpy: 1.16.4
pandas:
Hi,
The code that demonstrates the performance issue with DatetimeIndex is at:
https://gist.github.com/maartenb/256556bcd6d7c7636d400f3b464db18c
It shows three cases: 0) an int index, 1) a datetime index, and 2) a datetime
index created in a slightly roundabout way.
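For readers without the gist at hand, here is a rough sketch of three such indexes and a quick way to compare the two datetime variants. It is not the code from the gist: the size, the start date, and in particular the "roundabout" construction (a start timestamp plus day offsets) are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

n = 1_000  # hypothetical size, far smaller than a real benchmark

# Case 0: plain integer index.
idx_int = pd.RangeIndex(n)

# Case 1: datetime index built directly.
idx_dt = pd.date_range("2019-01-01", periods=n, freq="D")

# Case 2: hypothetical roundabout construction, start date plus day offsets.
offsets = pd.to_timedelta(np.arange(n), unit="D")
idx_rt = pd.DatetimeIndex(pd.Timestamp("2019-01-01") + offsets)


def describe(index):
    """Collect the attributes most likely to differ between two indexes."""
    return {"dtype": str(index.dtype), "freq": index.freq, "length": len(index)}


# Normalize both to nanoseconds so the comparison is about values only.
same_values = np.array_equal(
    idx_dt.values.astype("datetime64[ns]"),
    idx_rt.values.astype("datetime64[ns]"),
)

print(describe(idx_dt))
print(describe(idx_rt))
print("same values:", same_values)
```

Two indexes can hold the same timestamps and still differ in metadata such as `freq`, so printing both descriptions is a cheap first check before profiling.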
I’m a little confused by the two
hi
On Tue, Sep 24, 2019 at 9:26 AM Maarten Ballintijn wrote:
>
> Hi Wes,
>
> Thanks for your quick response.
>
> Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and:
>
> numpy: 1.16.5
> pandas: 0.25.1
> pyarrow: 0.14.1
>
> It looks like 0.15 is close, so
Hi Wes,
Thanks for your quick response.
Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and:
numpy: 1.16.5
pandas: 0.25.1
pyarrow: 0.14.1
It looks like 0.15 is close, so I can wait for that.
Theoretically I see three components driving the
hi Maarten,
Are you using the master branch or 0.14.1? There are a number of
performance regressions in 0.14.0/0.14.1 that are addressed in the
master branch, to appear as 0.15.0 relatively soon.
As a file format, Parquet (and columnar formats in general) is not
known to perform well with more