Hi,
I'm adding dev@ because this may require improvements to Apache Arrow C++.
It seems that we need the following new feature for this use
case (combining chunks to process large data with pandas and
mmap using only a small amount of memory):
* Writing chunks in arrow::Table as one large
I think it would make more sense to use Arrow for nested types in Ibis
-- I'm biased for having been heavily involved in both projects, but
NumPy doesn't have a very good story for nested data and so if
possible it would be better to prevent technical debt from accumulating
from decisions made years
Hello Arrow and Ibis devs,
I notice that Arrow's to_pandas method produces different types than
the Ibis test suite expects.
- Lists are returned as numpy arrays in Arrow, but expected to be Python
  list objects in Ibis.
- NULL values in integer columns are converted to
I agree with Jacques here. Perhaps what is needed is an unsafe
non-nullable array accessor layer, then there is no need for flags
etc. We've already been writing a lot of such code in C++ (splitting
between no-nulls and some-nulls paths, see also the BitBlockCounter
stuff we've been doing, is such
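
A minimal sketch of the no-nulls / some-nulls split mentioned above, written in plain Python rather than Arrow C++. All function names here are hypothetical and are not Arrow APIs. The only Arrow-specific assumption is that the validity bitmap uses least-significant-bit order, with a set bit meaning "valid".

```python
from typing import List, Optional


def is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if bit i of the validity bitmap is set (LSB bit order)."""
    return (bitmap[i >> 3] >> (i & 7)) & 1 == 1


def sum_no_nulls(values: List[int]) -> int:
    # Fast path: every slot is valid, so no bitmap lookups are needed.
    return sum(values)


def sum_some_nulls(values: List[int], bitmap: bytes) -> int:
    # Slow path: consult the validity bitmap for every slot.
    return sum(v for i, v in enumerate(values) if is_valid(bitmap, i))


def sum_array(values: List[int], bitmap: Optional[bytes], null_count: int) -> int:
    # Dispatch once per array instead of carrying a flag on the vector.
    if bitmap is None or null_count == 0:
        return sum_no_nulls(values)
    return sum_some_nulls(values, bitmap)


values = [10, 20, 30, 40]
bitmap = bytes([0b00001011])  # slots 0, 1 and 3 valid; slot 2 is null
total_with_null = sum_array(values, bitmap, null_count=1)
total_all_valid = sum_array(values, None, null_count=0)
```

The point of the sketch is that the caller picks the path once from the null count, so the fast path never touches the bitmap and the array itself needs no extra state.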
This change is undesirable as it optimizes one path and makes several
others behave in unintended ways. What happens if a vector with nulls
shows up? What happens if a user sets a position to a null value in user
code when this flag is set?
If the answer to the above questions is the use is an
Arrow Build Report for Job nightly-2020-09-10-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-10-0
Failed Tasks:
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-10-0-github-debian-buster-amd64
-
Hi Micah,
Thank you for your reply and the links, the threads were quite interesting.
You are right, I opened the flink issue regarding arrow support to
understand whether it was on their roadmap to take a look at.
My use-case is processing a stream of events (or rows if you will) to
compute
+1 on this also.
As per previous questions, this is something I am also looking into.
With IIoT real-time streaming, it can be as low as one data point per
'message' / block / packet etc., or at best one 'row', i.e. 1-second
streaming sensor data, or faster, which also has a 1-second latency /