Re: [Python/C-Glib] writing IPC file format column-by-column

2020-09-10 Thread Sutou Kouhei
Hi, I've added dev@ because this may require improving Apache Arrow C++. It seems that we need the following new feature for this use case (combining chunks with small memory to process large data with pandas, mmap and small memory): * Writing chunks in arrow::Table as one large

Re: Aligning intended target types for lists and structs when converting to pandas DataFrame

2020-09-10 Thread Wes McKinney
I think it would make more sense to use Arrow for nested types in Ibis -- I'm biased for having been heavily involved in both projects, but NumPy doesn't have a very good story for nested data and so if possible it would be better to prevent technical debt from accumulating from decisions made years

Aligning intended target types for lists and structs when converting to pandas DataFrame

2020-09-10 Thread Tim Swast
Hello Arrow and Ibis devs, I notice that Arrow's to_pandas method produces different types than is expected in the Ibis test suite. - Lists are returned as numpy arrays in Arrow, but expected to be Python list objects in Ibis. - NULL values in integer columns are converted to

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-10 Thread Wes McKinney
I agree with Jacques here. Perhaps what is needed is an unsafe non-nullable array accessor layer, then there is no need for flags etc. We've already been writing a lot of such code in C++ (splitting between no-nulls and some-nulls paths, see also the BitBlockCounter stuff we've been doing, is such

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-10 Thread Jacques Nadeau
This change is undesirable as it optimizes one path and makes several others behave in unintended ways. What happens if a vector with nulls shows up? What happens if a user sets a position to a null value in user code when this flag is set? If the answer to the above questions is the use is an

[NIGHTLY] Arrow Build Report for Job nightly-2020-09-10-0

2020-09-10 Thread Crossbow
Arrow Build Report for Job nightly-2020-09-10-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-10-0 Failed Tasks: - debian-buster-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-10-0-github-debian-buster-amd64 -

Re: Arrow as a streaming format

2020-09-10 Thread Pedro Silva
Hi Micah, Thank you for your reply and the links; the threads were quite interesting. You are right, I opened the Flink issue regarding Arrow support to understand whether it was on their roadmap to take a look at. My use-case is processing a stream of events (or rows if you will) to compute

Re: Arrow as a streaming format

2020-09-10 Thread Mark Farnan
+1 on this also. As per previous questions, this is something I am also looking into. IIOT realtime streaming, it can be as low as one datapoint per 'message' / block / packet etc. Or, at best, one 'row'. i.e. 1 second streaming sensor data, or faster which also has a 1 second latency /