On 21/02/2021 at 01:05, Wes McKinney wrote:
> I agree that we should avoid leaking uninitialized memory in places
> where we have control over it. I could imagine a third party project
> having UBSAN warnings and then tracing the origin of them to something
> in Arrow that they then have to wor
I agree that we should avoid leaking uninitialized memory in places
where we have control over it. I could imagine a third party project
having UBSAN warnings and then tracing the origin of them to something
in Arrow that they then have to work around. As for the potential
performance implications,
Hi Ben and Ben,
I think it would be good to have a convention of filling null slots in
arrays with a known value by default. I think it might be a mistake to use
zero as the value because it can lead to reliance on this behavior. Secure
by default is a good approach to take.
For kernels in partic
I am definitely in the camp that we should not leak past data through
uninitialized Arrow memory (for example by transmitting such buffers
using Arrow IPC).
Regards
Antoine.
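
To illustrate the convention discussed in this thread (overwriting null slots
with a known value so that stale memory is never exposed), here is a minimal
sketch in plain Python. It is not Arrow's implementation: the function name
`fill_null_slots` and the `fill` parameter are hypothetical, and the only
assumption taken from Arrow is that the validity bitmap is
least-significant-bit first, with a set bit meaning "valid". The fill value is
a parameter because the thread leaves open whether zero is the right choice.

```python
def fill_null_slots(values, validity, fill=0):
    """Return a copy of `values` with every null slot overwritten by `fill`.

    `validity` is a bytes-like bitmap: bit i (least-significant-bit first)
    is 1 when slot i is valid. Names here are hypothetical, for illustration.
    """
    out = list(values)
    for i in range(len(out)):
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if not is_valid:
            # Overwrite whatever was left behind in the masked slot.
            out[i] = fill
    return out


# Slots 1 and 3 are null; their previous contents (99, 77) must not leak.
values = [10, 99, 30, 77]
validity = bytes([0b0101])
print(fill_null_slots(values, validity))  # [10, 0, 30, 0]
```

A real implementation would operate on the data buffer in place or at
allocation time; the copy here only keeps the sketch easy to follow.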
On 20/02/2021 at 21:17, Benjamin Kietzman wrote:
> Original discussion at
> https://github.com/apache/arrow/pull/9471
I agree.
Below are two notes from a similar discussion on the Rust implementation:
1. In SIMD, for performance reasons, operations are performed over the
whole buffer irrespective of the bitmap mask, and the bitmap mask is dealt
with separately. If a slot contains an arbitrary value, the operation
Original discussion at
https://github.com/apache/arrow/pull/9471#issuecomment-779944257 (PR for
https://issues.apache.org/jira/browse/ARROW-11595 )
Although the format does not specify what is contained in array slots
masked by null bits (for example the first byte in the data buffer of an
int8 ar
I understand that GROUP BY ought not imply any particular ordering; it's just
that working with other SQL databases, I've come to expect that ordering will
be consistent between multiple runs of the same statement, at least within the
context of a single transaction on a single connection.
I
Great! I will start experimenting and see how far I get.
While we're at it, should we consider putting something in the metadata field?
That would be more involved due to the bespoke format of the property, but it
might be a good time to consider any additional information that could be
usefu
The SQL standard in general makes no guarantee of the order of resulting
data unless there is an explicit ORDER BY clause.
I would guess that there are two factors in play here:
1. The use of hash-based data structures, as you mention
2. If you have partitioned data then it is processed on multip
When I group by a column in DataFusion SQL, the order of the results is
different every time. For example, "select country from data group by country"
against
https://github.com/Teradata/kylo/blob/master/samples/sample-data/csv/userdata3.csv
might return "Moldova" first one time, and then "Swed
Thanks David. I have filed https://issues.apache.org/jira/browse/ARROW-11717
to track
On Fri, Feb 19, 2021 at 5:12 PM David Li wrote:
> @mrkn submitted a PR to add backtraces which was merged recently:
> https://github.com/apache/arrow/pull/9524
>
> However I think the abort is a red herring - th
Arrow Build Report for Job nightly-2021-02-20-0
All tasks:
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-20-0
Failed Tasks:
- conda-linux-gcc-py39-aarch64:
URL:
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-20-0-drone-conda-linux