I see that there's a European variant of that event which seems more
adapted for at least some of the Arrow development community:
https://eu.communityovercode.org/
On 04/10/2024 at 10:50, Raúl Cumplido wrote:
Hi Jarek,
It seems really interesting, I won't be able to attend. Do you know
Hello,
Long ago, we added an ARROW_USE_PRECOMPILED_HEADERS to the Arrow C++
CMake options in the hope of speeding up builds by reducing C++ header
parsing time.
However, we later started to use a concurrent (*) solution added in
CMake itself: CMAKE_UNITY_BUILD, which merges batches of sourc
Hello Will, and thanks a lot for your involvement!
On 01/10/2024 at 18:55, Dewey Dunnington wrote:
On behalf of the Arrow PMC, I'm happy to announce that Will Ayd has
accepted an invitation to become a committer on Apache Arrow. Welcome,
and thank you for your contributions!
-dewey
Hi Kou,
That sounds fine to me.
Regards
Antoine.
On 01/10/2024 at 03:55, Sutou Kouhei wrote:
Hi,
The current decimal implementation omits the fractional part
if the fractional part is 0. For example: "0.E+1" not "0.0E+1"
Most environments such as Python, Node.js, PostgreSQL and
MySQL a
*they receive
On 30/09/2024 at 11:57, Antoine Pitrou wrote:
There might be a misunderstanding, but this is a report for the Apache
Software Foundation (they recent reports from hundreds of projects).
It's not really useful to copy our release notes there.
Regards
Antoine.
Le
There might be a misunderstanding, but this is a report for the Apache
Software Foundation (they receive reports from hundreds of projects).
It's not really useful to copy our release notes there.
Regards
Antoine.
On 30/09/2024 at 11:46, Vibhatha Abeykoon wrote:
Hi Andy,
Thanks for sha
se's (Databend, Doris, Druid,
DeepLake, Firebolt, Lance, Oxla, Pinot, QuestDB, SingleStore, etc.) native
at-rest partition file formats.
On Fri, 13 Sept 2024 at 16:43, Antoine Pitrou wrote:
Hello,
I'm perplexed by this discussion. If you want to send highly-compressed
files over
Hello,
I'm perplexed by this discussion. If you want to send highly-compressed
files over the network that is already possible: just send Parquet
files via HTTP(S) (or another protocol of choice).
Arrow Flight is simply a *streaming* protocol that allows
sending/requesting the Arrow format over
Hi,
I sympathize with the security argument. If no other library allows for
embedding the Azure password directly in the URL, then I would be ok for
deprecating it.
Regards
Antoine.
On 10/09/2024 at 03:24, Sutou Kouhei wrote:
Hi,
The current Azure file system URI accepts account key
Hi,
I don't have a specific opinion on this, but as a data point, this
already happens from time to time (though rarely).
Regards
Antoine.
On 11/09/2024 at 17:32, Joris Van den Bossche wrote:
Hi all,
This is a discussion specifically for the GitHub development workflow
we use in the m
+1 (binding).
Can you open a PR with the spec updates?
Regards
Antoine.
On 04/09/2024 at 23:17, Matt Topol wrote:
Based on various discussions among the ecosystem and to continue expanding
the zero-copy interoperability for Arrow to be used with different
libraries and databases (such as
Is there a way to ensure this is done automatically?
Regards
Antoine.
On Wed, 28 Aug 2024 10:05:45 +0900 (JST)
Sutou Kouhei wrote:
> Hi,
>
> How about indenting preprocessor directives for readability?
>
> Issue: https://github.com/apache/arrow/issues/43796
> PR: https://github.com/apache
+1 (binding)
On 26/08/2024 at 04:37, Sutou Kouhei wrote:
Hi,
I would like to propose splitting the Go release process.
Motivation:
* We want to reduce needless major releases because major
releases require changes on the users' side
Approach:
1. Extract go/ in apache/arrow to apache/arrow-go like
a
On 22/08/2024 at 17:08, Curt Hagenlocher wrote:
(I also happen to want a canonical Arrow representation for variant data,
as this type occurs in many databases but doesn't have a great
representation today in ADBC results. That's why I filed [Format] Consider
adding an official variant type
u, Aug 22, 2024 at 3:51 PM Antoine Pitrou wrote:
Hi Gang,
Sorry, but can you give a pointer to the start of this discussion thread
in a readable format (for example a mailing-list archive)? It appears
that dev@arrow wasn't cc'ed from the start and that can make it
difficult to unde
Hi Gang,
Sorry, but can you give a pointer to the start of this discussion thread
in a readable format (for example a mailing-list archive)? It appears
that dev@arrow wasn't cc'ed from the start and that can make it
difficult to understand what this is about.
Regards
Antoine.
Le 22/08/2
Binding +1 (but posted one minor comment on the format PR).
Thank you Joel!
Regards
Antoine.
On 05/08/2024 at 14:59, Joel Lubinitsky wrote:
Hello Devs,
I would like to propose a new canonical extension type: Bool8
The prior mailing list discussion thread can be found at [1].
The format
I don't have any concrete data to test this against, but using 64-bit
offsets sounds like an obvious improvement to me.
Regards
Antoine.
On 01/08/2024 at 13:05, Ruoxi Sun wrote:
Hello everyone,
We've identified an issue with Acero's hash join/aggregation, which is
currently limited to
On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
If Canonical Extensions had existed at the time, I think there's a chance
we may have ended up with int32 Date as a first class type and int64
MillisecondDate as a Canonical Extension type.
Agreed.
Are there any lessons we've
learned from im
I can't
> find now that new types should be implemented as extension types if
> possible for these (and perhaps other) reasons.
>
>
> On Fri, Jul 19, 2024 at 5:39 AM Antoine Pitrou wrote:
> >
> >
> > Agreed with Felipe. This is meant for communicating with no
out any provisions on the
specification that might make this impossible.
-dewey
[1]
https://github.com/duckdb/duckdb/blob/85a82d86aa11a2695fc045deaf4f88fc63dd4fec/src/common/arrow/appender/bool_data.cpp#L28-L37
On Tue, Jul 16, 2024 at 11:25 AM Antoine Pitrou <
anto...@python.org>
Hi Kou,
On 18/07/2024 at 11:33, Sutou Kouhei wrote:
Here is my idea of how to proceed with this:
1. Extract go/ in apache/arrow to apache/arrow-go like
apache/arrow-rs
* Filter go/ related commits from apache/arrow and create
apache/arrow-go with them like we did for apache/arrow-rs
Hello,
Thanks all for this discussion. Given that there was no strong argument
against doing this, I decided to move forward and the change was made in
https://github.com/apache/arrow/pull/40875
Regards
Antoine.
On Wed, 5 Jun 2024 17:18:36 +0200
Antoine Pitrou wrote:
> Hello,
>
>
Hi Carl,
On 08/07/2024 at 18:43, Carl Boettiger wrote:
As an observer to both communities, I'm interested in whether there is or might
be more communication between the Pangeo community's focus on Zarr
serialization with what the Arrow team has done with Parquet. I recognize
that these are diff
Hi Joel,
This looks good to me in principle. Can you split the spec and the
implementation(s) into separate PRs?
Regards
Antoine.
On 16/07/2024 at 13:18, Joel Lubinitsky wrote:
Hi Arrow devs,
I'm working on adding an extension type for 8-bit booleans, and wanted to
start a discuss
[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html
# --
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene
On Monday, July 15th, 2024 at 07:59, Antoine Pitrou wrote:
No, because these marke
No, because these markers also communicate the information to other
implementations of S3 abstractions.
An example of this is: https://docs.cyberduck.io/protocols/s3/#folders
Regards
Antoine.
On 13/07/2024 at 07:15, Aldrin wrote:
...then I still expect the directory /foo to exist
Rig
Hi,
On 12/07/2024 at 12:21, Hyunseok Seo wrote:
*### Why Maintain Empty Directory Markers?*
From what I understand, object stores like S3 do not have a concept of
directories. The motivation behind maintaining these markers could be to
manage the object store as if it were a traditional fi
Hmmm, I strive to understand why a `(int32, utf8)` tuple for statistic
keys would be any simpler to implement than either `int32` *or* `utf8`
*or* `dictionary(int32, utf8)`.
Let's keep in mind that we would like to keep things simple for
consumers and producers of statistics.
We should al
Is this UDF implementation based on DataFusion? If so, it makes sense
for it to be part of the DataFusion project.
OTOH, if it can work with any data in the Arrow format, then it would
sound weird to maintain it in the DataFusion repo IMHO.
Regards
Antoine.
On 28/06/2024 at 21:52, Andrew
I'll note that PyArrow also allows defining user-defined functions and
they are vectorized (the function arguments can be PyArrow arrays or
scalars, depending on the context in which a function is being executed):
https://arrow.apache.org/docs/python/compute.html#user-defined-functions
My vo
On 12/06/2024 at 04:45, Sutou Kouhei wrote:
It seems that we need to disable MI_OVERRIDE explicitly to
not define malloc() in libmimalloc.so:
https://github.com/microsoft/mimalloc/blob/03020fbf81541651e24289d2f7033a772a50f480/CMakeLists.txt#L10
Yes, that's what we do when building the bund
Sorry, I had forgotten to comment on this. I think this is generally a
good idea, but it would obviously need more eyes on it :-)
Can other people go and take a look at David's PR below?
On 25/05/2024 at 04:47, David Li wrote:
I've put up a draft PR here: https://github.com/apache/arrow/
On 11/06/2024 at 10:35, Sutou Kouhei wrote:
Hi,
In <2a32f61c-dd22-4f3f-bc98-822dcb6b0...@python.org>
"Re: [Discuss][C++] Switch to mimalloc by default?" on Tue, 11 Jun 2024
10:21:12 +0200,
Antoine Pitrou wrote:
I was thinking about find_package(). Good to know
On 11/06/2024 at 10:01, Sutou Kouhei wrote:
2. Is it OK that we add support for system mimalloc?
Hmm... that sounds legitimate, but with the caveat that a system
mimalloc can override the standard malloc/free functions. Would that
affect an application using Arrow C++?
Are you saying th
Hi Kou,
On 09/06/2024 at 09:16, Sutou Kouhei wrote:
Questions:
1. Do we need to keep jemalloc support? Compatibility? Can we
drop support for jemalloc to decrease maintenance cost?
I'm not sure there's much maintenance cost. I expect some people might
prefer jemalloc, and perhaps it
On 09/06/2024 at 08:33, Sutou Kouhei wrote:
Fields:
| Name   | Type          | Comments |
|--------|---------------|----------|
| column | utf8          | (2)      |
| key    | utf8 not null | (3)      |
1. Should the key be
On 09/06/2024 at 09:01, Sutou Kouhei wrote:
Hi,
One thing that a plain integer makes more difficult is representing
non-standard statistics. For example some engine might want to expose
elaborate quantile-based statistics even if it is not officially defined
here. With a `utf8` or `dictionary(
On 07/06/2024 at 18:30, Felipe Oliveira Carvalho wrote:
On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote:
On 07/06/2024 at 04:27, Felipe Oliveira Carvalho wrote:
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by
On 07/06/2024 at 04:27, Felipe Oliveira Carvalho wrote:
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by both producers and
consumers (i.e. standardized).
The statistics array(s) could be a
map<
// the column index or n
Hi Kou,
Thanks for pushing for this!
On 06/06/2024 at 11:27, Sutou Kouhei wrote:
4. Standardize Apache Arrow schema for statistics and
transmit statistics via separated API call that uses the
C data interface
[...]
I think that 4. is the best approach in these candidates.
I agr
Hello,
Arrow C++ features a MemoryPool abstraction that allows using different
allocators interchangeably. Several MemoryPool implementations are
provided with Arrow C++ (though one can also build their own):
- a jemalloc-based implementation, currently the default on Linux
- a mimalloc-bas
(Gang Wu, Antoine Pitrou, Wes McKinney)
9x +1 non-binding (Micah Kornfield, Felipe Oliveira Carvalho, Fokko
Driesprong, Alenka Frim, Andy Grove, Raúl Cumplido, Sutou Kouhei, Jiashen
Zhang, Rok Mihevc)
Arrow:
6x +1 binding (Micah Kornfield, Antoine Pitrou, Andy Grove, Raúl Cumplido,
Wes McKinney
Hi Li!
Sorry for the delay.
It seems the problem lies here:
https://github.com/apache/arrow/blob/9f5899019d23b2b1eae2fedb9f6be8827885d843/cpp/src/arrow/filesystem/s3fs.cc#L1858
The Future is marked finished with the ObjectOutputStream's mutex taken,
and the Future's callback then triggers a c
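A minimal sketch of the hazard described in this report, written in Python rather than taken from the Arrow C++ sources. The snippet is cut off, so it is an assumption here that the callback re-enters code needing the same mutex; the `Stream` class and its methods are hypothetical names. It relies on `concurrent.futures.Future` running done-callbacks synchronously inside `set_result()`.

```python
import threading
from concurrent.futures import Future


class Stream:
    """Toy stand-in for an output stream guarded by a non-reentrant mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.closed = False

    def close(self):
        # Needs the mutex; gives up after a short timeout instead of hanging.
        if not self._lock.acquire(timeout=0.1):
            return False
        try:
            self.closed = True
            return True
        finally:
            self._lock.release()

    def finish_locked(self, fut):
        # Problematic ordering: callbacks run inside set_result(),
        # while the mutex is still held by this thread.
        with self._lock:
            fut.set_result(None)

    def finish_unlocked(self, fut):
        # Safer ordering: touch shared state under the mutex,
        # then complete the future after releasing it.
        with self._lock:
            pass  # shared state would be updated here
        fut.set_result(None)


def run(finish_name):
    """Return True if the callback managed to close the stream."""
    stream = Stream()
    outcome = []
    fut = Future()
    fut.add_done_callback(lambda f: outcome.append(stream.close()))
    getattr(stream, finish_name)(fut)
    return outcome[0]
```

With a blocking acquire instead of the timeout, `run("finish_locked")` would hang forever, which is the kind of deadlock the report seems to describe.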
+1 (binding).
Thanks for taking this up, Rok!
Regards
Antoine.
On 29/05/2024 at 16:14, Rok Mihevc wrote:
# sending this to both dev@arrow and dev@parquet
Hi all,
Following the ML discussion [1] I would like to propose a vote for
parquet-cpp issues to be moved from Parquet Jira [2] to Arr
Is it somehow possible to be a "member" of this account to indicate that
we have PMC status, or is that not possible within the LinkedIn
membership/permissions model?
On 24/05/2024 at 18:04, Ian Cook wrote:
Following the discussion [1] earlier this year about the status of the
Apache Ar
> > > 2. We'll provide pre-defined keys such as "max", "min",
> > >    "byte_width" and "distinct_count" but users can also use
> > >    application specific keys.
> > > 3. If true, then the value is approximate or best-effort.
On 23/05/2024 at 16:09, Felipe Oliveira Carvalho wrote:
Protocols that produce/consume statistics might want to use the C Data
Interface as a primitive for passing Arrow arrays of statistics.
This is also my opinion.
I think what we are slowly converging on is the need for a spec to
desc
Hi Kou,
I agree with Dewey that this is overstretching the capabilities of the C
Data Interface. In particular, stuffing a pointer as metadata value and
decreeing it immortal doesn't sound like a good design decision.
Why not simply pass the statistics ArrowArray separately in your
produce
I think these flags should be advisory and consumers should be free to
ignore them. However, some consumers apparently would benefit from them
to more faithfully represent the producer's intention.
For example, in Arrow C++, we could perhaps have an ImportDatum function
whose actual return t
+1 (binding)
On 19/04/2024 at 22:22, Rok Mihevc wrote:
Hi all,
Following initial requests [1][2] and recent tangential ML discussion [3] I
would like to propose a vote to add language for UUID canonical extension
type to CanonicalExtensions.rst as in PR [4] and written below.
A draft C++ and
+1 (binding) for the current proposal, i.e. with the RFC 8259
requirement and the 3 current String types allowed.
Regards
Antoine.
On 30/04/2024 at 19:26, Rok Mihevc wrote:
Hi all, thanks for the votes and comments so far.
I've amended [1] the proposed language with the RFC-8259 requiremen
mes, and so we could use this in that context).
I think that I would still prefer a canonical extension type (with storage
type null) over a new dedicated type.
On Wed, Apr 17, 2024 at 5:39 AM Antoine Pitrou wrote:
Ah! Well, I think this could be an interesting proposal, but someone
should
Ah! Well, I think this could be an interesting proposal, but someone
should put up a more formal proposal, perhaps as a draft PR.
Regards
Antoine.
On 17/04/2024 at 11:57, David Li wrote:
For an unsupported/other extension type.
On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
What
Out of curiosity, did you notice this by chance or do you have some kind
of script that processes ASF mailing-list archives for possible voting
irregularities?
Regards
Antoine.
On 17/04/2024 at 10:44, Christofer Dutz wrote:
When looking at whimsy, I can’t see any person named Sutou Kou
eation of one-off nominal types for
very specific use-cases?
—
Felipe
On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou wrote:
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
A
:06 Antoine Pitrou wrote:
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
Antoine.
On 10/04/2024 at 22:55, Wes McKinney wrote:
In the past we have discussed adding a
Yes, JSON and UUID are obvious candidates for new canonical extension
types. XML also comes to mind, but I'm not sure there's much of a use
case for it.
Regards
Antoine.
On 10/04/2024 at 22:55, Wes McKinney wrote:
In the past we have discussed adding a canonical type for UUID and JSON.
Hello John,
Arrow IPC files can be backed quite naturally by shared memory, simply
by memory-mapping them for reading. So if you have some pieces of shared
memory containing Arrow IPC files, and they are reachable using a
filesystem mount point, you're pretty much done.
You can see an exam
It seems that perhaps this discussion should be rebooted for each
individual component, one at a time?
Let's start with something simple and obvious, with some frequent
contribution activity, such as perhaps Go?
On 09/04/2024 at 14:27, Joris Van den Bossche wrote:
I am also in favor o
On 28/03/2024 at 21:42, Jacob Wujciak wrote:
For Arrow C++ bindings like Arrow R and PyArrow having distinct versions
would require additional work to both enable the use of different versions
and ensure version compatibility is monitored and potentially updated if
needed.
We could simply
Thanks. The Arrow spec does support multiple union members with the same
type, but not all implementations do. The C++ implementation should
support it, though to my surprise we do not seem to have any tests for it.
If the Java implementation doesn't, then you can probably open an issue
for
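To make the point about same-typed union members concrete, here is a rough pure-Python model of a dense union, following the type-ids-plus-offsets idea from the columnar format document. It is not any Arrow implementation's API: `DenseUnion` and `build` are hypothetical names, and it assumes type ids coincide with child indexes.

```python
from dataclasses import dataclass


@dataclass
class DenseUnion:
    child_types: list  # logical type name per child; duplicates are allowed
    children: list     # one list of values per child
    type_ids: list     # per slot: which child holds the value
    offsets: list      # per slot: index into that child

    def __getitem__(self, i):
        child = self.type_ids[i]
        return child, self.children[child][self.offsets[i]]


def build(tagged_values, child_types):
    """Build a dense union from (child_index, value) pairs."""
    children = [[] for _ in child_types]
    type_ids, offsets = [], []
    for child, value in tagged_values:
        type_ids.append(child)
        offsets.append(len(children[child]))
        children[child].append(value)
    return DenseUnion(child_types, children, type_ids, offsets)


# Children 0 and 1 share the int64 type but mean different things,
# e.g. two variants of an algebraic data type (hypothetical example).
u = build([(0, 7), (1, 7), (2, "x"), (0, 9)], ["int64", "int64", "utf8"])
```

The two `7` values stay distinguishable because the type id, not the storage type, selects the member.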
Can you explain what ADT means?
On 02/04/2024 at 11:31, Finn Völkel wrote:
Hi,
my question primarily concerns the union layout described at
https://arrow.apache.org/docs/format/Columnar.html#union-layout
There are two ways to use unions:
- polymorphic vectors (world 1)
- ADT st
Regardless of whether they have different compression ratios, it doesn't
explain why you would want a different compression *algorithm* altogether.
The choice of a compression algorithm should basically be driven by two
concerns: the acceptable space/time tradeoff (do you want to minimize
d
Hello Andrei,
On 23/03/2024 at 13:23, Andrei Lazăr wrote:
At this very moment, specifying different compression algorithms per column
is supported and in my use case it is extremely helpful, as I have some
columns (mostly containing floats), for which a compression algorithm like
Snappy (or
Also, with ADBC driver implementations currently in flux (none of them
has reached the "stable" status in
https://arrow.apache.org/adbc/main/driver/status.html), it might be a
disservice to users to implicitly fetch drivers from potentially
outdated DLLs on the current system.
Regards
Ant
Congratulations Bryce, and keep up the good work!
Regards
Antoine.
On 18/03/2024 at 03:21, Nic Crane wrote:
On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
accepted an invitation to become a committer on Apache Arrow. Welcome, and
thank you for your contributions!
N
I didn't run the release script but I'm +1 on this (binding).
Regards
Antoine.
On 04/03/2024 at 10:05, Raúl Cumplido wrote:
Hi,
I would like to propose the following release candidate (RC0) of Apache
Arrow version 15.0.1. This is a release consisting of 37
resolved GitHub issues[1].
Thi
et me know as I want as many
parties in the community as possible to be part of this.
Thanks everyone.
--Matt
On Tue, Feb 27, 2024 at 12:48 PM Antoine Pitrou wrote:
Hello,
I'd really like to see more engagement and criticism from non-Voltron
Data parties before this is formally adop
Hello,
I'd really like to see more engagement and criticism from non-Voltron
Data parties before this is formally adopted as an Arrow spec.
Regards
Antoine.
On 27/02/2024 at 18:35, Matt Topol wrote:
Hey all,
I'd like to propose a vote for us to officially adopt the protocol
described
agenda for today's bi-weekly call.
Thanks,
Raúl
On Tue, 13 Feb 2024 at 23:20, Antoine Pitrou wrote:
Well, https://github.com/apache/arrow/issues/20379 makes me wonder if
anyone is using the Java Dataset bridge seriously.
On 13/02/2024 at 21:10, Dane Pitkin wrote:
Hi all,
Arrow
Well, https://github.com/apache/arrow/issues/20379 makes me wonder if
anyone is using the Java Dataset bridge seriously.
On 13/02/2024 at 21:10, Dane Pitkin wrote:
Hi all,
Arrow Java identified an issue[1] in the 15.0.0 release. There is an
undefined symbol in the dataset module that cau
l service as a fallback.
Are these the intended semantics? If so, is there a way to include the
original service in the list of locations without the implied precedence?
Thanks,
Joel
On Mon, Feb 12, 2024 at 11:52 James Duong wrote:
This seems like a good idea, and also improves consist
Hi Dewey,
On 12/02/2024 at 15:01, Dewey Dunnington wrote:
Apache Arrow nanoarrow is a small C library for building and
interpreting Arrow C Data interface structures with bindings for users
of the R programming language.
Do you want to reconsider this sentence? It seems nanoarrow is starti
Hello,
This looks fine to me.
Regards
Antoine.
On 12/02/2024 at 14:46, David Li wrote:
Hello,
I'd like to propose a slight update to Flight RPC to make Flight SQL work
better in different deployment scenarios. Comments on the doc would be
appreciated:
https://docs.google.com/documen
I think we should find a proper descriptive name for the
"high-performance protocol", because "high-performance" is vague and
context-dependent, and also spreads unnecessary confusion about existing
alternatives such as regular Arrow IPC.
I would for example propose "Dissociated Arrow IPC"
My 2 cents: I don't understand what an open source project gains by
publishing on a microblogging platform.
As for Twitter specifically, its recent governance changes would be a good
reason for terminating the @ApacheArrow account, IMHO.
Regards
Antoine.
On 27/01/2024 at 23:06, Bryce Mecu
Hello,
My own answers:
1) isDelta should be true only when a delta is being transmitted (to be
appended to the existing dictionary with the same id); it should be
false when a full dictionary is being transmitted (to replace the
existing dictionary with the same id, if any)
2) yes, it coul
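The isDelta rule in answer 1) can be sketched as a small reader-side helper. This is plain Python, not an Arrow API; `apply_dictionary_batch` is a hypothetical name, and raising an error for a delta without an existing dictionary is an assumption of this sketch rather than something the message states.

```python
def apply_dictionary_batch(dictionaries, dict_id, values, is_delta):
    """Update a reader's dictionary store for one dictionary batch.

    dictionaries maps dictionary id -> list of values.
    """
    if is_delta:
        # Delta: append to the existing dictionary with the same id.
        if dict_id not in dictionaries:
            raise KeyError(f"delta for unknown dictionary id {dict_id}")
        dictionaries[dict_id].extend(values)
    else:
        # Full dictionary: replace the existing one with the same id, if any.
        dictionaries[dict_id] = list(values)
    return dictionaries


store = {}
apply_dictionary_batch(store, 0, ["a", "b"], is_delta=False)
apply_dictionary_batch(store, 0, ["c"], is_delta=True)       # indices 0..2 valid
after_delta = list(store[0])
apply_dictionary_batch(store, 0, ["x"], is_delta=False)      # replacement
```

After the delta, previously transmitted indices keep their meaning; after the replacement they do not.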
Impressive, thank you!
On 23/01/2024 at 14:06, Andrew Lamb wrote:
If anyone is interested, here is a new blog post about the last 6 months in
DataFusion[1] and where we are heading this year.
Andrew
[1]: https://arrow.apache.org/blog/2024/01/19/datafusion-34.0.0/
Well, if the main objective is to just follow the ASF Release
guidelines, then our verification process can be simplified drastically.
The ASF indeed just requires:
"""
Every ASF release MUST contain one or more source packages, which MUST
be sufficient for a user to build and test the relea
Go verification fails on Ubuntu 22.04:
```
# google.golang.org/grpc
../../gopath/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:2096:14:
undefined: atomic.Int64
note: module requires Go 1.19
# github.com/apache/arrow/go/v15/arrow/avro
arrow/avro/reader_types.go:594:16: undefined: fmt.Append
Hi,
For now, I would suggest that each implementation decides on its own
strategy, because we don't have a clear idea of which is better (and
extension types are probably not getting a lot of use yet).
Regards
Antoine.
On 13/12/2023 at 17:39, Benjamin Kietzman wrote:
The main proble
Hi Curt,
Yes, it's a problem in the Java implementation of these tests. Ideally
this should be fixed, but doing so would require some amount of scaffolding.
Regards
Antoine.
On 09/12/2023 at 21:47, Curt Hagenlocher wrote:
I've (mostly) fixed the C# implementation of dictionary IPC but
+1 (binding)
On 08/12/2023 at 20:42, David Li wrote:
Let's start a formal vote just so we're on the same page now that we've
discussed a few things.
I would like to propose we remove 'experimental' from Flight SQL and make it
stable:
- Remove the 'experimental' option from the Protobuf de
Hi,
While this looks like a nice start, I would expect more precise
recommendations for writing non-trivial services. Especially, one
question is how to send both an application-specific POST request and an
Arrow stream, or an application-specific GET response and an Arrow
stream. This migh
Given that MCJIT is deprecated and there doesn't seem to be a downside
to the new APIs, migrating to ORC v2 sounds fine to me.
Just a question: does it raise the minimum supported LLVM version?
Regards
Antoine.
On 05/12/2023 at 03:35, Yue Ni wrote:
Hi there,
I'd like to initiate a dis
For the sake of clarity, it seems this is talking about the Conference
on Innovative Data Systems Research:
https://www.cidrdb.org/cidr2024/
Regards
Antoine.
On 06/12/2023 at 01:15, Wes McKinney wrote:
I will also be there.
On Mon, Dec 4, 2023 at 12:58 PM Tony Wang wrote:
I am
Get
Hello,
On 21/11/2023 at 22:59, Chris Thomas wrote:
I apologize if this is not the appropriate venue for this request; if
that's the case, please let me know where I should be asking:
Earlier this month Dependabot flagged a security vulnerability with PyArrow
which prompted us to do an upgr
I also agree that an informal spec "how to efficiently transfer Arrow
data over HTTP" makes sense.
Probably with several aspects:
- one-shot GET data
- streaming GET
- one-shot PUT or POST
- streaming POST
- non-Arrow prologue and epilogue (for example JSON-based metadata)
- conventions for w
Welcome Raúl, we're glad to have you!
Regards
Antoine.
On 13/11/2023 at 20:27, Andrew Lamb wrote:
The Project Management Committee (PMC) for Apache Arrow has invited
Raúl Cumplido to become a PMC member and we are pleased to announce
that Raúl Cumplido has accepted.
Please join me in c
ormat/CanonicalExtensions.html
On Thu, Nov 9, 2023, at 11:56, Antoine Pitrou wrote:
Or they could trivially use an int64 column for that, since the scale is
fixed anyway, and you're probably not going to multiply money values
together.
On 09/11/2023 at 17:54, Curt Hagenlocher wrote:
If Arrow had a deci
Nov 9, 2023, at 11:56, Antoine Pitrou wrote:
Or they could trivially use an int64 column for that, since the scale is
fixed anyway, and you're probably not going to multiply money values
together.
On 09/11/2023 at 17:54, Curt Hagenlocher wrote:
If Arrow had a decimal64 type, someone could ch
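The int64 suggestion quoted above could look roughly like the following: amounts are stored as integers scaled by a fixed power of ten, so addition stays exact and the scale never changes. The scale of 4 and the helper names `to_int64` and `to_text` are assumptions made for illustration only.

```python
from decimal import Decimal

SCALE = 4  # assumed fixed number of fractional digits for the money column
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_int64(amount: str) -> int:
    """Convert a decimal string to an integer count of 10**-SCALE units."""
    scaled = Decimal(amount) * (10 ** SCALE)
    if scaled != scaled.to_integral_value():
        raise ValueError("more fractional digits than the fixed scale allows")
    value = int(scaled)
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    return value


def to_text(value: int) -> str:
    """Render a scaled integer back as a decimal string."""
    return str(Decimal(value).scaleb(-SCALE))


# Addition of scaled integers is plain integer addition.
total = to_int64("12.34") + to_int64("0.0001")
```

Multiplying two such values would change the scale, which is why the approach suits columns that are only added or compared.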
money column knowing that there are edge cases where they may
get an undesired result.
On Thu, Nov 9, 2023 at 8:42 AM Antoine Pitrou wrote:
On 09/11/2023 at 17:23, Curt Hagenlocher wrote:
Or more succinctly,
"111,111,111,111,111." will fit into a decimal64; would you prevent
On 09/11/2023 at 17:23, Curt Hagenlocher wrote:
Or more succinctly,
"111,111,111,111,111." will fit into a decimal64; would you prevent it
from being stored in one so that you can describe the column as
"decimal(18, 4)"?
That's what we do for other decimal types, see PyArrow below:
```
For the record, the correct PR link seems to be
https://github.com/apache/arrow/pull/38385
On 08/11/2023 at 21:49, David Li wrote:
Hello,
Joel Lubi has proposed adding bulk ingestion support to Arrow Flight SQL [1].
This provides a path for uploading an Arrow dataset to a Flight SQL ser
Severity: critical
Affected versions:
- PyArrow 0.14.0 through 14.0.0
Description:
Deserialization of untrusted data in IPC and Parquet readers in PyArrow
versions 0.14.0 to 14.0.0 allows arbitrary code execution. An application is
vulnerable if it reads Arrow
On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
Is this buffer lengths buffer only present if the array type is Utf8View?
IIUC, the proposal would add the buffer lengths buffer for all types if the
schema's
flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it appealing to avoid
the specia
On 26/10/2023 at 18:59, Dewey Dunnington wrote:
That sounds a bit hackish to me.
Including only *some* buffer sizes in array->buffers[array->n_buffers]
special-cased for only two types (or altering the number of buffers
required by the IPC format vs. the number of buffers required by the
On 26/10/2023 at 17:45, Dewey Dunnington wrote:
The lack of buffer sizes is something that has come up for me a few
times working with nanoarrow (which dedicates a significant amount of
code to calculating buffer sizes, which it uses to do validation and
more efficient copying).
By the wa
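For readers wondering what "calculating buffer sizes" involves when the C Data Interface does not carry them, here is a rough sketch of the minimum sizes implied by the columnar layout for two common cases. It is not nanoarrow's code; the function names are hypothetical, and padding and alignment are ignored.

```python
def bitmap_bytes(n_bits):
    """Bytes needed for a validity bitmap covering n_bits slots."""
    return (n_bits + 7) // 8


def fixed_width_buffer_sizes(length, offset, byte_width, has_validity=True):
    """Minimum [validity, data] sizes in bytes for a fixed-width array."""
    n = length + offset  # buffers must cover the slots skipped by offset too
    validity = bitmap_bytes(n) if has_validity else 0
    return [validity, n * byte_width]


def utf8_buffer_sizes(length, offset, offsets, has_validity=True):
    """Minimum [validity, offsets, data] sizes in bytes for a utf8 array.

    offsets is the decoded int32 offsets buffer: the data size cannot be
    known without reading its last used entry.
    """
    n = length + offset
    validity = bitmap_bytes(n) if has_validity else 0
    return [validity, (n + 1) * 4, offsets[n]]


# int32 array of 10 values; utf8 array ["ab", "", "cde"]
int32_sizes = fixed_width_buffer_sizes(10, 0, 4)
utf8_sizes = utf8_buffer_sizes(3, 0, [0, 2, 2, 5])
```

The utf8 case shows why consumers find this awkward: the data buffer size depends on the contents of another buffer, not just on the array's length.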
On 26/10/2023 at 17:45, Dewey Dunnington wrote:
> A potential alternative might be to allow any ArrowArray to declare
> its buffer sizes in array->buffers[array->n_buffers], perhaps with a
> new flag in schema->flags to advertise that capability.
That sounds a bit hackish to me.
I'd rather l