On 19/08/2022 at 20:09, David Li wrote:
Since it's been a while, I'd like to give an update. There are also a few
questions I have around distribution.
Currently:
- Supported in C, Java, and Python.
- For C/Python, there are basic drivers wrapping Flight SQL and SQLite, with a
draft of a l
Hello all,
The Arrow format has support for extension types, but there's no
official way to agree across implementations on well-known extension types.
This issue has come up a couple times with people wanting to implement
support for types such as JSON or UUID in order to enable better
i
I've created a pull request introducing a canonical extension type as
discussed in this thread. https://github.com/apache/arrow/pull/13901
Thanks for all the input!
On Wed, Aug 3, 2022 at 10:46 AM Antoine Pitrou wrote:
On 03/08/2022 at 16:19, Lee, David wrote:
There are probably t
No opposition from me.
Regards
Antoine.
On 17/08/2022 at 10:05, Sutou Kouhei wrote:
Hi,
Can we drop support for Visual Studio 2017?
Visual Studio 2017 reached EOL on 2022-04-12:
https://docs.microsoft.com/en-us/lifecycle/products/visual-studio-2017
Listing| Start Date | Ma
On 17/08/2022 at 16:52, Weston Pace wrote:
Sorry for a "one more thing email" but I had one more thought
regarding R 3.6 support for Windows. I think those users should
continue to be able to use Arrow 10.0.0.
Any particular reason why this should be 10.0 and not 9.0 for example?
(is due t
Look for ARROW_SCOPED_TRACE
On 17/08/2022 at 16:22, Yaron Gvili wrote:
There are no sleeps nor deadlocks; it's just due to a large configuration-space
that I agree can be reduced by sampling. Could you explain how to use
SCOPED_TEST, or refer to documentation about it? I understand your id
On 17/08/2022 at 10:48, Jacob Wujciak wrote:
I am generally in favour of this proposal but would like to mention that we
have to be able to build on MacOS 10.13 for the R package due to CRAN using
it.
The CRAN builder comes with:
Apple LLVM version 10.0.0 (clang-1000.10.44.4); GNU Fortran (GC
For the record, https://github.com/apache/arrow/pull/13661 was finally
merged. It switches to -O2 by default and selectively re-enables
auto-vectorization on gcc.
Regards
Antoine.
On 21/07/2022 at 17:11, Antoine Pitrou wrote:
On 21/07/2022 at 16:34, Wes McKinney wrote:
Based on
Hello,
We are in 2022 and Arrow C++ still strives to be compatible with C++11.
Maintaining compatibility has caused us growing pains since third-party
libraries have begun requiring C++14 or later. Boost is warning that it
will soon require C++14
(https://issues.apache.org/jira/browse/ARROW
I would welcome trimming down our hand-written dependency bundling and
delegate most of the work to vcpkg or conan, but I don't know how usable
and flexible those alternatives are. Someone more knowledgeable
(probably Kou or perhaps Krisztian?) should answer.
(also note that using an extern
nt at runtime?
Only if you do things that are alignment-sensitive.
That said, while it is formally allowed AFAIK, it probably occurs rarely
so potential issues (if any) are probably not surfaced.
Best regards
Antoine.
Best,
Jorge
On Tue, Aug 2, 2022 at 6:59 PM Antoine Pitrou wrote:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
On Mon, Aug 1, 2022 at 11:39 PM Antoine Pitrou wrote:
On 01/08/2022 at 22:53, Pradeep Gollakota wrote:
Thanks f
I would hope conda get their act together and improve on this.
I have mixed feelings about complicating the documentation with
explanations of how mamba is (often? usually?) a better replacement for
conda. Generally we should focus on Arrow-specific issues and avoid
distracting the user with
Hi Jorge,
So there are two aspects to the answer:
- ideally, the C++ implementation also works on non-aligned data (though
this is poorly tested, if at all)
- when mmap'ing a file, you should get a page-aligned address
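
A minimal Python sketch of the page-alignment point above. It uses an anonymous mapping as a stand-in for a memory-mapped file and checks the address it lands on. The helper names `buffer_address` and `is_aligned` are hypothetical and not part of Arrow; this only illustrates the alignment check, not how the C++ implementation handles unaligned data.

```python
import ctypes
import mmap


def buffer_address(buf) -> int:
    """Return the memory address of a writable buffer such as an mmap object."""
    view = ctypes.c_char.from_buffer(buf)
    address = ctypes.addressof(view)
    # Drop the ctypes view so that the mapping can be closed afterwards.
    del view
    return address


def is_aligned(address: int, alignment: int) -> bool:
    """True if `address` is a multiple of `alignment` bytes."""
    return address % alignment == 0


# An anonymous mapping stands in for a memory-mapped file here.
mapping = mmap.mmap(-1, mmap.PAGESIZE)
address = buffer_address(mapping)
page_aligned = is_aligned(address, mmap.PAGESIZE)
# One byte past the start of the mapping is no longer 8-byte aligned.
offset_aligned = is_aligned(address + 1, 8)
mapping.close()
```

A buffer that starts at an arbitrary offset inside such a mapping (as in the `address + 1` case) is where alignment-sensitive code would need care.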
As for int128 and int256, these usually don't exist at the hardware
level
On 01/08/2022 at 19:13, Wes McKinney wrote:
If we start placing restrictions on how the out-of-line string buffers
are managed and externalized, it risks undermining the zero-copy
interoperability benefits that we're trying to achieve with this.
But embedded pointers in turn undermine zero
On 01/08/2022 at 22:53, Pradeep Gollakota wrote:
Thanks for all the great feedback.
To proceed forward, we seem to need decisions around the following:
1. Whether to use arrow extensions or first class types. The consensus is
building towards using arrow extensions.
+1
2. What do we do
Potentially extending the IPC format to support these additional
flexibilities is the easy part.
The difficult part is to shoehorn the newfound flexibility into
existing APIs, also leaking into the expectations of downstream users.
For example, in C++ it is expected that a RecordBatchRea
Hi Wes,
On 31/07/2022 at 00:02, Wes McKinney wrote:
I understand there are still some aspects of this project that cause
some squeamishness (like having arbitrary memory addresses embedded
within array values whose lifetime a C ABI consumer may not know about
-- we already export memory add
On 30/07/2022 at 01:02, Wes McKinney wrote:
I think either path:
* Canonical extension type
* First-class type in the Type union in Flatbuffers
would be OK. The canonical extension type option is the preferable
path here, I think, because it allows Arrow implementations without
any special
This isn't great since library users may have
policies that disallow warnings.
On Fri., Jul. 22, 2022, 05:47 Antoine Pitrou, wrote:
We could perhaps suppress the integer downcast warnings, but only on
32-bit Windows (not 64-bit, not other platforms).
Regards
Antoine.
On 22/07/2022 at 14:4
We could perhaps suppress the integer downcast warnings, but only on
32-bit Windows (not 64-bit, not other platforms).
Regards
Antoine.
On 22/07/2022 at 14:42, Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote:
Hi James.
I don't have strong feelings about whose PR is used and how exactly th
+1 for disabling.
On 21/07/2022 at 15:35, Raul Cumplido Dominguez wrote:
Hi,
There was a discussion on Zulip dev about disabling dependabot alerts and
updates [1]
Based on this Apache INFRA wiki page we should be able to disable them [2].
There are currently several open PRs from dependa
On 21/07/2022 at 16:34, Wes McKinney wrote:
Based on the discussion in https://github.com/apache/arrow/pull/13661,
it seems that one major issue with switching to -O2 is that
auto-vectorization (which we rely on in places) and perhaps some other
optimization passes would have to be manually
On 08/07/2022 at 15:19, Wes McKinney wrote:
* I believe that having a Type::RLE is the right approach in C++ and
it makes dynamic dispatch everywhere in the library pretty
straightforward.
+1 on this, as it will raise a nice NotImplemented error for existing
code rather than crash or corr
On 18/07/2022 at 03:54, Wes McKinney wrote:
This patch caused Parquet files written with 2.0.0 to be unreadable in
3.0.0 onward
https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cbadc1ae6580789e69
This was reported on June 14 on dev@ and I git-bisected to the root cause:
https:/
On Fri, 8 Jul 2022 09:49:28 -0600
Todd Farmer wrote:
>
> In summary, here are the actions I propose:
>
> 1. Establish a threshold at which assigned, idle issues should be
> unassigned and comment added.
> 2. Define that threshold to be 90 days.
> 3. Document the above as a project policy for iss
I don't think you need anything more on the PyArrow side, but you need
to (re)compile Arrow C++ with ARROW_COMPUTE enabled, is that the case?
On 07/07/2022 at 22:16, Li Jin wrote:
Hello,
I am trying to build Arrow/Pyarrow with our internal build system (cmake
based) and encounter and e
row/engine/substrait` module in Arrow C++ does this
too. If the technical approach I just described would actually expose the
classes, what would be a proper way to avoid exposing them? Perhaps the
classes should be generated into a private package, e.g., under
`python/_ep`? (ep stands for external pr
I agree that giving direct access to protobuf classes is not Arrow's
job. You can probably take the upstream (i.e. Substrait's) protobuf
definitions and compile them yourself, using whatever settings required
by your project.
Regards
Antoine.
On 03/07/2022 at 21:16, Jeroen van Straten wrote:
a whole chain of scalar functions that all
write into preallocated memory can execute without having to touch
shared_ptrs or deal with other objects with excess microperformance
overhead) where such optimization can happen more easily.
On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou wrote:
On 06/06/202
On Mon, 27 Jun 2022 12:46:40 +0200
Raul Cumplido Dominguez wrote:
> Hi,
>
> During the last months there has been some work going on in order to
> improve the visibility of our nightly builds, the failures, for how long
> have they been failing, etcetera.
>
> We started by adding some notificati
Welcome to our new committers!
On 22/06/2022 at 20:02, Andrew Lamb wrote:
Congratulations!
On Wed, Jun 22, 2022 at 1:27 PM Dragoș Moldovan-Grünfeld <
dragos.m...@gmail.com> wrote:
Congratulations!
Sent from my iPhone
On 22 Jun 2022, at 18:13, Neal Richardson
wrote:
On behalf of t
Can we name it miniarrow or nanoarrow? We don't want to convey the
message that there is a parallel C API for Arrow.
On 15/06/2022 at 05:18, Dewey Dunnington wrote:
Hi all,
I drafted a second PR [1] drafting a design for storing parsed information
obtained from a struct ArrowSchema (i.e.
On 08/06/2022 at 20:55, Jorge Cardoso Leitão wrote:
0 (binding) - imo there is some unclarity over what is expected to be
passed over the C streaming interface - an Array or a StructArray.
I think the spec claims the former, but the C++ implementation (which I
assume is the reference here) e
+1 (binding)
On 08/06/2022 at 20:15, Will Jones wrote:
Hi,
Given all feedback to discussion [1] has been positive, I would like to
propose marking the C Stream Interface as stable.
I have prepared PRs in apache/arrow [2] and apache/arrow-rs [3] to remove
all "experimental" markers from th
No, Arrow should definitely compile in 32 bits. Feel free to open a
JIRA and/or submit a PR for it.
On 08/06/2022 at 19:48, Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote:
Hi Antoine,
I need the 32 bit support because our project needs to support 32 bit. These
are my constraints.
As of
Hi,
It is a conscious decision to follow the Google C++ style guide:
https://google.github.io/styleguide/cppguide.html#Integer_Types
I agree that size_t (or ssize_t) would have been a better choice for
in-memory lengths and sizes. Unfortunately, that ship has sailed now.
32-bit systems a
+1 for removing it.
On Fri, 03 Jun 2022 08:32:35 +0900 (JST)
Sutou Kouhei wrote:
> Hi,
>
> We have Hive adapter in cpp/src/arrow/dbi/hiveserver2 but
> it's not maintained. Can we remove this?
>
> Reasons:
>
> 1. I got build errors when I build it on master by
>-DARROW_HIVESERVER2=ON. Se
On 06/06/2022 at 09:34, Sasha Krassovsky wrote:
Wow that's a lot of progress!
Definitely agree on the scalar outputs point.
One point about the ArraySpan - why does it need to know its data type?
Once a kernel has been resolved by the registry, the kernel will only know
how to execute on the
On 02/06/2022 at 00:02, Weston Pace wrote:
I'd like to propose we add a second kernel function registry. There
doesn't need to be any user facing API change. We could probably use
an approach like [2] to proxy to the old function registry when the
newer registry doesn't contain the asked-f
Sorry, I put "C++" in the title but this really affects Java via JNI.
On 01/06/2022 at 16:22, Antoine Pitrou wrote:
Hello,
The topic came up recently of bumping up our minimal macOS requirements
from 10.11 to 10.13 (*). Do people have any particular concerns about this?
Hello,
The topic came up recently of bumping up our minimal macOS requirements
from 10.11 to 10.13 (*). Do people have any particular concerns about this?
(*) https://github.com/apache/arrow/pull/13157#issuecomment-1143670152
Regards
Antoine.
On 31/05/2022 at 21:41, Micah Kornfield wrote:
I'm currently working on adding Run-Length encoding to arrow.
Nice
What are the intended use cases for this:
- external engines want to provide run-length encoded data to work on
using arrow?
It is more than just external engines. Many p
Hi,
On 31/05/2022 at 20:24, Tobias Zagorni wrote:
Hi, I'm currently working on adding Run-Length encoding to arrow. I
created a function to dictionary-encode arrays here (currently only for
fixed length types):
https://github.com/apache/arrow/compare/master...zagto:rle?expand=1
The general
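
For readers unfamiliar with the technique under discussion, here is a minimal Python sketch of run-length encoding as parallel lists of run values and run lengths. It assumes nothing about the layout used in the linked branch or about any Arrow data type; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (run_values, run_lengths)."""
    run_values, run_lengths = [], []
    for value, group in groupby(values):
        run_values.append(value)
        run_lengths.append(sum(1 for _ in group))
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand (run_values, run_lengths) back into the original sequence."""
    decoded = []
    for value, length in zip(run_values, run_lengths):
        decoded.extend([value] * length)
    return decoded


data = [7, 7, 7, 2, 2, 9]
encoded_values, encoded_lengths = rle_encode(data)
round_trip = rle_decode(encoded_values, encoded_lengths)
```

The encoding only pays off when the input contains long runs; a column with no repeated neighbours ends up storing one run per value.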
For the record, https://github.com/apache/arrow/pull/13115 was merged
with the proposed change.
Regards
Antoine.
On Fri, 13 May 2022 17:48:21 +0200
Antoine Pitrou wrote:
> I don't think this needs a vote, there is no functional change in the
> spec, it's just an addi
+1 as well
On 25/05/2022 at 01:25, Weston Pace wrote:
+1
I think opt-in is the right way to go here.
On Tue, May 24, 2022 at 12:40 PM Will Jones wrote:
Hello Arrow devs,
I've written a PR for the C++ S3FileSystem that adds an option
"allow_create_buckets" which when false will error if
That sounds fair to me as well.
On 27/05/2022 at 13:01, Andrew Lamb wrote:
+1 to the idea on making it a stable interface
On Thu, May 26, 2022 at 6:57 PM Jonathan Keane wrote:
I too am +1 (nonbinding) to marking it as stable
-Jon
On Thu, May 26, 2022 at 1:05 PM Neal Richardson <
nea
Hello Srinivas,
No, there is not. Also, the pyarrow.jvm is a bit limited currently,
there are plans to rewrite it to make it more general (you may want to
help contribute if you feel interested and skilled enough):
https://issues.apache.org/jira/browse/ARROW-14319
Regards
Antoine.
On 2
gle-noded-ness).
Antoine did point out the ACE name is taken by a C++ library. The
"Ace"
name is also used by the javascript library [2], but I think is a
general
enough word that no single library has much specific claim to it.
Some other names I thought of:
Arrow Recurve
Ace A
That sounds ok to me, we should just ensure that commits are squashed
and rebased on top of the main/master branch.
(also, the commit title and description should inherit the PR's
corresponding fields)
On 18/05/2022 at 05:43, Sutou Kouhei wrote:
Hi,
How about using GitHub API instead
ll/13115
to add this for the C data/stream interfaces.
On Mon, May 9, 2022, at 15:42, Antoine Pitrou wrote:
On 09/05/2022 at 20:28, Tomek Drabas wrote:
I am new to this board so please, let me know if any of this doesn't make
sense.
I am building a FlightSQL example with DuckDB backend. D
On 13/05/2022 at 16:30, Alessandro Molina wrote:
I think Arrow should definitely consider adding a DataFrame-like API.
There are multiple reasons why exposing Arrow to end users instead of
restricting it to developers of framework would be beneficial for the Arrow
project itself.
A rough ap
Hi!
Can you elaborate how the binding transfers data between Datafusion and
Java Arrow? If I'm reading the code correctly, it seems to be writing
an IPC stream?
On 11/05/2022 at 11:20, Jiayu Liu wrote:
Hi dev@arrow,
Recently I've created and published a Java binding[1] to datafusion[2
On 11/05/2022 at 10:19, Alessandro Molina wrote:
As far as I understood, the idea is not to fully remove memory mapping,
just turn the current mmap=True default arguments to mmap=False
The goal is mostly to provide consistent behaviour for end users. At the
moment users might face very diff
On 10/05/2022 at 19:16, Antoine Pitrou wrote:
That said, tests which require should be skipped gracefully instead of
failing.
Oops... some words got swallowed:
tests which require *the dataset module* should be skipped gracefully
instead of failing.
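
One way to get that behaviour, sketched with the standard library only: detect whether the optional module can be imported and skip the dependent tests otherwise. The module name `hypothetical_optional_component` is a placeholder (in PyArrow's case it would be the dataset module), and this is not how PyArrow's own test suite is wired up.

```python
import importlib
import importlib.util
import io
import unittest

# Placeholder name for an optional component that may not be built.
OPTIONAL_MODULE = "hypothetical_optional_component"


def module_available(name: str) -> bool:
    """True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


@unittest.skipUnless(module_available(OPTIONAL_MODULE),
                     f"requires {OPTIONAL_MODULE}")
class OptionalComponentTest(unittest.TestCase):
    def test_uses_component(self):
        # Only runs when the optional module is actually present.
        module = importlib.import_module(OPTIONAL_MODULE)
        self.assertIsNotNone(module)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalComponentTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

With the module absent, the run reports one skipped test and no failure, which is the graceful outcome asked for above.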
On 10/05/2022 at 19:13, Weston
That said, tests which require should be skipped gracefully instead of
failing.
On 10/05/2022 at 19:13, Weston Pace wrote:
I think you need to add:
export PYARROW_WITH_DATASET=1
On Tue, May 10, 2022 at 7:07 AM Yaron Gvili wrote:
Hello,
I ran into a problem with running PyArrow
On 10/05/2022 at 13:27, Raul Cumplido wrote:
I still think there is some value in standardising the "feature freeze" on
new release candidates once a first release candidate has been created and
only add required fixes for the follow up RCs. What I would like to avoid
with that is rushing bi
AQE as
Adaptive
Query Execution (especially Spark users).
"Arrow Compute Engine" in full doesn't sound bad perhaps?
With DataFusion, I made a list of words related to the project (data,
query, compute, engine, etc) and then a list of completely unrelated
words
and then looked at the
On 10/05/2022 at 04:36, Andrew Piskorski wrote:
On Mon, May 09, 2022 at 07:00:47PM +0200, Antoine Pitrou wrote:
Generally, the Arrow IPC file/stream formats are designed for large
data. If you have many very small files you might try to rethink how you
store your data on disk.
Ah. Is
Well, in any case, the release manager should make the final call, so a
label would mostly be a sophisticated way of pinging them.
On 09/05/2022 at 20:45, Weston Pace wrote:
How should we indicate whether a JIRA is a bugfix, which should be
included in the next RC, or something else that
On 09/05/2022 at 20:28, Tomek Drabas wrote:
I am new to this board so please, let me know if any of this doesn't make
sense.
I am building a FlightSQL example with DuckDB backend. DuckDB already has
an Arrow interface defined in duckdb.h that returns ArrowArray. However,
the import is not gu
Hi Andrew,
If the Arrow files are small, chances are the metadata (which is always
being read) is as large on disk as the actual data (which is "only"
mmap'ed). Also, mmap'ing works on a page granularity (a page being
typically 4 kB on x86, sometimes a bit larger on other architectures),
an
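
The page-granularity point can be made concrete with a little arithmetic. The sketch below assumes the 4 kB page size mentioned above and uses a hypothetical helper, `mapped_bytes`, to round a file size up to whole pages.

```python
def mapped_bytes(file_size: int, page_size: int = 4096) -> int:
    """Bytes occupied when `file_size` bytes are mapped in whole pages."""
    pages = -(-file_size // page_size)  # ceiling division
    return pages * page_size


# A 100-byte file still occupies one full page once mapped.
tiny_file = mapped_bytes(100)
# One byte past a page boundary costs a second page.
just_over_one_page = mapped_bytes(4097)
# Many tiny files: 1000 files of 100 bytes each.
total_for_many_small_files = 1000 * mapped_bytes(100)
```

Under these assumptions, 1000 files holding 100 kB of data in total would span about 4 MB of mapped pages.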
+1 from me. I'm actually surprised that we didn't do something like that
already. Adding new features from one RC to another sounds like a very
bad idea.
Regards
Antoine.
On 09/05/2022 at 14:33, Raul Cumplido wrote:
Hi,
I would like to propose a change in our release process.
The rat
That sounds ok to me.
On 05/05/2022 at 13:01, Jacob Wujciak wrote:
Hi all,
I would like to propose that we drop support for manylinux2010.
CentOS 6, on which the manylinux2010 image is based, has been EOL for over
two years [1].
There is now also an official announcement by pypa that
man
On 04/05/2022 at 17:21, Alessandro Molina wrote:
The proposal seems reasonable to me, we should do our best at providing
users the same experience on the various systems whenever possible.
As long as we don't receive complaints about the package size, I think we
can live with it. If it becom
On 28/04/2022 at 17:07, Li Jin wrote:
Aha thanks Antoine!
After digging the log I think I found the issue:
"
-- clang-tidy 12 not found
-- clang-format 12 not found
"
After installing those two it got me over that step.
A side question - does running
"
archery lint --cpplint --clang-form
On 28/04/2022 at 16:54, Li Jin wrote:
Hello!
I am preparing for submitting a PR and reading the "Code style and Linting"
section of the development doc:
https://github.com/apache/arrow/blob/master/docs/source/developers/cpp/development.rst#code-style-linting-and-ci
I got to the point that
t file from js to WASM and
de-serialized it to arrow directly in wasm - so memory was already being
allocated from within WASM sandbox, not JS. Sorry for the confusion.
[1] https://github.com/WebAssembly/design/issues/1439
Best,
Jorge
On Tue, Apr 26, 2022 at 3:43 PM Antoine Pitrou
wrote:
Do we want something more flexible than dlopen() and runtime symbol
lookup (a mechanism which constrains the way you can organize and
distribute drivers)?
For example, perhaps we could expose an API struct of function pointers
that could be obtained through driver-specific means.
On 26/0
pr 26, 2022 at 10:22 AM Antoine Pitrou wrote:
On 26/04/2022 at 16:18, Jorge Cardoso Leitão wrote:
Would WASM be able to interact in-process with non-WASM buffers safely?
AFAIK yes. My understanding from playing with it in JS is that a
WASM-backed udf execution would be something like:
1. co
ython function, which has security implications since the Python
interpreter allows everything by default.
Best,
Jorge
On Tue, Apr 26, 2022 at 2:56 PM Antoine Pitrou wrote:
On 25/04/2022 at 23:04, David Li wrote:
The WebAssembly documentation has a rundown of the techniques used:
https://weba
rguments are of different length - we'd need something like the ColumnBag
proposal, so this might be a good reason to revive that).
On Mon, Apr 25, 2022, at 16:35, Antoine Pitrou wrote:
On 25/04/2022 at 22:19, Wes McKinney wrote:
I was going to reply to this e-mail thread on user@ but tho
Hi Kevin,
There are a couple of concerns to keep in mind:
- we don't want to increase the import time of PyArrow too much
- we would like to limit the required runtime dependencies for PyArrow
(an issue is open to move docstring generation at package build time:
https://issues.apache.org/jira/
On 25/04/2022 at 22:19, Wes McKinney wrote:
I was going to reply to this e-mail thread on user@ but thought I
would start a new thread on dev@.
Executing user-defined functions in memory, especially untrusted
functions, in general is unsafe. For "trusted" functions, having an
in-memory API f
+1 from me (binding), with caveat that I'm not competent in JDBC and the
proposed changes in PR [3] look reasonable to me.
Best regards
Antoine.
On 20/04/2022 at 22:13, David Li wrote:
Hello,
Iury da Guia Salino has proposed an addition to Arrow Flight SQL, an experimental
protocol fo
Hello Li,
The temporal rounding operations operate on localized times taking into
account the timestamp's timezone, which is why they're more
computationally intensive than raw floating point operations.
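
To illustrate why localized rounding is more than plain arithmetic on the stored value, the Python sketch below floors the same instant to a day boundary twice: once on the raw UTC value and once in local time. It uses a fixed +09:00 offset as a simplified stand-in for a real timezone (which would also involve DST rules), and it is not the Arrow compute implementation.

```python
from datetime import datetime, timedelta, timezone


def floor_to_day_utc(ts: datetime) -> datetime:
    """Floor to midnight in UTC, ignoring any local timezone."""
    utc = ts.astimezone(timezone.utc)
    return utc.replace(hour=0, minute=0, second=0, microsecond=0)


def floor_to_day_local(ts: datetime, tz: timezone) -> datetime:
    """Floor to midnight as observed in timezone `tz`."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


# Fixed-offset zone used purely for illustration.
plus_nine = timezone(timedelta(hours=9))
instant = datetime(2022, 4, 20, 22, 0, tzinfo=timezone.utc)

raw_floor = floor_to_day_utc(instant)
local_floor = floor_to_day_local(instant, plus_nine)
# The local-time result, expressed back in UTC for comparison.
local_floor_in_utc = local_floor.astimezone(timezone.utc)
```

The two results differ by 15 hours here, because 22:00 UTC already falls on the next calendar day at +09:00.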
Which operation in particular did you benchmark? Is it part of a
significant workload
ht?
Indeed you can have an initial stab at that.
Regards
Antoine.
Sasha
On 3 Apr 2022, at 11:47, Antoine Pitrou wrote:
It would be a very significant contributor, as the inconsistency can manifest
under the form of up to 8-fold differences in performance (or perhaps more).
On 01/04/2022 at 08:43, Sasha Krassovsky wrote:
I agree that a potential inconsistent experience is a problem, but I
disagree that SIMD would be the root of the problem, or even be a
significant contributor to it.
It would be a very significant contributor, as the inconsistency can
manifes
On 31/03/2022 at 09:19, Sasha Krassovsky wrote:
As I showed, those auto-vectorized kernels may be vectorized only in some
situations, depending on the compiler version, the input datatypes...
I would more than anything interpret the fact that that code was vectorized at
all as an amazing
generate different vectorized code, and clang and gcc do not
auto-vectorize
at the same optimization level (O2 for clang and O3 or O2
-ftree-vectorize
for gcc)
Regards,
Johan
On Wed, Mar 30, 2022 at 10:10 AM Antoine Pitrou
wrote:
Hi Sasha,
On 30/03/2022 at 00:14, Sasha Krassovsky wrote:
Hi Sasha,
On 30/03/2022 at 00:14, Sasha Krassovsky wrote:
I've noticed that we include xsimd as an abstraction over all of the simd
architectures. I'd like to propose a different solution which would result
in fewer lines of code, while being more readable.
My thinking is that anything simp
ACE is already the name of a well-known C++ library, though I'm not sure
how widely used it is nowadays:
http://www.dre.vanderbilt.edu/~schmidt/ACE.html
I would name it "execution engine" or "Arrow C++ execution engine" in full.
Regards
Antoine.
On 29/03/2022 at 00:15, Wes McKinney wrote:
Hello Will,
So the added value would simply be the automatic definition of
iterator-returning methods? Or am I missing something?
Regards
Antoine.
On 23/03/2022 at 19:36, Will Jones wrote:
Hello Arrow devs,
I recently created ARROW-16006 [1] ("Helpers for converting between rows
and A
Moral +1 from me. I've posted minor comments on the specs changes in the
PRs.
On 16/03/2022 at 20:50, David Li wrote:
Hello,
Jose Almeida and James Duong have proposed two additions to Arrow Flight SQL,
an experimental protocol for interacting with SQL databases over Arrow Flight.
The
If it's only about missing artifacts and nothing else needs to change,
I would hope so indeed.
On 10/03/2022 at 15:16, David Li wrote:
Hmm, I'm not too sure how procedures work here, but is it possible to just vote
and upload the missing 7.0.0 artifacts, instead of going through the wh
On Mon, 7 Mar 2022 11:52:02 -0800
HK Verma wrote:
> Thanks Antoine. Yes I have newlines_in_values set to false. Other configs
> also look ok.
> However I do have rows with fewer columns than the specified
> numbers in convert options in column types. I have my own
> invalid_row_handler w
e appropriate size for
decimal types. Relaxing from {128,256} to {32,64,128,256} seems a low risk
from an integration perspective, as implementations already need to read
the bitwidth to select the appropriate physical representation (if they
support it).
Best,
Jorge
On Mon, Mar 7, 2022, 11:41 Ant
Hi HK,
On Mon, 7 Mar 2022 10:16:07 -0800
HK Verma wrote:
> I am integrating Arrow with another C++ library. For this, I wrote an input
> stream which feeds CSV data into the streaming reader. It fails for very
> large files with the error messages like - "CSV parser got out of sync with
> chunk
this case, it
might be argued we are just relaxing the constraints on an existing type.
What do others think?
Regards
Antoine.
On Thu, Mar 3, 2022 at 6:55 AM Antoine Pitrou wrote:
Hello,
Currently, the Arrow format specification restricts the bitwidth of
decimal numbers to either 128 o
I opened https://issues.apache.org/jira/browse/ARROW-15846
Regards
Antoine.
On 04/03/2022 at 15:05, Antoine Pitrou wrote:
On 04/03/2022 at 15:01, Hanqi Wu wrote:
Hi Antoine,
I agree n_buffers should still be set to 1. But as per the below PyArrow doc,
n_buffers’s value will be 0 if
's still "present" in the metadata (for example as a
null pointer, if using the C data interface).
This probably deserves clarifying, though. I'll open an issue.
Regards
Antoine.
https://arrow.apache.org/docs/format/Columnar.html#struct-layout
Thanks,
Hanqi
On Mar
produced when exporting such an
array.
Regards
Antoine.
However, “import_from_c” expects StructArray to always have at least 1
buffer allocated, otherwise it throws an exception.
Best,
Hanqi
On Mar 4, 2022, at 8:47 AM, Antoine Pitrou wrote:
On 04/03/2022 at 04:17, Hanqi Wu wrote:
Hello Will,
On 04/03/2022 at 01:27, Will Jones wrote:
I've come across several different environments where Arrow either fails to
configure with CMake or fails to link libraries. Some recent examples I've
come across:
* (Just fixed [1]) Windows, RTools4 (MSYS2), Debug, dynamic libraries
On 04/03/2022 at 04:17, Hanqi Wu wrote:
Hello community,
As per the below documentation, for an Arrow StructArray, it won’t have any
physical buffers backing it if it doesn’t contain any null value:
https://arrow.apache.org/docs/format/Columnar.html#struct-layout
However, in PyArrow, it co
Hello,
Currently, the Arrow format specification restricts the bitwidth of
decimal numbers to either 128 or 256 bits.
However, there is interest in allowing other bitwidths, at least 32 and
64 bits for this proposal. A 64-bit (respectively 32-bit) decimal
datatype would allow for precision
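
As a rough guide to what each bit width can hold, the Python sketch below computes the largest number of decimal digits whose full range fits in a signed two's-complement integer of a given width. It is plain integer arithmetic under that storage assumption, and `max_decimal_digits` is a hypothetical helper rather than an Arrow API.

```python
def max_decimal_digits(bit_width: int) -> int:
    """Largest d such that every d-digit integer fits in a signed
    two's-complement integer of `bit_width` bits."""
    max_value = 2 ** (bit_width - 1) - 1
    digits = 0
    # 10**(digits + 1) - 1 is the largest integer with digits + 1 digits.
    while 10 ** (digits + 1) - 1 <= max_value:
        digits += 1
    return digits


digits_by_width = {width: max_decimal_digits(width)
                   for width in (32, 64, 128, 256)}
```

Evaluating `digits_by_width` gives 9, 18, 38 and 76 digits for 32, 64, 128 and 256 bits respectively.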
Can we just add the following field to the FlightDescriptor message:
bool accept_inline_data = 4;
and this one to the FlightInfo message:
FlightData inline_data = 100;
Then new clients can `accept_inline_data` to true (the default being
false if omitted) to signal servers that they can pu
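
A small Python model of the negotiation these two fields would enable. The field names come from the message above; everything else (the dataclasses standing in for the protobuf messages, the `get_flight_info` handler, the size threshold and the endpoint URI) is a hypothetical illustration rather than the Flight API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlightDescriptor:
    path: str
    # Proposed field; false when omitted, so older clients are unaffected.
    accept_inline_data: bool = False


@dataclass
class FlightInfo:
    endpoints: List[str] = field(default_factory=list)
    # Proposed field; only set for clients that opted in.
    inline_data: Optional[bytes] = None


def get_flight_info(descriptor: FlightDescriptor, payload: bytes,
                    inline_limit: int = 1024) -> FlightInfo:
    """Hypothetical server handler: inline small results on request."""
    if descriptor.accept_inline_data and len(payload) <= inline_limit:
        return FlightInfo(inline_data=payload)
    # Otherwise the client fetches the data from an endpoint as usual.
    return FlightInfo(endpoints=["grpc://hypothetical-host:1234"])


old_client = get_flight_info(FlightDescriptor("t"), b"small result")
new_client = get_flight_info(FlightDescriptor("t", True), b"small result")
large_result = get_flight_info(FlightDescriptor("t", True), b"x" * 4096)
```

Because the flag defaults to false, a server following this sketch never sends inline data to a client that did not ask for it.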
Hello,
Just a note that Hiveserver2 support is currently broken in Arrow C++,
and it may have been for a long time (attempting to compile it just
doesn't work):
https://issues.apache.org/jira/browse/ARROW-15774
Is there any interested party to work on fixing this? Otherwise, we
may want to rem
post-condition for the resulting struct
array is that its length is equal to the length of all of its children
arrays.
Cheers,
Micah
On Fri, Feb 18, 2022 at 1:12 PM Phillip Cloud wrote:
On Fri, Feb 18, 2022 at 3:44 PM Antoine Pitrou wrote:
On 18/02/2022 at 21:32, Phillip Cloud wrote:
I am r
On 18/02/2022 at 21:32, Phillip Cloud wrote:
I am really struggling to see how anything I've said is inconsistent with
the spec or what you are saying here.
To recap what I've said:
1. Appending a null sentinel to the values buffer isn't _required_ unless
the type requires it.
Ex: "joemark
On 18/02/2022 at 20:26, Phillip Cloud wrote:
On Fri, Feb 18, 2022 at 2:06 PM Antoine Pitrou wrote:
On 18/02/2022 at 20:01, Phillip Cloud wrote:
I think I'm confused by where this appended value lives. Is it only a
logical value or does the value show up in memory?
The logical