Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-10 Thread Eduardo Ponce
As a Spanish speaking person, I cannot think of a misleading or bad connotation for the word "acero". The word is generally used to refer to either steel materials (actual definition) or as a simile/metaphor comparing to something very strong. We can view this as a self-laud on the robust and

Re: Arrow sync call January 5 at 12:00 US/Eastern, 17:00 UTC

2022-01-05 Thread Eduardo Ponce
Nic Crane, Micah Kornfeld, Eduardo Ponce, Will Jones, Rok Mihevc, > David Li, Niranda Perera, Benson Muite > > > Agenda > > - Discussion about the new columnar memory layout > - Preparing for 7.0.0 release - 2nd or 3rd week of January > - Documentation improvement > - Suppo

Re: [ANNOUNCE] New Arrow committer: Alessandro Molina

2022-01-05 Thread Eduardo Ponce
Great addition to PMC. Congratulations! ~Eduardo On Wed, Jan 5, 2022 at 7:34 AM Wes McKinney wrote: > On behalf of the Arrow PMC, I'm happy to announce that Alessandro > Molina has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! >

Re: [ANNOUNCE] New Arrow PMC member: Yibo Cai

2022-01-04 Thread Eduardo Ponce
Congratulations Yibo! Thanks for all your contributions and guidance. On Tue, Jan 4, 2022 at 3:52 AM Wes McKinney wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Yibo Cai to become a PMC member and we are pleased to announce > that Yibo has accepted. > >

Re: [ANNOUNCE] New Arrow PMC member: Daniël Heres

2021-12-21 Thread Eduardo Ponce
Congrats! > On Dec 21, 2021, at 12:18 PM, Wes McKinney wrote: > > The Project Management Committee (PMC) for Apache Arrow has invited > Daniël Heres to become a PMC member and we are pleased to announce > that Daniël has accepted. > > Congratulations and welcome!

Re: [DISCUSS][Python] Public Cython API

2021-10-28 Thread Eduardo Ponce
Hi all, I am helping resolve this GH issue [1] with this PR [2], where a user wants to use the `CRecordBatch.column_data()` method from Cython to access the underlying `CArrayData`, but `column_data()` is not exposed in `CRecordBatch`. There is a workaround to access the `CArrayData` [3]. Nevertheless,

Re: C++ Parquet thrift_ep No rule to make target install

2021-09-20 Thread Eduardo Ponce
Hi Rares, Compilation fails when you set ARROW_PARQUET=ON because this flag also enables installing Apache Thrift [1], and support for Thrift on CentOS systems is fragile (see THRIFT-2559 [2]). When you disable Parquet, Thrift is not installed as a required dependency. I recommend

Re: [ANNOUNCE] New Arrow committer: Nic Crane

2021-09-09 Thread Eduardo Ponce
Congratulations Nic! ~Eduardo On Thu, Sep 9, 2021 at 11:47 AM Neal Richardson wrote: > On behalf of the Apache Arrow PMC, I'm happy to announce that Nic Crane > has accepted an invitation to become a committer on Apache Arrow. > > Welcome and thank you for your contributions! > > Neal >

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Thread Eduardo Ponce
To add to Antoine's points, besides data alignment being beneficial for reducing cache line reads/writes and overall using the cache more effectively, another key benefit arises when using vector (SIMD) registers. Although recent CPUs can load unaligned data to vector registers at similar speeds as
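A minimal sketch of what allocating along 64-byte boundaries can look like in plain C++17. This is only an illustration of the alignment idea discussed in this thread, not Arrow's allocator: `kAlignment`, `RoundUpToAlignment`, `IsAligned`, and `AllocateAligned` are hypothetical names, and `std::aligned_alloc` is assumed to be available on the target platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed cache-line size for this sketch (64 bytes, as in the thread title).
constexpr std::size_t kAlignment = 64;

// Hypothetical helper: round a byte count up to a multiple of kAlignment.
// std::aligned_alloc expects the size to be a multiple of the alignment.
inline std::size_t RoundUpToAlignment(std::size_t nbytes) {
  if (nbytes == 0) nbytes = 1;  // avoid a zero-sized request
  return (nbytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Hypothetical helper: true if ptr sits on a kAlignment-byte boundary.
inline bool IsAligned(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}

// Hypothetical helper: allocate at least nbytes starting on a 64-byte boundary.
// The caller releases the memory with std::free.
inline void* AllocateAligned(std::size_t nbytes) {
  void* ptr = std::aligned_alloc(kAlignment, RoundUpToAlignment(nbytes));
  assert(ptr == nullptr || IsAligned(ptr));
  return ptr;
}
```

A buffer obtained this way starts at the beginning of a cache line, so a 64-byte vector load from its start does not straddle two lines.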

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Eduardo Ponce
I agree with this proposal; the Arrow C++ library does not need to depend on Python or PyArrow code. AFAIU this will eliminate the use of the -DARROW_PYTHON build flag for Arrow C++, given that Python-related code will be compiled with PyArrow builds. Besides the use of "ARROW_PYTHON" env variable in

Re: [ANNOUNCE] New Arrow PMC member: Neville Dipale

2021-07-29 Thread Eduardo Ponce
Congratulations! ~Eduardo From: paddy horan Sent: Thursday, July 29, 2021, 8:04 PM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow PMC member: Neville Dipale Congrats Neville! From: Wes McKinney Sent: Thursday, July

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Eduardo Ponce
My mistake, I confused the input type to kernels as Datums, when they are in fact Scalar and ArrayData. I agree that SIMD details should not be exposed in the kernel API. ~Eduardo On Wed, Jul 28, 2021 at 6:38 PM Wes McKinney wrote: > On Wed, Jul 28, 2021 at 5:23 PM Eduardo Ponce wr

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Eduardo Ponce
Hi all, I agree with supporting finer-grained parallelism in the compute operators. I think that incorporating a Datum-like span would not only allow expressing parallelism on a per-thread basis but could also be used to represent SIMD spans, where span length is directed by vector ISA, "L2" cache

Re: [C++][DISCUSS][STRAW POLL] Enum metaprogramming

2021-07-22 Thread Eduardo Ponce
I support having a C++ meta enum construct as it will provide functionalities not readily allowed with standard enums. For example, I have found scenarios (mainly for writing tests) where iterating through the enum values would be useful. Is the rationale of this proposal to replace all C++
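To illustrate the test-writing scenario mentioned here, a minimal sketch of iterating over enum values in standard C++. It is not the construct proposed in this thread: `Color`, `kAllColors`, `ToString`, and `CountNamedValues` are hypothetical names, and the list of values is maintained by hand, which is exactly the bookkeeping a meta enum would remove.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical enum used only for illustration; not an Arrow type.
enum class Color { kRed, kGreen, kBlue };

// Hand-maintained list of every enumerator. A standard enum cannot produce
// this list by itself, so it must be kept in sync manually.
constexpr std::array<Color, 3> kAllColors = {Color::kRed, Color::kGreen,
                                             Color::kBlue};

// Map each enumerator to a printable name.
inline const char* ToString(Color c) {
  switch (c) {
    case Color::kRed:
      return "red";
    case Color::kGreen:
      return "green";
    case Color::kBlue:
      return "blue";
  }
  return nullptr;
}

// Test-style helper: walk every value and count those with a non-empty name.
inline int CountNamedValues() {
  int named = 0;
  for (Color c : kAllColors) {
    const char* name = ToString(c);
    assert(name != nullptr);
    if (std::strlen(name) > 0) ++named;
  }
  return named;
}
```

If a new enumerator is added without updating `kAllColors`, the loop silently skips it, which is the kind of gap that iteration support built into the enum construct would close.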

Re: [ANNOUNCE] New Arrow committer: Weston Pace

2021-07-09 Thread Eduardo Ponce
Congratulations Weston and thanks for your hard work! ~Eduardo From: David Li Sent: Friday, July 9, 2021 9:14:19 AM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow committer: Weston Pace Congrats Weston! On Fri, Jul 9, 2021, at 08:47, Wes

Re: Apache Arrow Cookbook

2021-07-07 Thread Eduardo Ponce
would make more sense. On the other hand, if the cookbook is to be limited in Arrow languages, then what would happen if a Rust cookbook is created? Would it be placed in the arrow-rs repo or as a standalone arrow/cookbook-rs repo? ~Eduardo On Wed, Jul 7, 2021 at 8:09 PM Eduardo Ponce wrote

Re: Apache Arrow Cookbook

2021-07-07 Thread Eduardo Ponce
Great work! I would recommend having the cookbook in its own repo so that its updates are not constrained by the timeline used for updating the public Arrow documentation. This will allow users that are not involved in Arrow development to contribute or provide suggestions to the cookbook fairly

Re: [VOTE] Arrow should state a convention for encoding instants as Timestamp with "UTC" as the time zone

2021-07-01 Thread Eduardo Ponce
+1 (non-binding) ~Eduardo From: Rok Mihevc Sent: Thursday, July 1, 2021 4:21:49 AM To: dev@arrow.apache.org Subject: Re: [VOTE] Arrow should state a convention for encoding instants as Timestamp with "UTC" as the time zone +1 (non-binding) On Thu, Jul 1,

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Eduardo Ponce
+1 (non-binding) On Fri, Jun 25, 2021 at 4:31 AM Joris Peeters wrote: > +1 > > On Fri, Jun 25, 2021 at 9:29 AM Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > +1 > > > > On Thu, 24 Jun 2021 at 21:21, Micah Kornfield > > wrote: > > > > > +1 (binding) > > > > > > On Thu, Jun

Re: [ANNOUNCE] New Arrow PMC member: David M Li

2021-06-22 Thread Eduardo Ponce
Great news, congratulations! ~Eduardo On Tue, Jun 22, 2021 at 1:33 PM Andrew Lamb wrote: > Congratulations David > > On Tue, Jun 22, 2021 at 8:56 AM David Li wrote: > > > Thanks everyone! > > > > I've learned a lot and had a great time contributing here, and I look > > forward to continuing

Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only

2021-06-11 Thread Eduardo Ponce
FWIW, this CI C++ build script contains what I think is a comprehensive list of cmake options supported in Arrow along with the common default values: https://github.com/apache/arrow/blob/master/ci/scripts/cpp_build.sh#L47-L132 Although I am not sure if this set of default values is used to build

Re: [C++][Discuss] Switch to C++17

2021-06-09 Thread Eduardo Ponce
After the discussion in today's Arrow sync call, I do think it would be beneficial to come up with a formal process for deciding when is a "right time" for upgrading Arrow to a newer C++ standard. I suggest we could consider a set of general metrics/criteria that try to summarize the benefits and

Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-08 Thread Eduardo Ponce
Congratulations!! ~Eduardo On Mon, Jun 7, 2021 at 11:06 PM Fan Liya wrote: > Congratulations, Kazuaki! > > Best, > Liya Fan > > On Tue, Jun 8, 2021 at 7:59 AM Rok Mihevc wrote: > > > Congrats! > > > > On Tue, Jun 8, 2021 at 1:36 AM Micah Kornfield > > wrote: > > > > > Congrats! > > > > > >

Re: [ANNOUNCE] New Arrow committer: Dominik Moritz

2021-06-04 Thread Eduardo Ponce
Congratulations! ~Eduardo On Fri, Jun 4, 2021 at 3:26 AM Fan Liya wrote: > Congratulations Dominik! > > Best, > Liya Fan > > On Thu, Jun 3, 2021 at 10:45 AM David Li wrote: > > > Congratulations Dominik! > > > > -David > > > > On Wed, Jun 2, 2021, at 18:09, Rok Mihevc wrote: > > > Congrats

Re: Long title on github page

2021-05-17 Thread Eduardo Ponce
e, a brand, a high-concept pitch, > and 3- or 4-sentence description. But every Apache project needs these too. > It’s worth spending the time on the description, also, and then use them in > all the places that we describe Arrow. > > > > Julian > > > > [1]

Re: Long title on github page

2021-05-17 Thread Eduardo Ponce
I agree with Nate's and Brian's suggestions, but would like to add that we can make it a one-liner for more conciseness and consistency with other Apache projects. Apologies if it seems I am going around the suggestions loop again. "Apache Arrow is a cross-language development platform enabling

Re: [C++][DISCUSS] Implementing interpreted (non-compiled) tests for compute functions

2021-05-14 Thread Eduardo Ponce
Another aspect to keep in mind is that some tests require internal options to be changed before executing the compute functions (e.g., check overflow, allow NaN comparisons, change validity bits, etc.). Also, there are tests that take randomized inputs and others make use of the min/max values for

Re: [C++] Deciding between "compute function" and "utility function"

2021-05-11 Thread Eduardo Ponce
This is a very good question. I agree with @Antoine and would like to add that the focus of compute functions is to have a public API while utility functions are for internal use. Operations similar to ARROW-12739 are the structural transformations [1], such as "list_flatten" [2], which makes use of

Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Eduardo Ponce
Great news! Congratulations Ben. ~Eduardo From: Wes McKinney Sent: Wednesday, May 5, 2021, 7:10 PM To: dev Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman The Project Management Committee (PMC) for Apache Arrow has invited Benjamin Kietzman to

Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-04 Thread Eduardo Ponce
+1 (non-binding) Great work! The only comment I have is regarding the case form in which Arrow files are referenced. In the "Security Considerations" section the document refers to them as "arrow files". Is the lowercase form intentional or will "Arrow files" be a more desired form? ~Eduardo

Re: [C++] adopting an SIMD library - xsimd

2021-03-01 Thread Eduardo Ponce
In my experience there is no single SIMD library that wraps all possible sets of vector instructions across the most common architectures and at the same time provides support for all popular compilers while supporting C and C++11/14. (I mention C because there is an issue for Arrow support in