Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-06-06 Thread Jorge Cardoso Leitão
Hi, Thanks a lot for your feedback. I agree with all the arguments put forward, including Andrew's point about the large change. I tried a gradual approach 4 months ago, but it was really difficult and I gave up. I estimate that the work involved is half the work of writing parquet2 and arrow2 in the

Re: post-release tasks (4.0.1)

2021-06-05 Thread Jorge Cardoso Leitão
On Mon, May 31, 2021 at 1:03 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Thanks a lot, both. > > Accepted. Will upload this evening. > > Best, > Jorge > > > On Mon, May 31, 2021 at 12:55 PM Krisztián Szűcs < > szucs.kriszt...@gmail.com> w

Re: [ANNOUNCE] New Arrow committer: Dominik Moritz

2021-06-04 Thread Jorge Cardoso Leitão
Thank you for all your contributions so far, it is really appreciated and great to see. Congratulations :) On Fri, Jun 4, 2021 at 6:14 PM Ying Zhou wrote: > > Congrats Dominik! > > > On Jun 2, 2021, at 5:19 PM, Wes McKinney wrote: > > > > On behalf of the Arrow PMC, I'm happy to announce that

Re: [Format] Timestamp timezone semantics?

2021-06-03 Thread Jorge Cardoso Leitão
That is my understanding as well: a timestamp either has a timezone or it does not. If it does not have a timezone, it should be presented as is, and no assumptions can be made about its timezone. In particular, given two fields X and Y, one with a timezone and the other without, e.g. it is not

Re: post-release tasks (4.0.1)

2021-05-31 Thread Jorge Cardoso Leitão
Thanks a lot, both. Accepted. Will upload this evening. Best, Jorge On Mon, May 31, 2021 at 12:55 PM Krisztián Szűcs wrote: > On Sun, May 30, 2021 at 7:37 PM Jorge Cardoso Leitão > wrote: > > > > Hi, > > > > Sorry for the delay here. > > > > Below

post-release tasks (4.0.1)

2021-05-30 Thread Jorge Cardoso Leitão
Hi, Sorry for the delay here. Below is the list of post-release tasks: 1. [ ] open a pull request to bump the version numbers in the source code 2. [x] upload source 3. [x] upload binaries 4. [ ] update website (https://github.com/apache/arrow-site/pull/114) 5. [x] upload ruby gems 6. [ ]

Re: [VOTE][RUST] Release Apache Arrow Rust 4.2.0

2021-05-29 Thread Jorge Cardoso Leitão
+1 (binding) Verified the RC on Apple Intel x86. On Sat, May 29, 2021 at 11:34 PM Sutou Kouhei wrote: > +1 > > I ran the following command line on Debian GNU/Linux sid: > > dev/release/verify-release-candidate.sh 4.2.0 1 > > > Thanks, > -- > kou > > In > "[VOTE][RUST] Release Apache Arrow

Re: [VOTE] Release Apache Arrow 4.0.1 - RC1

2021-05-26 Thread Jorge Cardoso Leitão
Hi, The vote passed with 3 +1 votes and zero -1 or 0. I will proceed with the post-vote tasks. Thank you for your patience, Jorge On Wed, May 26, 2021 at 6:28 AM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > forgot mine: +1 (binding) > > On Mon, May 24, 2021 at 12:

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-25 Thread Jorge Cardoso Leitão
5107dd2 [6] https://docs.google.com/spreadsheets/d/1hLKsqJaw_VLjtJCgQ635R9iHDNYwZscE0OT1omdZuwg/edit#gid=402497043 [7] https://github.com/apache/arrow-datafusion/pull/68 [8] https://issues.apache.org/jira/browse/ARROW-12643 On Sun, Feb 7, 2021 at 2:42 PM Jorge Cardoso Leitão < jorgecarlei...@g

Re: [VOTE] Release Apache Arrow 4.0.1 - RC1

2021-05-25 Thread Jorge Cardoso Leitão
> > changes not for 4.0.1. Please use the "verify-4.0.1-rc1" > > branch on https://github.com/kszucs/arrow/ . > > See also: https://github.com/apache/arrow/pull/10374 > > > > > > Thanks, > > -- > > kou > > > > In > >

Re: [MATLAB] Label for MATLAB Pull Requests

2021-05-21 Thread Jorge Cardoso Leitão
Hi, Could you create a JIRA and PR? The relevant place is here [1] Best, Jorge [1] https://github.com/apache/arrow/blob/master/.github/workflows/dev_pr/labeler.yml#L1 On Fri, May 21, 2021, 17:46 Sarah Gilmore wrote: > Hi all, > > I was looking through the list of open pull-requests and I

Re: [DISCUSS] 4.0.1 patch release?

2021-05-21 Thread Jorge Cardoso Leitão
ease tarball must be signed by the release manager. > > > > > > Do you have an Apache Code Signing key? > > > If not, then it could be better if either Kou or I would be the > > > release manager. > > > > > > [1]: https://dist

[VOTE] Release Apache Arrow 4.0.1 - RC1

2021-05-21 Thread Jorge Cardoso Leitão
Hi, I would like to propose the following release candidate (RC1) of Apache Arrow version 4.0.1. This is a release consisting of 19 resolved JIRA issues[1]. This is a normal release (i.e. vote on source and binaries). This release candidate is based on commit:

Re: Language silos and transpilers

2021-05-19 Thread Jorge Cardoso Leitão
There are two examples: an example in DataFusion [1], and an example in Python [2]. In DataFusion, the performance is the same because the UDF is compiled as Rust. It can even be compiled with SIMD intrinsics. In Python, it depends on what is used inside the UDF: * If only pyarrow.compute

Re: [VOTE][RUST] Release Apache Arrow Rust 4.1.0 RC2

2021-05-18 Thread Jorge Cardoso Leitão
+1 Checked checksums and ran cargo test, and all good. On Wed, May 19, 2021 at 1:40 AM Andy Grove wrote: > +1 (binding) > > I checked the checksums, reviewed the changelog, and ran "cargo test". > Tests passed. > > On Tue, May 18, 2021 at 5:13 PM Sutou Kouhei wrote: > > > +0 > > > > I could

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Jorge Cardoso Leitão
Sat, May 15, 2021 at 7:44 AM Jorge Cardoso Leitão > wrote: > > > > Hi, > > > > I have started collecting commits to the maint branch [1]. The exact > > commands I used: > > > > git clone g...@github.com:apache/arrow.git > > cd arrow/dev > > pytho

Re: [DISCUSS] 4.0.1 patch release?

2021-05-14 Thread Jorge Cardoso Leitão
rsions. > > Neal > > > On Fri, May 14, 2021 at 12:32 PM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Just to make sure: the goal is to cherry-pick all changes targeted for > > 4.0.1 into a branch and release from there? If that is the c

Re: [DISCUSS] 4.0.1 patch release?

2021-05-14 Thread Jorge Cardoso Leitão
8:20 AM Wes McKinney wrote: > > > Addressing these accumulated issues in a patch release sounds like a > > good idea to me. > > > > On Wed, May 12, 2021 at 6:18 PM Jorge Cardoso Leitão > > wrote: > > > > > > I agree. Segfaults are not nice.

Re: [C++][DISCUSS] Implementing interpreted (non-compiled) tests for compute functions

2021-05-14 Thread Jorge Cardoso Leitão
Hi, (this problem also exists in Rust, btw) Couldn't we use something like what we do for our integration tests? Create a separate binary that would allow us to call e.g. test-compute --method equal --json-file --arg "column1" --arg "column2" --expected "column3" (or simply pass the input via

Re: [DISCUSS] 4.0.1 patch release?

2021-05-12 Thread Jorge Cardoso Leitão
I agree. Segfaults are not nice. I can take it. I would possibly need some guidance. Best, Jorge On Thu, May 13, 2021 at 12:52 AM Neal Richardson < neal.p.richard...@gmail.com> wrote: > Hi, > As discussed at the biweekly sync call, I wanted to gauge interest in doing > a 4.0.1 patch release.

Re: [VOTE] [RUST] New release process for arrow-rs

2021-05-11 Thread Jorge Cardoso Leitão
+1 Thanks a lot, Andrew! On Wed, May 12, 2021 at 2:04 AM Sutou Kouhei wrote: > +1 > > In > "[VOTE] [RUST] New release process for arrow-rs" on Tue, 11 May 2021 > 18:16:14 -0400, > Andrew Lamb wrote: > > > Per previous discussions, I would like to propose a new release process > for > >

Re: [Discuss] Storing metadata about the "sortedness" of data

2021-05-11 Thread Jorge Cardoso Leitão
So, I think that both cases can be accomplished within DataFusion itself: * When the data is sorted at rest, we can add a method to the TableProvider to share this information with the query engine, like we do with partitioning. * When the data is sorted via some physical node / operation during
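Here is a minimal sketch of the first case. It assumes a hypothetical and heavily simplified provider trait: `SortedSource`, `SortKey` and `can_skip_sort` are illustrative names, not DataFusion's actual `TableProvider` API. The source declares its at-rest ordering. A planner could then skip an explicit sort when the required ordering is a prefix of that declared ordering.

```rust
/// Hypothetical sort key; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub ascending: bool,
}

/// Hypothetical, simplified stand-in for a table provider that can
/// advertise how its data is sorted at rest.
pub trait SortedSource {
    /// Ordering of the data at rest; `None` means "unknown", the safe default.
    fn output_ordering(&self) -> Option<Vec<SortKey>> {
        None
    }
}

/// True if the source is already sorted by `required` (i.e. `required` is a
/// prefix of the declared ordering), so a sort node would be redundant.
pub fn can_skip_sort(source: &dyn SortedSource, required: &[SortKey]) -> bool {
    source
        .output_ordering()
        .map_or(false, |ordering| ordering.starts_with(required))
}
```

The default method returns `None`, so existing sources are unaffected. Only sources that actually know their ordering would override it.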

Re: Announcing Conbench + Arrow

2021-05-10 Thread Jorge Cardoso Leitão
This is awesome! Big kudos to everyone involved. Best, Jorge On Mon, May 10, 2021 at 8:12 PM Diana Clarke wrote: > Hi folks: > > Last week we officially announced a new benchmarking tool called > Conbench with an Arrow integration. > > https://ursalabs.org/blog/announcing-conbench/ >

Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-04 Thread Jorge Cardoso Leitão
+1 Also, great process, Weston. Best, Jorge On Tue, May 4, 2021 at 6:48 PM Antoine Pitrou wrote: > > +1 from me. Thank you for doing this! > > Regards > > Antoine. > > > Le 04/05/2021 à 13:41, Weston Pace a écrit : > > Per ARROW-7396 I would like to propose an application to the IANA to >

Re: Please Review: Application for a Media Type

2021-05-04 Thread Jorge Cardoso Leitão
> >> > >> > >> Thanks, > >> -- > >> kou > >> > >> In > >> "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 > 06:44:51 +0200, > >> Jorge Cardoso Leitão wrote: > >> > >> >

[Rust] remove Rust from apache/arrow

2021-05-03 Thread Jorge Cardoso Leitão
Hi, The PR proposing the removal of the Rust implementation from apache/arrow is ready: https://github.com/apache/arrow/pull/10096 It is a -200k LOC change, so I wanted to give it some visibility here. The integration tests continue to run as is; they will pull from the latest apache/arrow-rs.

New style in documentation on the website looks great

2021-05-01 Thread Jorge Cardoso Leitão
Hi, I am not sure who was behind it, but it looks great! It really brings harmony to the website and offers a much cleaner UI to everyone using it. E.g. here: https://arrow.apache.org/docs/python/data.html Thanks a lot for this, Jorge

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-29 Thread Jorge Cardoso Leitão
Thanks for writing this. I agree. That is a good decision tree. +1 Best, Jorge On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield wrote: > The discussion around adding another interval type to the Schema.fbs raises > the issue of when do we decide to add a new type to the Schema.fbs vs using >

Re: Independent releases and format version

2021-04-28 Thread Jorge Cardoso Leitão
Hi, Thanks for bringing this up. AFAIK, this has always been an issue because we do not require all implementations to support a type prior to releasing that new type in the spec. Concrete example: map has been released for a while, but Rust does not support it yet. Same for interval, where we

Re: [ANNOUNCE] New Arrow committer: Ian Cook

2021-04-28 Thread Jorge Cardoso Leitão
Congratulations and thank you for your contributions :) On Wed, Apr 28, 2021 at 10:37 PM Neal Richardson < neal.p.richard...@gmail.com> wrote: > On behalf of the Arrow PMC, I'm happy to announce that Ian has accepted an > invitation to become a committer on Apache Arrow. Welcome, and thank you >

Re: [ANNOUNCE] New Arrow committer: Jonathan Keane

2021-04-28 Thread Jorge Cardoso Leitão
Congratulations and thank you for your contributions :) On Wed, Apr 28, 2021 at 10:37 PM Neal Richardson < neal.p.richard...@gmail.com> wrote: > On behalf of the Arrow PMC, I'm happy to announce that Jonathan has > accepted an invitation to become a committer on Apache Arrow. Welcome, and >

Re: [ANNOUNCE] New Arrow committer: Daniël Heres

2021-04-28 Thread Jorge Cardoso Leitão
Congratulations, and thank you for your contributions; I have been learning a lot from you :) Best, Jorge On Wed, Apr 28, 2021 at 6:10 PM Andrew Lamb wrote: > Congratulations Daniël! > > On Wed, Apr 28, 2021 at 9:25 AM Andy Grove wrote: > > > On behalf of the Arrow PMC, I'm happy to announce

Re: [DISCUSS] Moving the format directory to arrow-format repository

2021-04-28 Thread Jorge Cardoso Leitão
Hi, imo the time-scale of changes in the format is too large to justify the complexity. I also think that we should not force users to clone or submodule the repo to even compile the crate. What if we just do not have the format files there at all, and instead just keep the generated code?

Re: [Python] Who has been able to use PyArrow 4.0.0?

2021-04-28 Thread Jorge Cardoso Leitão
Hi, I am unable to reproduce it on Mac OS 11.2 ``` python3 --version > Python 3.7.4 python3 -m venv venv source venv/bin/activate pip --version > pip 19.0.3 pip install pyarrow==4.0.0 > downloading [...]/pyarrow-4.0.0-cp37-cp37m-macosx_10_13_x86_64.whl > ... pip freeze > numpy==1.20.2 >

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-26 Thread Jorge Cardoso Leitão
On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Hi, > > > > I fielded a PR [1] to open up a discussion to incorporate > python-datafusion > > [2] into the Apache Arrow project. > > > > Python-datafus

Re: [DISCUSS] experimental repos

2021-04-26 Thread Jorge Cardoso Leitão
-offs between brand > new repos as compared to a separate branch in existing one in an existing > one? > > Cheers, > Micah > > On Sun, Apr 25, 2021 at 9:31 PM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Hi, > > > > As d

Re: [RUST] parquet2 experiment

2021-04-25 Thread Jorge Cardoso Leitão
nging it to an ASF repo that you can't > get in your own repo? > > Perhaps you are ready to bring in other collaborators and wish to ensure > they have undergone the Apache IP clearance process? > > Andrew > > > On Fri, Apr 16, 2021 at 12:22 PM Jorge Cardoso Leitão < >

[DISCUSS] experimental repos

2021-04-25 Thread Jorge Cardoso Leitão
Hi, As discussed in other threads (rust sync and parquet2), I would like to open the discussion around opening repos for experimental work that may or may not be merged / used. The rationale is that we incentivise work to be conducted within ASF and Apache Arrow's governance, thereby clarifying

[DISCUSS] [Rust] Python-datafusion

2021-04-25 Thread Jorge Cardoso Leitão
Hi, I fielded a PR [1] to open up a discussion to incorporate python-datafusion [2] into the Apache Arrow project. Python-datafusion is a Python library [3] built on top of DataFusion that enables people to use DataFusion from Python. It leverages the C data interface for zero-copy data exchange between

Re: [Go, Rust] inter-language Arrow API compatibility

2021-04-23 Thread Jorge Cardoso Leitão
Hi Agam, We have integration tests that produce data from Rust and consume it from Go, and vice-versa, for both IPC files and streams, so it is reasonable to expect that they work. Deviations are considered major out-of-spec bugs, which we often prioritize over e.g. new features. Best, Jorge On

Re: Please Review: Application for a Media Type

2021-04-21 Thread Jorge Cardoso Leitão
Thanks for driving this, exciting stuff! I went through it and left minor comments; it looks good to me. wrt the extension: imo they should be different as the formats are not interchangeable. AFAIK `.stream` is not taken: it was used by Adobe Shockwave but it was discontinued [1]. So, .arrow

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-20 Thread Jorge Cardoso Leitão
19, 2021 at 6:49 PM Krisztián Szűcs wrote: > On Mon, Apr 19, 2021 at 7:32 AM Jorge Cardoso Leitão > wrote: > > > > Hi, > > > > I created ARROW-12444 <https://issues.apache.org/jira/browse/ARROW-12444 > > > > to handle the changes on the apache/arrow s

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-18 Thread Jorge Cardoso Leitão
che.org/confluence/display/INFRA/Git+-+.asf.yaml+features > >> > >> On Sun, Apr 18, 2021 at 11:52 AM Krisztián Szűcs < > >> szucs.kriszt...@gmail.com> wrote: > >> > >>> On Sun, Apr 18, 2021 at 7:03 PM Antoine Pitrou > >>> wrote: >

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-18 Thread Jorge Cardoso Leitão
Yes, we are not touching apache/arrow. Does anyone know how to request permissions on github repos? We can't even see "Settings" atm (we can push to master). A JIRA on INFRA? Thanks, Jorge On Sun, Apr 18, 2021 at 5:54 PM Krisztián Szűcs wrote: > On Sun, Apr 18, 2021 at 5:47 PM Antoine

[RUST] parquet2 experiment

2021-04-16 Thread Jorge Cardoso Leitão
Hi, As briefly discussed in a recent email thread, I have been experimenting with re-writing the Rust parquet implementation. I have not advertised this much as I was very sceptical that this would work. I am now confident that it can, and thus would like to share more details. parquet2 [1] is a

Re: CI feedback time

2021-04-15 Thread Jorge Cardoso Leitão
Hi, I agree. I'll submit two requirements though: > - the configuration for CI builds must be kept in the Arrow repository >(as they are currently in .github, etc.) > - CI builds must be runnable from PRs > I'll submit three more: - The result of the build (pass / did not pass) must be

Re: [VOTE] Move Rust components to new repos and process

2021-04-14 Thread Jorge Cardoso Leitão
+1 binding. On Thu, Apr 15, 2021 at 4:10 AM Micah Kornfield wrote: > +1 binding. Does this also cover other changes (issue management) +1 to > those as well. > > On Wednesday, April 14, 2021, Andy Grove wrote: > > > This vote is to determine if the Arrow PMC is in favor of the Rust > >

Re: [DISCUSS] [Rust] Move Rust components to new repos and process

2021-04-10 Thread Jorge Cardoso Leitão
> in-memory arrow (this seems to be the extent of the go integration tests at > least)? > > Sorry if this easily answerable from knowing archery better, but I'm still > in the learning/discovery phase of how exactly all the integration tests > are setup/run. > > -Jac

Re: [DISCUSS] [Rust] Move Rust components to new repos and process

2021-04-10 Thread Jorge Cardoso Leitão
Hi, Wrt integration tests, I agree that it is important to have a plan prior to this. What we have been doing in apache/arrow: 1. only release if integration tests pass against each other 2. release the signed tar with the latest of every implementation (i.e. master) My suggestion for

Re: [DISCUSS] [Rust] Move Rust components to new repos and process

2021-04-09 Thread Jorge Cardoso Leitão
Hi, The major problem that we are addressing with an independent release cycle is that the large majority of our users do not use released versions, neither from ASF archives nor from Cargo crates. They use a git hash commit. This is a problem because our git hashes are *de facto* releases. This

Re: Rust sync meeting

2021-04-08 Thread Jorge Cardoso Leitão
Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Hi Wes, > > > I will say additionally: it is important to have discussions on the > mailing list to explain / draw attention to major initiatives so > others aren't left wondering what people's plans are. > >

Re: Rust sync meeting

2021-04-08 Thread Jorge Cardoso Leitão
m open to moving it if we can > find a > > > > time that accommodates everyone reasonably well. I think we have > regular > > > > attendance from the US, UK, Europe, South Africa, and Australia so > far. > > > > > > > > Thanks, >

Re: Rust sync meeting

2021-04-08 Thread Jorge Cardoso Leitão
> > > > > > > > > > > > On Wed, Mar 24, 2021 at 5:24 PM Mike Seddon > wrote: > > > > > > > Hi Jorge, > > > > Can you please confirm the starting time (and timezone) and correct > > > Google > > > > Meet link

Re: Alignment not stored in arrow metadata

2021-04-07 Thread Jorge Cardoso Leitão
Hi Jacob, AFAIK IPC is just bytes. The alignment is done when they are copied over to allocated memory regions. It is the implementations' responsibility to allocate memory regions that are aligned depending on how those bytes should be interpreted (e.g. u64 vs u8). This interpretation is induced
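Here is a minimal sketch of that idea. The body bytes may sit at any address, so they cannot necessarily be reinterpreted in place as `&[u64]`. Copying them into storage allocated for `u64` yields properly aligned values; here that storage is simply a `Vec<u64>`, which is aligned to 8 bytes. `is_u64_aligned` and `copy_into_u64_storage` are hypothetical helpers. Real implementations usually allocate with larger alignments, and this sketch assumes the bytes are in the host's native endianness.

```rust
use std::mem::align_of;

/// True if `bytes` could be reinterpreted in place as `&[u64]`.
pub fn is_u64_aligned(bytes: &[u8]) -> bool {
    bytes.as_ptr() as usize % align_of::<u64>() == 0
}

/// Copies raw buffer bytes into storage allocated for `u64`, so the values
/// are aligned regardless of where `bytes` lived.
/// Assumes the bytes are in the host's native endianness.
pub fn copy_into_u64_storage(bytes: &[u8]) -> Vec<u64> {
    assert_eq!(bytes.len() % 8, 0, "length must be a multiple of 8");
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut word = [0u8; 8];
            word.copy_from_slice(chunk);
            u64::from_ne_bytes(word)
        })
        .collect()
}
```

The interpretation (here `u64`) decides which alignment the destination region needs. The source bytes themselves carry no alignment information.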

Re: Status of Arrow Julia implementation?

2021-04-06 Thread Jorge Cardoso Leitão
Hi, > you all did not attempt to work in the community for any meaningful amount of time and are choosing not to try based on the perception that it will create unacceptable overhead for you It is not self-evident to me that Julia's community was sufficiently informed about what they had to give

Re: sparse data array

2021-03-25 Thread Jorge Cardoso Leitão
Would it be an option to use a StructArray for that? One array with the values, and one with the repetitions: Int32([1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2]) -> StructArray([ "values": Int32([1, 2, 3, 1, 2]), "repetitions": UInt32([1, 3, 6, 1, 1]), ]) It does not have the same API, but I
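The conversion itself is a run-length encoding. Below is a minimal sketch that uses plain `Vec`s in place of the two child arrays of the proposed StructArray; the function names are hypothetical. It reproduces the example above.

```rust
/// Run-length encodes `values` into (run values, run lengths), i.e. the
/// "values" and "repetitions" children of the proposed struct.
pub fn run_length_encode(values: &[i32]) -> (Vec<i32>, Vec<u32>) {
    let mut run_values: Vec<i32> = Vec::new();
    let mut repetitions: Vec<u32> = Vec::new();
    for &v in values {
        if run_values.last() == Some(&v) {
            // Same value as the current run: extend it.
            *repetitions.last_mut().expect("a run exists") += 1;
        } else {
            // New value: start a new run of length 1.
            run_values.push(v);
            repetitions.push(1);
        }
    }
    (run_values, repetitions)
}

/// Expands the encoded form back into the original values.
pub fn run_length_decode(run_values: &[i32], repetitions: &[u32]) -> Vec<i32> {
    run_values
        .iter()
        .zip(repetitions)
        .flat_map(|(&v, &n)| std::iter::repeat(v).take(n as usize))
        .collect()
}
```

The repetitions always sum to the logical length (here 12). Random access therefore needs a prefix sum over them, which is one way the API differs from a plain Int32 array.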

Rust sync meeting

2021-03-24 Thread Jorge Cardoso Leitão
Hi, If someone is trying to join the meeting, please try this link: https://meet.google.com/pgx-xycf-zuf Best, Jorge

Re: [VOTE] Accept donation of Rust Ballista project

2021-03-21 Thread Jorge Cardoso Leitão
+1 (binding) On Sun, Mar 21, 2021 at 4:56 PM Andy Grove wrote: > Dear all, > > On behalf of the Ballista community, I would like to propose that we donate > Ballista to the Apache Arrow project. > > Ballista is a distributed scheduler based on Arrow standards (memory > format, IPC, Flight) and

Re: [ALL] Integration tests for dense and sparse tensor

2021-03-19 Thread Jorge Cardoso Leitão
Hi, Thanks a lot for bringing this up, Fernando. I had the same thought when I first looked at the tensor implementation in Rust. Now it is a bit more clear :) So, if I understood correctly, the direction would be to declare a "JSON-integration" equivalent for tensors, generate a set of "golden

Re: [DISCUSS] How to describe computation on Arrow data?

2021-03-18 Thread Jorge Cardoso Leitão
Hi, The main benefit I see for a standard for queries would not be on a serialization format, but on its semantics. IMO one of the main reasons for the lack of a standard for queries at the protobuf level is that human-readability vastly outweighs serialization: queries are at the very most a megabyte

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-16 Thread Jorge Cardoso Leitão
> > > > I would like to talk a bit more specifically about the donation at > >> this > >> > > > point now that there is some feedback. > >> > > > > >> > > > What I propose we donate from Ballista is: > >> >

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-10 Thread Jorge Cardoso Leitão
a circular dependency, which is the > hellscape we escaped from with Parquet C++). I used the > datafusion-python project as an example — if that were in the Arrow > project I might consider using it in various ways or contribute to it, > but as an external project it's less interesting to

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-10 Thread Jorge Cardoso Leitão
Hi, First of all, I want to thank you very much for your work on Ballista and for doing it in an open source environment. It is something that should be emphasised and celebrated. Secondly, wrt considering donating it to the Apache Software Foundation and the Apache Arrow project in particular, I would say that

Re: [Rust] [DataFusion] Topic for next Rust Sync Call

2021-03-10 Thread Jorge Cardoso Leitão
Hi, If there is time available, I would like to present the status of the experimental arrow2 repo, and gather feedback on what would be the best way to proceed. 10-15m? Best, Jorge On Wed, Mar 10, 2021 at 1:57 PM Andrew Lamb wrote: > Also: > *

Re: [DISCUSS] integration against parquet files?

2021-03-09 Thread Jorge Cardoso Leitão
> > > > > > Some planning has started around this in PARQUET-1985 [1]. It seems it > > > would be relatively easy in the short term for Rust and C++ to reuse > > > archery for this purpose. I agree it is a good thing. > > > > > > > > >

Re: [Rust] Arrow in WebAssembly

2021-03-08 Thread Jorge Cardoso Leitão
Dominik's point is that the IPC reader currently first writes the whole thing into a Vec, and then copies all of that to buffers using IPC::Buffer offsets and lengths. Thus, it performs 2 memcopies of the whole data and needs to hold 2x the required memory (the Vec and the arrow::Buffers). I
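For comparison, here is a hypothetical sketch of how the second copy could be avoided. The message body is kept in one reference-counted allocation, and each Arrow buffer becomes a window into it, using the offsets and lengths from the IPC metadata. `SharedBuffer` is an illustrative type, not the arrow crate's `Buffer`, and the sketch ignores alignment entirely.

```rust
use std::sync::Arc;

/// Hypothetical zero-copy buffer: shared ownership of the whole message body
/// plus a window (offset, length) into it.
#[derive(Clone, Debug)]
pub struct SharedBuffer {
    data: Arc<[u8]>,
    offset: usize,
    len: usize,
}

impl SharedBuffer {
    pub fn new(data: Arc<[u8]>) -> Self {
        let len = data.len();
        Self { data, offset: 0, len }
    }

    /// Window `[offset, offset + len)` relative to this buffer; it only bumps
    /// a reference count instead of copying bytes.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self { data: Arc::clone(&self.data), offset: self.offset + offset, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data[self.offset..self.offset + self.len]
    }
}
```

With this layout, the peak memory is one copy of the body rather than two.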

Re: [DISCUSS] Conventions for values masked by null bits

2021-03-08 Thread Jorge Cardoso Leitão
> > If the underlying values were allocated but not initialized they may leak > private information such as private keys, passwords, or tokens which were > placed in that memory then freed by an application without overwrite > I would not be concerned with the security implications of reading

Re: [ANNOUNCE] New Arrow PMC member: Andrew Lamb

2021-03-08 Thread Jorge Cardoso Leitão
Congratulations Andrew! It has been a pleasure working with you!!! :D Best, Jorge On Mon, Mar 8, 2021 at 6:23 PM Wes McKinney wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Andrew Lamb to become a PMC member and we are pleased to announce > that Andrew has

[DISCUSS] integration against parquet files?

2021-03-05 Thread Jorge Cardoso Leitão
Hi, To run integration with IPC, we have a set of `.arrow` and `.stream` files that different implementations consume and can compare against "golden" .json files that contain the corresponding in-memory representation. I wonder if we should not have equivalent json files corresponding to a

Re: New committer: Yibo Cai

2021-03-05 Thread Jorge Cardoso Leitão
Thank you for your contributions so far and congratulations, Yibo! Best, Jorge On Fri, Mar 5, 2021 at 8:18 PM Antoine Pitrou wrote: > > Hello, > > The Project Management Committee (PMC) for Apache Arrow has invited Yibo > Cai to become a committer and we are pleased to announce that he has >

Re: Requirements on JIRA usage in Apache Arrow

2021-03-02 Thread Jorge Cardoso Leitão
labels to issues that were so, but there are 5 variants of that label; if I do not know which one to pick, how can I expect a *beginner* to know which one to pick? Best, Jorge On Tue, Mar 2, 2021 at 10:26 AM Antoine Pitrou wrote: > > Hi Jorge, > > On Tue, 2 Mar 2021 08:

Re: Requirements on JIRA usage in Apache Arrow

2021-03-01 Thread Jorge Cardoso Leitão
Hi, FWIW, the amount of bureaucracy that goes into JIRA is a major contributing factor for the reduction of my time commitment to this project by 80%+. My limited understanding of the apache way is that it concerns how decision-making and governance happens on an apache project in the context of

Re: Constraints on fixed size list of variable sized types

2021-02-22 Thread Jorge Cardoso Leitão
s://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/array/validate.cc#L103 > > > > > On Sun, Feb 21, 2021 at 12:38 AM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Hi, > > > > We state in the spec th

Constraints on fixed size list of variable sized types

2021-02-21 Thread Jorge Cardoso Leitão
Hi, We state in the spec that: A fixed size list type is specified like FixedSizeList<T>[N], where T is > any type (*primitive or nested*) and N is a 32-bit signed integer > representing the length of the lists. > (emphasis mine) Now, suppose that we have FixedSizeList[2], i.e. a fixed type whose

Re: [DISCUSS] Conventions for values masked by null bits

2021-02-20 Thread Jorge Cardoso Leitão
I agree. Below are two notes from a similar discussion on the Rust implementation: 1. In SIMD, for performance reasons, operations are performed over the whole buffer irrespective of the bitmap mask, and the bitmap mask is dealt with separately. If a slot contains an arbitrary value, the operation
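Note 1 can be sketched on plain slices standing in for Arrow buffers; the function is hypothetical. The value loop runs over every slot without branching on validity. The validity bitmaps (LSB bit order, as in the Arrow format) are combined separately with a bitwise AND. The sketch uses `wrapping_add` because a null slot may hold any value, and in Rust a plain `+` panics on overflow in debug builds.

```rust
/// Adds two i32 columns: values over the whole buffer, validity separately.
pub fn add_with_validity(
    lhs: &[i32],
    lhs_validity: &[u8],
    rhs: &[i32],
    rhs_validity: &[u8],
) -> (Vec<i32>, Vec<u8>) {
    assert_eq!(lhs.len(), rhs.len());
    // Branch-free loop over every slot, null or not; a loop of this shape is
    // a good candidate for compiler auto-vectorization.
    let values: Vec<i32> = lhs.iter().zip(rhs).map(|(a, b)| a.wrapping_add(*b)).collect();
    // A result slot is valid only if it is valid in both inputs.
    let validity: Vec<u8> = lhs_validity.iter().zip(rhs_validity).map(|(a, b)| a & b).collect();
    (values, validity)
}
```

In the usage below, slot 2 overflows. It is null, so its value is irrelevant: only the combined bitmap decides which results are meaningful.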

Re: [Rust] Column names in FFI_ArrowSchema

2021-02-19 Thread Jorge Cardoso Leitão
No reason in particular: please go ahead! tips: anything that is to be deallocated when exporting must go to "private data" and deallocated by the extern "C" release bit, as we do in the DataType string. Think of that function as the "Drop" equivalent for ffi. Best, Jorge On Sat, Feb 20, 2021
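The pattern in the tip can be sketched with a hypothetical, cut-down struct. `FfiNamed` is illustrative only; the real `ArrowSchema` also carries `format`, `metadata`, `flags`, children, and so on. Owned data lives behind `private_data`, and consumers only see borrowed pointers. The `extern "C"` release callback frees the private data and marks the struct as released by setting `release` to null, much like `Drop`.

```rust
use std::ffi::{c_char, c_void, CString};
use std::ptr;

/// Hypothetical, simplified C struct following the C data interface pattern.
#[repr(C)]
pub struct FfiNamed {
    /// Borrowed by consumers; the memory is owned via `private_data`.
    pub name: *const c_char,
    /// Null (`None`) once the struct has been released.
    pub release: Option<unsafe extern "C" fn(*mut FfiNamed)>,
    pub private_data: *mut c_void,
}

/// Everything the producer allocated for this export.
struct PrivateData {
    name: CString,
}

/// The FFI counterpart of `Drop`: frees the private data, marks as released.
unsafe extern "C" fn release_named(this: *mut FfiNamed) {
    if this.is_null() {
        return;
    }
    let this = unsafe { &mut *this };
    if !this.private_data.is_null() {
        // Re-box and drop, which frees the CString.
        drop(unsafe { Box::from_raw(this.private_data as *mut PrivateData) });
    }
    this.private_data = ptr::null_mut();
    this.name = ptr::null();
    this.release = None;
}

impl FfiNamed {
    pub fn export(name: &str) -> Self {
        let private = Box::new(PrivateData {
            name: CString::new(name).expect("name without interior NUL"),
        });
        // The CString's heap buffer does not move when the Box becomes a raw pointer.
        let name_ptr = private.name.as_ptr();
        Self {
            name: name_ptr,
            release: Some(release_named),
            private_data: Box::into_raw(private) as *mut c_void,
        }
    }
}
```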

Re: [Julia] C Data Interface for Julia

2021-02-14 Thread Jorge Cardoso Leitão
Hi Samay, That is great to know! The interfaces are not documented, but they are `_import_from_c` and `_export_to_c`. Both expect two pointers, one to the array and the other to the schema (that naturally must have a C representation). Tests within c++ are available here

Re: Push force to master by mistake

2021-02-14 Thread Jorge Cardoso Leitão
Yes please. :) On Sun, Feb 14, 2021, 22:08 Wes McKinney wrote: > Should we make master a "protected branch" now that we've resolved to > not rebase master ever again? > > On Sun, Feb 14, 2021 at 6:23 AM Jorge Cardoso Leitão > wrote:

Re: Push force to master by mistake

2021-02-14 Thread Jorge Cardoso Leitão
I found the commit, 8547c616dcc7c3ee51f174d118c81b38847974af, and I pushed the changes up to there, so I think that everything is back to normal. Sorry for this. Best, Jorge On Sun, Feb 14, 2021 at 1:13 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Hi, > > I m

Push force to master by mistake

2021-02-14 Thread Jorge Cardoso Leitão
Hi, I mistakenly force-pushed master with an outdated master. Could someone revert this? I can't find the commit hash with the (good) latest master anywhere. Really sorry about this!! :( Jorge

Array offset in IPC

2021-02-12 Thread Jorge Cardoso Leitão
Hi, I am going through the Rust implementation of the IPC, and I am trying to understand how we share Arraydata offsets. Specifically, our C data interface supports the notion of an offset, measured in slots, that denotes how many slots ahead of the buffer pointers we read from. This enables us
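To illustrate the C data interface semantics described here, consider a hypothetical, simplified view (not the arrow crate's `ArrayData`). The offset is counted in slots and applies to both the value buffer and the validity bitmap. Slicing therefore only changes `offset` and `length`, while the buffers stay shared.

```rust
/// Hypothetical view of a primitive array as exposed via the C data interface.
pub struct PrimitiveView<'a> {
    pub values: &'a [i32],
    /// Validity bitmap in LSB bit order; `None` means all slots are valid.
    pub validity: Option<&'a [u8]>,
    /// Number of slots to skip from the physical start of the buffers.
    pub offset: usize,
    pub length: usize,
}

impl<'a> PrimitiveView<'a> {
    /// Returns logical slot `i`, or `None` if it is null.
    pub fn get(&self, i: usize) -> Option<i32> {
        assert!(i < self.length, "index out of bounds");
        // The same slot offset applies to the values and to the bitmap.
        let physical = self.offset + i;
        let is_valid = self
            .validity
            .map_or(true, |bitmap| (bitmap[physical / 8] & (1u8 << (physical % 8))) != 0);
        is_valid.then(|| self.values[physical])
    }

    /// Zero-copy slice: only `offset` and `length` change.
    pub fn slice(&self, start: usize, len: usize) -> Self {
        assert!(start + len <= self.length, "slice out of bounds");
        Self {
            values: self.values,
            validity: self.validity,
            offset: self.offset + start,
            length: len,
        }
    }
}
```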

Re: [Python] Python based Query Engine for Arrow

2021-02-12 Thread Jorge Cardoso Leitão
Hi, Tom, This does not address the question directly, but for what it's worth, I had the same issue and thus released a Python binding for DataFusion. It allows, e.g., creating a pyarrow RecordBatch by reading from s3 (via pyarrow), and using it as a source to

[Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-02-07 Thread Jorge Cardoso Leitão
Hi, Over the past 4 months, I have been growing more and more frustrated by the amount of undefined behaviour that I am finding and fixing on the Rust implementation. I would like to open the discussion of a broader overview about the problem in light of our current knowledge and what Rust

Re: JIRA grooming

2021-02-05 Thread Jorge Cardoso Leitão
I would love that also! Atm we need to: * add the tag to the jira issue * add the component in the dropdown for components * add the component to the PR In case of multiple components, the above is per component, per PR. IMO we should only have to select the component from one place, which IMO

Re: [RUST] Arrow guide

2021-02-01 Thread Jorge Cardoso Leitão
I went through it, and I have to say that it is really well written and contains non-trivial knowledge about the arrow crate. Thank you very much for this, Fernando. In my opinion alone, the guide or a variation of it could be incorporated into the arrow repo and released together with the crate,

Re: [Rust] Proposed PR Merge Guidelines

2021-01-29 Thread Jorge Cardoso Leitão
Hi, Thanks a lot for writing. I like it very much. Some follow ups / clarifications: Do we differentiate between who PRed? I.e. if it is a committer, do they count for the approval or in that case do we need 2 approvals? I am more in favour of requiring a second approval as usual, with the idea

Re: [RUST] Implement value function with Array trait

2021-01-28 Thread Jorge Cardoso Leitão
to be implemented for all the possible values that > > can be returned from the arrays > > > > Fernando > > > > On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão, < > jorgecarlei...@gmail.com > > > > > wrote: > > > > > Hi Fernando, > >

Re: [RUST] Implement value function with Array trait

2021-01-28 Thread Jorge Cardoso Leitão
Hi Fernando, I tried that some time ago, but I was unable to do so. The reason is that Array is a trait that also needs to support being a trait object (i.e. support `dyn Array`). Let's try here: what type should `Array::value` return? One option is to make Array a generic. But if Array is a
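Here is a simplified sketch of the problem, with hypothetical types. Suppose `Array` had a generic method or an associated return type for `value`. Then every `dyn Array` would have to name that type (or the trait would not be object-safe at all), so a heterogeneous `Vec<Box<dyn Array>>` would be impossible. A common workaround is to keep the trait object-safe and put the typed `value` on the concrete types, reached by downcasting through `std::any::Any`. The arrow crate's `Array` trait exposes a similar `as_any` method, but the code below is illustrative only.

```rust
use std::any::Any;

/// Object-safe trait: no generic `value`, so `dyn Array` is usable.
pub trait Array {
    fn len(&self) -> usize;
    /// Escape hatch to recover the concrete type.
    fn as_any(&self) -> &dyn Any;
}

pub struct Int32Array {
    values: Vec<i32>,
}

pub struct StringArray {
    values: Vec<String>,
}

impl Int32Array {
    pub fn new(values: Vec<i32>) -> Self {
        Self { values }
    }
    /// Typed accessor on the concrete type, where the return type is known.
    pub fn value(&self, i: usize) -> i32 {
        self.values[i]
    }
}

impl StringArray {
    pub fn new(values: Vec<String>) -> Self {
        Self { values }
    }
    pub fn value(&self, i: usize) -> &str {
        &self.values[i]
    }
}

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for StringArray {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

Code that only holds a `&dyn Array` downcasts to the concrete type first, then calls `value`.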

Re: [Rust] Proposed discussion items for the Rust sync up meeting this Wednesday Jan 27, 2021

2021-01-27 Thread Jorge Cardoso Leitão
Hi, Thanks a lot for driving this, Andrew. I agree with what both of you wrote here :) Best, Jorge On Tue, Jan 26, 2021 at 6:06 PM Rémi Dettai wrote: > Great topics Andrew, to my knowledge nothing has been decided on these > topics. > > We also agreed last time that it would be nice to go

[Rust] Recent backward incompatible changes in master

2021-01-19 Thread Jorge Cardoso Leitão
Hi, Just a heads up that there are 4 changes in master that may affect your code, and I would like to bring them to your attention. 1. `memory::allocate_aligned` no longer initializes memory regions with zeros. Use `memory::allocate_aligned_zeroed` if you need zero-initialized memory. Rationale: up to 3.0, we allocate a
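The difference between an uninitialized and a zeroed aligned allocation can be illustrated with Rust's standard allocator. This is a sketch under assumptions, not the arrow crate's implementation: `allocate_uninit`, `allocate_zeroed` and `free` are hypothetical helpers built on `std::alloc::{alloc, alloc_zeroed, dealloc}`, and the 64-byte alignment is an assumed value.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};

// Assumed alignment for this sketch; not taken from the arrow crate.
const ALIGNMENT: usize = 64;

fn layout_for(size: usize) -> Layout {
    // Zero-sized layouts must not be passed to `alloc`, so reject them up front.
    assert!(size > 0, "size must be non-zero");
    Layout::from_size_align(size, ALIGNMENT).expect("invalid layout")
}

/// Hypothetical helper: aligned buffer whose bytes are UNINITIALIZED.
/// Every byte must be written before it is read.
fn allocate_uninit(size: usize) -> *mut u8 {
    let ptr = unsafe { alloc(layout_for(size)) };
    assert!(!ptr.is_null(), "allocation failed");
    ptr
}

/// Hypothetical helper: aligned buffer with every byte set to 0.
fn allocate_zeroed(size: usize) -> *mut u8 {
    let ptr = unsafe { alloc_zeroed(layout_for(size)) };
    assert!(!ptr.is_null(), "allocation failed");
    ptr
}

/// Frees a buffer returned by either helper above.
/// Safety: `ptr` must come from one of them and `size` must match the allocation.
unsafe fn free(ptr: *mut u8, size: usize) {
    unsafe { dealloc(ptr, layout_for(size)) }
}

fn main() {
    let size = 128;

    // Zeroed memory can be read immediately.
    let z = allocate_zeroed(size);
    assert_eq!(z as usize % ALIGNMENT, 0);
    let zeroed = unsafe { std::slice::from_raw_parts(z, size) };
    assert!(zeroed.iter().all(|&b| b == 0));
    unsafe { free(z, size) };

    // Uninitialized memory must be filled before it is read.
    let u = allocate_uninit(size);
    unsafe { std::ptr::write_bytes(u, 7, size) };
    let filled = unsafe { std::slice::from_raw_parts(u, size) };
    assert!(filled.iter().all(|&b| b == 7));
    unsafe { free(u, size) };
}
```

Skipping the zeroing presumably saves a pass over the buffer when the caller overwrites every byte anyway; when a buffer may be read before it is fully written (e.g. a bitmap that is only partially set), the zeroed variant is the safe choice.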

Re: How to invite a speaker

2021-01-16 Thread Jorge Cardoso Leitão
Hi Kirill, If I were to invite someone, I would reach out to the person individually, via email or LinkedIn. Best, Jorge On Fri, Jan 15, 2021, 23:16 Kirill Lykov wrote: > Hi, > > I wonder where to write if we want to invite some speakers from Arrow to > give a talk for software engineers

Re: Github Actions feedback time

2021-01-09 Thread Jorge Cardoso Leitão
the best > path would be to attach a few bare-metal machines or spin up some > persistent cloud instances (if we figure out how they're going to be > paid for). > > On Thu, Jan 7, 2021 at 2:15 PM Jorge Cardoso Leitão > wrote: > > > > Hi Wes, > > > > T

Re: Github Actions feedback time

2021-01-07 Thread Jorge Cardoso Leitão
achine. > > On Wed, Jan 6, 2021 at 11:29 PM Jorge Cardoso Leitão > wrote: > > > > Hi Jacob, > > > > Neal already requested those (and other) actions to be whitelisted here > > https://issues.apache.org/jira/browse/INFRA-21239, but there was no > > resp

Re: Github Actions feedback time

2021-01-06 Thread Jorge Cardoso Leitão
Hi Jacob, Neal already requested those (and other) actions to be whitelisted here https://issues.apache.org/jira/browse/INFRA-21239, but there has been no response yet. Best, Jorge On Thu, Jan 7, 2021 at 6:20 AM Jacob Quinn wrote: > From this page, it looks like there have been certain github

Re: Github Actions feedback time

2021-01-05 Thread Jorge Cardoso Leitão
the > cloud bill, too). > > The one thing to be aware of with BK is the possible security risks > from builds running on pull requests. With reasonable security > practices (not putting sensitive data on the build hosts) this should > not be too much of a problem. > > On Tue,

Re: Github Actions feedback time

2021-01-05 Thread Jorge Cardoso Leitão
Krisztián, I agree with you that there is an ongoing problem with the queue. Thanks for the tips regarding WIP. Wes, I would be up for moving the Rust workflows to Buildkite. Is the integration in place with respect to triggers and reporting back to GitHub? I.e. can we just place a `pipeline.yml`

Re: Github Actions feedback time

2021-01-05 Thread Jorge Cardoso Leitão
Hi Krisztian, Could you describe which workflows you are referring to? Rust builds are taking 4-6 min on non-Windows/macOS runners and 8-15 min on Windows/macOS, with almost immediate feedback. E.g. I recently force-pushed to https://github.com/apache/arrow/pull/9089 and the first complete test result came 5 min later. One

Re: GitHub Actions are currently blocked

2020-12-28 Thread Jorge Cardoso Leitão
Thank you so much, Neal, for bringing this up. I was merging PRs without realizing this change had happened. :/ Fun stuff, great timing with the release coming up. A quick workaround for now is to check the corresponding branch on the user's fork. For example, this PR

Re: [Rust] Heads up -- change to the Rust CI system

2020-12-08 Thread Jorge Cardoso Leitão
d an overhaul > of the Rust CI system. > > You can see more details and rationale on the PR: > https://github.com/apache/arrow/pull/8821 but I wanted to bring it to the > attention of a wider audience as the PR checks will look different for Rust > PRs. > > Thanks to @Jorge Cardoso Leitão for driving > this work > > Andrew >

Re: PR backlog queue

2020-12-06 Thread Jorge Cardoso Leitão
Hi, On the Rust side, which has about 35 open PRs, more PRs have been coming in these days, but they are all triaged and most are progressing, so I think we are nominal there. Best, Jorge On Mon, Dec 7, 2020 at 6:48 AM Micah Kornfield wrote: > It looks like the PR queue is at 126.
