Re: Python Flight example with query command

2021-05-17 Thread Tanveer Ahmad - EWI
Hi David, Thank you for the reply. I have found that Arrow Datafusion project offers something similar for what I am looking for. Do you think this project

Nightly Builds Report 2021-05-17

2021-05-17 Thread Mauricio Vargas
*NIGHTLY BUILDS REPORT* 2021-05-17 *New reported errors* *GitHub* *Build: *github-test-conda-python-3.8-spark-master Error type: Internal Progress: No work has yet been done on this issue. First time issued:

Language Silos and transpilers

2021-05-17 Thread Arun Sharma
Hello: I just watched a video about Apache Arrow ( https://www.youtube.com/watch?v=-ZikPi2nmSI) that discussed Language Silos and one of the questions towards the end was about being able to translate automatically from one language to another. I'm not aware of the specific requirements for one

Re: Long title on github page

2021-05-17 Thread Weston Pace
I'd avoid the word "structured" as it is somewhat ill-defined. On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas wrote: > > more marketed: > How about: "Apache Arrow is a format and language-agnostic library focused > on efficient sharing and processing of structured data." > > On Mon, May 17,

Re: Long title on github page

2021-05-17 Thread Mauricio Vargas
more marketed: How about: "Apache Arrow is a format and language-agnostic library focused on efficient sharing and processing of structured data." On Mon, May 17, 2021 at 6:25 PM Micah Kornfield wrote: > How about: "Apache Arrow is a collection of specifications, cross language > libraries and

Re: Long title on github page

2021-05-17 Thread Micah Kornfield
How about: "Apache Arrow is a collection of specifications, cross language libraries and applications focused on efficient sharing and processing of structured data." On Mon, May 17, 2021 at 3:06 PM Wes McKinney wrote: > On Mon, May 17, 2021 at 4:58 PM Weston Pace wrote: > > > > > “Apache

Re: Long title on github page

2021-05-17 Thread Wes McKinney
On Mon, May 17, 2021 at 4:58 PM Weston Pace wrote: > > > “Apache Arrow is a format and compute kernel for in-memory data” > > I like this but no one ever knows what "in-memory" means (or they just > think 'data is always in memory'). How about... > > "Apache Arrow is a format and compute kernel

Re: Long title on github page

2021-05-17 Thread Weston Pace
> “Apache Arrow is a format and compute kernel for in-memory data” I like this but no one ever knows what "in-memory" means (or they just think 'data is always in memory'). How about... "Apache Arrow is a format and compute kernel for zero-copy processing and sharing of data." or... "Apache

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Krisztián Szűcs
On Mon, May 17, 2021 at 9:05 PM Neal Richardson wrote: > > How does one get their key in the Web of Trust? We do need to be able to > add people to that so that it's not just the same handful of individuals > who can be release manager, and now seems like a great time to add Jorge. Totally agree,

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Wes McKinney
I would suggest Krisztian or someone in the web of trust have a video call with Jorge to confirm his identity (and GPG fingerprint) and then commit his code signing key to KEYS. I don't think it's necessary to be extremely paranoid about this. On Mon, May 17, 2021 at 2:06 PM Neal Richardson

Re: Long title on github page

2021-05-17 Thread Mauricio Vargas
a few ideas github.com/apache/arrow - Apache Arrow is an efficient library for big data processing and sharing github.com/apache/arrow - Apache Arrow is a computational tool for processing, storing and sharing large datasets github.com/apache/arrow - Apache Arrow is a fast and simple library

Re: Long title on github page

2021-05-17 Thread Julian Hyde
Alright, well, whatever it is, it must fit into one breath. If the high-concept pitch is successful, people will stick around for the full pitch. Words such as “platform” and “enable” are noise. You say “platform”, they start to say “what exactly do you mean by platform”, the elevator doors

Re: Long title on github page

2021-05-17 Thread Adam Lippai
Hi, I'm 100% behind Wes. Being not just a file format, but adding compute and libs are the best selling points of Arrow. It shouldn't be reduced to "a file format and its utils", as the ecosystem is at least that important. This is something we have to emphasize constantly. Best regards, Adam

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Neal Richardson
How does one get their key in the Web of Trust? We do need to be able to add people to that so that it's not just the same handful of individuals who can be release manager, and now seems like a great time to add Jorge. Neal On Mon, May 17, 2021 at 11:52 AM Krisztián Szűcs wrote: > I think

Re: Long title on github page

2021-05-17 Thread Eduardo Ponce
One more suggestion for the bucket: "Apache Arrow is a computational platform for efficient in-memory data representation and processing." On Mon, May 17, 2021 at 2:49 PM Wes McKinney wrote: > I think less is better in the description, but unfortunately the > association of Arrow as being "just

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Krisztián Szűcs
I think your GPG key hasn't been configured yet, at least it is not in the KEYS file [1]. The source release tarball must be signed by the release manager. Do you have an Apache Code Signing key? If not, then it could be better if either Kou or I would be the release manager. [1]:

Re: Long title on github page

2021-05-17 Thread Wes McKinney
I think less is better in the description, but unfortunately the association of Arrow as being "just a data format" has been actively harmful in some ways to community growth. We have a data format, yes, but we are also creating a computational platform to go hand-in-hand with the data format to

Re: Long title on github page

2021-05-17 Thread Mauricio Vargas
sorry to come with a marketing-style title, but how about github.com/apache/arrow - Apache Arrow is an efficient format for big data processing and sharing ? On Mon, May 17, 2021 at 1:15 PM Julian Hyde wrote: > I think that the “cross-language development platform for” is noise. (I’m > sure

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Micah Kornfield
Small logistical question. Jorge, do you have a PGP key in the Apache Web of Trust [1]? [1] https://infra.apache.org/release-signing.html#web-of-trust On Mon, May 17, 2021 at 11:46 AM Krisztián Szűcs wrote: > On Mon, May 17, 2021 at 8:30 PM Jorge Cardoso Leitão > wrote: > > > > Thanks,

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Krisztián Szűcs
On Mon, May 17, 2021 at 8:30 PM Jorge Cardoso Leitão wrote: > > Thanks, Krisztián! > > I saw that ARROW-12769 and ARROW-12619 were also just cherry-picked, so we > are 2 to go: > > - https://issues.apache.org/jira/browse/ARROW-12604 Resolved now, but didn't require a patch on our side. > -

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Jorge Cardoso Leitão
Thanks, Krisztián! I saw that ARROW-12769 and ARROW-12619 were also just cherry-picked, so we are 2 to go: - https://issues.apache.org/jira/browse/ARROW-12604 - https://issues.apache.org/jira/browse/ARROW-12603 Best, Jorge On Mon, May 17, 2021 at 1:42 PM Krisztián Szűcs wrote: > On Sat,

Re: Long title on github page

2021-05-17 Thread Julian Hyde
I think that the “cross-language development platform for” is noise. (I’m sure that JPEG developers think that JPEG is a “cross-language development platform” too. But it isn’t. It is an image format.) “Apache Arrow is a data format for efficient in-memory processing.” I’ll note that In

Re: String reverse kernel

2021-05-17 Thread Jonathan Keane
Yeah, piggybacking on what Weston said: is the line that we want to draw at code points, combining character sequences, or graphemes [1]? IME, most people would want/assume that combining characters would stay combined in reversals (using Weston's example: "tréma" becoming "aḿert" (though this

Re: String reverse kernel

2021-05-17 Thread Weston Pace
FWIW, combining marks were not actually added to support emojis. Emojis are just one of the more popular uses of the feature. Combining marks is a standard Unicode feature necessary to represent single “characters” in some complex situations (e.g. when it is necessary to distinguish between

Re: String reverse kernel

2021-05-17 Thread Niranda Perera
Thank you very much for your inputs, guys. So, based on the discussion, I will make the following changes. 1. ASCII reverse would throw an error when a non-ASCII (valid/invalid utf8) byte is observed (no change) 2. UTF8 kernel would return a garbage output when an invalid utf8
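
The two behaviours described in this message can be illustrated with a small byte-level sketch. This is a plain Python illustration only, not the kernel code from the PR under discussion; the function names `ascii_reverse` and `utf8_reverse` are hypothetical. It assumes that "reverse by codepoint" means keeping each UTF-8 lead byte together with its continuation bytes, and that no validation is performed, so invalid input yields meaningless output instead of an error.

```python
def ascii_reverse(data: bytes) -> bytes:
    """Reverse an ASCII byte string; reject any byte outside 0x00-0x7F."""
    if any(b >= 0x80 for b in data):
        raise ValueError("non-ASCII byte in input")
    return data[::-1]


def utf8_reverse(data: bytes) -> bytes:
    """Reverse a UTF-8 byte string codepoint by codepoint.

    No validation is done: a chunk is a non-continuation byte plus the
    continuation bytes (0b10xxxxxx) that follow it. Invalid UTF-8 is
    therefore reversed into garbage without raising.
    """
    chunks = []
    start = 0
    for i in range(1, len(data)):
        if data[i] & 0xC0 != 0x80:  # not a continuation byte: new chunk
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return b"".join(reversed(chunks))


print(utf8_reverse("tréma".encode("utf-8")).decode("utf-8"))
```

Here the precomposed "é" is a single two-byte codepoint, so it survives the reversal intact. The decomposed form (an "e" followed by a combining accent) would not, which is the codepoint versus grapheme question raised elsewhere in the thread.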

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
I'm fine with pointing out that the function operates on codepoints. Linking to the Unicode documentation for emojis sounds entirely like a distraction, though. Regards Antoine. Le 17/05/2021 à 17:28, Ian Cook a écrit : +1 for clarifying this in the kernel documentation, referring to

Re: String reverse kernel

2021-05-17 Thread Ian Cook
+1 for clarifying this in the kernel documentation, referring to these multi-emoji glyphs as "emoji ZWJ sequences," and linking to https://unicode.org/emoji/charts/emoji-zwj-sequences.html Ian On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou wrote: > > > Le 17/05/2021 à 17:17, David Li a écrit

Re: String reverse kernel

2021-05-17 Thread David Li
Sure, that is a fair point. But in this case Unicode defines both codepoint and (extended) grapheme cluster, so I felt it might be worth including a quick note about which one is being reversed (though to be fair, nearly every language picks codepoint except maybe Swift, IIUC). In either case

Re: [DISCUSS] Parquet/Arrow/Flight as distributed persistence service

2021-05-17 Thread Gary Pennington
Hi David, Thanks for the feedback. I’m reassured that you don’t think the idea is too crazy. I’ll take a look at the FlightSQL proposal you mention. There is actually a related project to the one I’m working on which will need a more structured approach for data storage. Maybe not SQL

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
Le 17/05/2021 à 17:17, David Li a écrit : A little clarification on my point: it's not that a single codepoint gets encoded with more than four bytes, it's that a grapheme cluster/human-delimited 'character' might be multiple codepoints, so reversing the individual codepoints may produce an

Re: String reverse kernel

2021-05-17 Thread David Li
A little clarification on my point: it's not that a single codepoint gets encoded with more than four bytes, it's that a grapheme cluster/human-delimited 'character' might be multiple codepoints, so reversing the individual codepoints may produce an unexpected result. For instance a flag emoji is
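
The distinction drawn here between reversing codepoints and reversing human-perceived characters can be shown with a short Python sketch using only the standard library. The helper names are hypothetical, and the second function is a simplification: it only keeps combining marks attached to their base character. It does not implement full extended grapheme cluster segmentation, so emoji ZWJ sequences and flag emoji (pairs of regional indicator codepoints) would still be split.

```python
import unicodedata


def reverse_codepoints(text: str) -> str:
    """Reverse the string one codepoint at a time."""
    return text[::-1]


def reverse_combining_sequences(text: str) -> str:
    """Reverse the string, keeping combining marks with their base character."""
    clusters = []
    for ch in text:
        # A nonzero canonical combining class means ch attaches to the
        # preceding character, so it joins the current cluster.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "tréma")

print(reverse_codepoints(decomposed))           # accent lands on the "m"
print(reverse_combining_sequences(decomposed))  # accent stays on the "e"
```

With the decomposed input, the codepoint reversal moves the accent onto the "m", while the cluster-aware version keeps it on the "e".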

Re: String reverse kernel

2021-05-17 Thread Antoine Pitrou
Le 17/05/2021 à 16:28, Niranda Perera a écrit : Hi all, This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly trivial exercise, I would like to clarify a few things. In the current PR [1], there are 2 reverse kernels, ASCII and UTF8. I'd like to get some feedback for the

Re: Long title on github page

2021-05-17 Thread Eduardo Ponce
I agree with Nate's and Brian's suggestions, but would like to add that we can make it a one-liner for more conciseness and consistency with other Apache projects. Apologies if it seems I am going around the suggestions loop again. "Apache Arrow is a cross-language development platform enabling

String reverse kernel

2021-05-17 Thread Niranda Perera
Hi all, This is RE: [1] & [2] String reverse kernel. Even though it is a seemingly trivial exercise, I would like to clarify a few things. In the current PR [1], there are 2 reverse kernels, ASCII and UTF8. I'd like to get some feedback for the following points. 1. For ASCII reverse, I am

Re: [DISCUSS] Parquet/Arrow/Flight as distributed persistence service

2021-05-17 Thread David Li
Hey Gary, Sounds like an interesting project! To speak a bit to the Flight question: I don't think you need a new action; using DoGet/DoPut as you describe makes sense for persistence. There's no required semantics for Flight - it certainly suggests certain patterns (GetFlightInfo -> DoGet for

Re: Long title on github page

2021-05-17 Thread Brian Hulette
Thank you for bringing this up Dominik. I sampled some of the descriptions for other Apache projects I frequent, the ones with a meaningful description have a single sentence: github.com/apache/spark - Apache Spark - A unified analytics engine for large-scale data processing

[RUST] Request for Comment / Check proposed release process

2021-05-17 Thread Andrew Lamb
I need help verifying the proposed source tarball format for the Arrow Rust releases. Specifically, can someone please: 1. Download the example files and ensure they can successfully validate the signatures 2. Ensure that the contents of this tarball could be used to publish to crates.io

[DISCUSS] Parquet/Arrow/Flight as distributed persistence service

2021-05-17 Thread Gary Pennington
Hi, (NB: I first floated this question in the arrow-rust slack channel and Jorge Leitao suggested I should ask here.) I’m cranking up a project to provide functionality based on: parquet/arrow/flight implemented in rust. The primary goals of the project are to provide a mechanism for

Re: Long title on github page

2021-05-17 Thread Wes McKinney
It's probably best for description to limit mentions of specific features. There are some high level features mentioned in the description now ("computational libraries and zero-copy streaming messaging and interprocess communication"), but now in 2021 since the project has grown so much, it could

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread David Li
I'll provide a backport for ARROW-12603 - it's a duplicate of another issue but the change there would pull in a lot of unrelated changes. Best, David On 2021/05/17 11:42:11, Krisztián Szűcs wrote: > On Sat, May 15, 2021 at 7:44 AM Jorge Cardoso Leitão > wrote: > > > > Hi, > > > > I have

Re: [DISCUSS] 4.0.1 patch release?

2021-05-17 Thread Krisztián Szűcs
On Sat, May 15, 2021 at 7:44 AM Jorge Cardoso Leitão wrote: > > Hi, > > I have started collecting commits to the maint branch [1]. The exact > commands I used: > > git clone git@github.com:apache/arrow.git > cd arrow/dev > python3 -m venv venv > source venv/bin/activate > pip install -e archery

[NIGHTLY] Arrow Build Report for Job nightly-2021-05-17-0

2021-05-17 Thread Crossbow
Arrow Build Report for Job nightly-2021-05-17-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-17-0 Failed Tasks: - conda-osx-clang-py36-r36: URL: