Re: Arrow as a streaming format

2020-09-09 Thread Fan Liya
+1 for introducing Arrow in streaming processing, as we have made some attempts at this. IMO, the metadata overhead is not likely to be a problem. If the streaming data has a high arrival rate, we can compensate for this with a large batch size without impacting the response time, while if

Re: Adding Parquet encryption support to PyArrow

2020-09-09 Thread Gidon Gershinsky
Thanks guys. I'll go over the intro sections to merge/streamline the text there. I've added "commenter" access for all, so everybody can take part in the doc's discussion threads. For edit access, please contact Itamar (by pressing the request button). Cheers, Gidon On Wed, Sep 9, 2020 at

Re: Adding Parquet encryption support to PyArrow

2020-09-09 Thread Roee Shlomo
Hi Itamar, Thanks for starting the document. I've added an initial draft version of the API (parts of it at least). I have also added problem statement and goals sections to list what I understand we want to achieve. On 2020/09/08 17:44:07, "Itamar Turner-Trauring" wrote: > Still

[NIGHTLY] Arrow Build Report for Job nightly-2020-09-09-0

2020-09-09 Thread Crossbow
Arrow Build Report for Job nightly-2020-09-09-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-09-0 Succeeded Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-09-0-github-centos-6-amd64 -

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-09 Thread Fan Liya
Hi all, Thanks a lot for your previous feedback. We have now done some investigation and prepared an initial PR supporting the non-nullable IntVector [1], as this represents a common scenario. Some initial observations and conclusions can be made. The basic idea of the PR is to provide a global