RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Melik-Adamyan, Areg
Hi, We are actually talking about the same thing, but you do not want to use third-party tools. For 3 and 4, you run the first version and store its output in 1.out, then run the second version and store its output in 2.out, and then run the compare tool. Your tool does the two steps automatically, which is fine. > Various reason why I think th

Re: [DISCUSS] 64-bit offset variable width types (i.e.Large List, Last String, Large bytes)

2019-04-24 Thread Micah Kornfield
Given that the conversation on this seems to have died down, would it make sense to hold a vote to allow large variable-width types to be added? As discussed previously, PRs would need both C++ and Java implementations before being merged. Could a PMC member facilitate this? Philipp if approved, do

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Francois Saint-Jacques
Hello, archery is the set of "shim" scripts that glue together some of the steps (2-4) that you described. It builds Arrow (C++ for now), finds the multiple benchmark binaries, runs them, and collects the outputs. I encourage you to check the implementation, notably [1] and [2] (and generally [3]). Think of it as

Use arrow as a general data serialization framework in distributed stream data processing

2019-04-24 Thread Shawn Yang
Motivation: We want to use Arrow as a general data serialization framework in distributed stream data processing. We are working on Ray, written in C++ at the low level and Java/Python at the high level. We want to transfer streaming data between Java/Python/C++ efficient

Re: contributor permmission

2019-04-24 Thread niki.lj
Thanks very much! -- From: Wes McKinney Sent: Thursday, April 25, 2019, 06:13 To: dev ; niki.lj Subject: Re: contributor permmission Added On Tue, Apr 23, 2019 at 9:29 PM niki.lj wrote: > > > Hi, > could you please give me the contributor permiss

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Melik-Adamyan, Areg
Wes, I think the process should be the following. 1. A commit triggers a build in TeamCity. I have set up TeamCity, but we can use whatever CI we like. 2. TeamCity uses the pool of identical machines to run the predefined (or all) performance benchmarks on one of the build machines fro

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Melik-Adamyan, Areg
Sebastien - yes, Go has very advanced but simple performance benchmarking tooling. My intention is to reuse as much as we can. -Original Message- From: Sebastien Binet [mailto:bi...@cern.ch] Sent: Wednesday, April 24, 2019 11:09 AM To: dev@arrow.apache.org Subject: Re: Benchmarking mail

Re: contributor permmission

2019-04-24 Thread Wes McKinney
Added On Tue, Apr 23, 2019 at 9:29 PM niki.lj wrote: > > > Hi, > could you please give me the contributor permission, I want to contribute to > Arrow, thanks! > My apache account is tianchen92 > > > Ji Liu

[jira] [Created] (ARROW-5212) Array BinaryBuilder in Go library has no access to resize the values buffer

2019-04-24 Thread Jonathan A Sternberg (JIRA)
Jonathan A Sternberg created ARROW-5212: --- Summary: Array BinaryBuilder in Go library has no access to resize the values buffer Key: ARROW-5212 URL: https://issues.apache.org/jira/browse/ARROW-5212

[jira] [Created] (ARROW-5211) Missing documentation under `Dictionary encoding` section on MetaData page

2019-04-24 Thread Lennox Stevenson (JIRA)
Lennox Stevenson created ARROW-5211: --- Summary: Missing documentation under `Dictionary encoding` section on MetaData page Key: ARROW-5211 URL: https://issues.apache.org/jira/browse/ARROW-5211 Projec

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Wes McKinney
In benchmarking, one of the hardest parts (IMHO) is the process/workflow automation. I'm in support of the development of a "meta-benchmarking" framework that offers automation, extensibility, and the possibility of customization. One of the reasons that people don't do more benchmarking as part o

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Sebastien Binet
On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou wrote: > > Hi Areg, > > Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit : > > Because we are using Google Benchmark, which has a specific format, there > is a tool called benchcmp which compares two runs: > > > > $ benchcmp old.txt new.txt > > bench

Re: [Discuss] Benchmarking infrastructure

2019-04-24 Thread Francois Saint-Jacques
No worries, I'll update the PR to refactor this CLI function into a reusable function. Luckily it's small enough and not too much logic is leaking. On Tue, Apr 23, 2019 at 4:24 PM Wes McKinney wrote: > hi Francois, > > This sounds like good progress. > > For any tool consumable through a CLI/com

[jira] [Created] (ARROW-5210) [Python] editable install (pip install -e .) is failing

2019-04-24 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5210: Summary: [Python] editable install (pip install -e .) is failing Key: ARROW-5210 URL: https://issues.apache.org/jira/browse/ARROW-5210 Project: Apache

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Antoine Pitrou
Hi Areg, Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit : > Because we are using Google Benchmark, which has a specific format, there is a > tool called benchcmp which compares two runs: > > $ benchcmp old.txt new.txt > benchmark old ns/op new ns/op delta > BenchmarkConcat

[jira] [Created] (ARROW-5209) [Java] Add performance benchmarks from SQL workloads

2019-04-24 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5209: --- Summary: [Java] Add performance benchmarks from SQL workloads Key: ARROW-5209 URL: https://issues.apache.org/jira/browse/ARROW-5209 Project: Apache Arrow Issue Type: I

[jira] [Created] (ARROW-5208) Inconsistent resulting type during casting in pa.array() when mask is present

2019-04-24 Thread Artem KOZHEVNIKOV (JIRA)
Artem KOZHEVNIKOV created ARROW-5208: Summary: Inconsistent resulting type during casting in pa.array() when mask is present Key: ARROW-5208 URL: https://issues.apache.org/jira/browse/ARROW-5208 P