I don't think those performance numbers account for high-latency links. What bandwidth do you have on the link between the two servers? You might also try enabling compression in Arrow.
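A rough way to see why latency matters is a back-of-the-envelope model of a single TCP stream. Its throughput is capped by the link bandwidth and also by the TCP window divided by the round-trip time (the bandwidth-delay product limit). The sketch below models compression only as fewer bytes on the wire. `estimate_transfer_seconds` is a hypothetical helper, not an Arrow API. The 1 Gbit/s link, 150 ms RTT, 4 MiB window and 3x compression ratio are illustrative assumptions, not measurements of your setup.

```python
def estimate_transfer_seconds(
    payload_bytes: float,
    link_bits_per_s: float,
    rtt_s: float,
    tcp_window_bytes: float,
    compression_ratio: float = 1.0,
) -> float:
    """Back-of-the-envelope time for one TCP stream (hypothetical helper).

    A single connection can keep at most one window of unacknowledged
    data in flight per round trip, so its throughput is capped at
    window / RTT as well as by the link itself. Compression is modelled
    only as fewer bytes on the wire; the CPU cost of compressing and
    decompressing is ignored.
    """
    link_bytes_per_s = link_bits_per_s / 8
    window_bytes_per_s = tcp_window_bytes / rtt_s
    throughput = min(link_bytes_per_s, window_bytes_per_s)
    wire_bytes = payload_bytes / compression_ratio
    # Assumption: one extra round trip for the DoGet request itself.
    return rtt_s + wire_bytes / throughput


# Illustrative assumptions only: 1 Gbit/s path, 150 ms US<->Taiwan RTT,
# 4 MiB TCP window, and a placeholder 3x compression ratio.
payload = 178e6  # 178 MB, as in the original message
args = dict(link_bits_per_s=1e9, rtt_s=0.150, tcp_window_bytes=4 * 1024 * 1024)
plain = estimate_transfer_seconds(payload, **args)
compressed = estimate_transfer_seconds(payload, compression_ratio=3.0, **args)
print(f"uncompressed: {plain:.1f} s, 3x compression: {compressed:.1f} s")
```

With these assumed numbers, the window cap of about 28 MB/s is the bottleneck rather than the 1 Gbit/s link. A larger TCP window or fewer bytes on the wire would therefore help more than a faster serializer. The real compression ratio depends entirely on the data.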
On Thu, Aug 19, 2021 at 10:40 AM Abe Hsu 許 育銘 (abehsu) <[email protected]> wrote:
> Micron Confidential
>
> Hi team:
>
> I am Abe from Taiwan. This is my first time sending mail to the Apache
> community; if I do something wrong, please correct me. I am investigating
> Arrow Flight as a data exchange protocol. I am using Python to run a Flight
> server, and the performance is not quite what I expected, so I would like
> to ask the team for suggestions. The Flight server is set up in the US, and
> my Python client runs in Asia (Taiwan).
>
> Transferring 178 MB of data (1001731 rows) from the US to Asia takes about
> 10 s. I expected it to take less than 1 s. Is there anything I am missing?
>
> time python client.py get -c 'get'
>
> RangeIndex: 1001731 entries, 0 to 1001730
> Data columns (total 16 columns):
>  #   Column              Non-Null Count    Dtype
> ---  ------              --------------    -----
>  0   cmte_id             1001731 non-null  object
>  1   cand_id             1001731 non-null  object
>  2   cand_nm             1001731 non-null  object
>  3   contbr_nm           1001731 non-null  object
>  4   contbr_city         1001712 non-null  object
>  5   contbr_st           1001727 non-null  object
>  6   contbr_zip          1001731 non-null  int64
>  7   contbr_employer     988002 non-null   object
>  8   contbr_occupation   993301 non-null   object
>  9   contb_receipt_amt   1001731 non-null  float64
>  10  contb_receipt_dt    1001731 non-null  object
>  11  receipt_desc        14166 non-null    object
>  12  memo_cd             92482 non-null    object
>  13  memo_text           97770 non-null    object
>  14  form_tp             1001731 non-null  object
>  15  file_num            1001731 non-null  int64
> dtypes: float64(1), int64(2), object(13)
> memory usage: 122.3+ MB
>
> real    0m10.405s
> user    0m0.297s
> sys     0m0.996s
>
> I had this expectation because of these articles:
>
> · https://www.dremio.com/is-time-to-replace-odbc-jdbc
>
>     With an average size batch size (256K records), the performance of Flight
>     exceeded 20 Gb/s for a single stream running on a single core.
>
> · https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
>
>     As far as absolute speed, in our C++ data throughput benchmarks, we are
>     seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without
>     TLS enabled. This benchmark shows a transfer of ~12 gigabytes of data in
>     about 4 seconds:
>
>     [benchmark output omitted]
>
> Many Thanks,
> Abe
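For scale, the benchmark figures quoted above can be put in the same units as the measured transfer. This is plain arithmetic on the numbers in the thread: 178 MB in 10.405 s, more than 20 Gb/s, and about 12 GB in about 4 s. Treating MB and GB as decimal units is an assumption, since the messages don't say. `to_gbit_per_s` is a hypothetical helper.

```python
def to_gbit_per_s(num_bytes: float, seconds: float) -> float:
    """Bytes moved in `seconds`, expressed in Gbit/s (decimal units)."""
    return num_bytes * 8 / seconds / 1e9


# Numbers taken from the thread; MB/GB are assumed to be decimal units.
observed = to_gbit_per_s(178e6, 10.405)    # US -> Taiwan, wall-clock "real" time
blog_localhost = to_gbit_per_s(12e9, 4.0)  # Arrow blog: ~12 GB in ~4 s on localhost
dremio_quoted = 20.0                       # Dremio article: >20 Gb/s, one stream

print(f"observed:         {observed:.3f} Gbit/s")
print(f"blog (localhost): {blog_localhost:.1f} Gbit/s")
print(f"dremio (quoted):  {dremio_quoted:.1f} Gbit/s")
print(f"blog / observed:  {blog_localhost / observed:.0f}x")
```

The 10.4 s wall-clock figure also includes interpreter start-up and the pandas conversion, so the actual wire throughput was somewhat higher than this estimate. The same timing output shows only about 0.3 s of user CPU and 1 s of system CPU. That suggests the client spent most of the run waiting, which is consistent with a network-bound transfer rather than a serialization bottleneck.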
