Hi Micah & Yibo,

I tested the bandwidth between the US and Asia servers today. The bandwidth is around 50-70 Mbit/s.
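For reference, here is a rough back-of-the-envelope sketch (not a measurement) of the best-case transfer time at a given link rate, and of the link rate a target throughput would need. It assumes decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits) and ignores latency, protocol overhead, and compression. The function names are made up for illustration.

```python
def ideal_transfer_seconds(size_bytes: float, link_mbit_per_s: float) -> float:
    """Best-case transfer time: payload bits divided by the link rate.

    Assumes decimal units and a fully utilized link (no latency or overhead).
    """
    return size_bytes * 8 / (link_mbit_per_s * 1e6)


def required_link_gbit_per_s(target_gbyte_per_s: float) -> float:
    """Link rate (Gbit/s) needed to carry a payload throughput given in GB/s."""
    return target_gbyte_per_s * 8


# Payload sizes from my test, taken as decimal MB/GB (an assumption).
payloads = {"175 MB": 175e6, "1.78 GB": 1.78e9}

for label, size in payloads.items():
    for rate in (50, 70):  # measured link rate range, in Mbit/s
        seconds = ideal_transfer_seconds(size, rate)
        print(f"{label} at {rate} Mbit/s: about {seconds:.1f} s best case")

for target in (2, 3):  # target throughput in GB/s
    gbit = required_link_gbit_per_s(target)
    print(f"{target} GB/s needs at least {gbit:.0f} Gbit/s")
```

By this arithmetic, 2-3 GB/s of payload corresponds to a link of at least 16-24 Gbit/s before any overhead.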
175 MB (uncompressed) US <-> Asia takes around 15 s
1.78 GB (uncompressed) US <-> Asia takes around 170~180 s
175 MB (compressed) US <-> Asia takes around 4~8 s
1.78 GB (compressed) US <-> Asia takes around 50~70 s

Do you think that is reasonable? If I want to reach a throughput of 2-3 GB/s, how much bandwidth do you think we need?

Many Thanks,
Abe

On 2021/08/19 18:07:57, Micah Kornfield <[email protected]> wrote:
> I don't think the performance numbers accounted for high latency links.
> What bandwidth link do you have between the two servers? You might try
> using compression in Arrow.
>
> On Thu, Aug 19, 2021 at 10:40 AM Abe Hsu 許 育銘 (abehsu) <[email protected]>
> wrote:
>
> > Micron Confidential
> >
> > Hi team:
> >
> > I am Abe from Taiwan. This is my first time sending mail to the Apache
> > community; if I do something wrong, please correct me. I am investigating
> > using Arrow Flight as a data exchange protocol. I am using Python to set up
> > a Flight server. The performance is not quite what I expected, so I would
> > like to ask the team for some suggestions. I set up the Flight server in
> > the US, and my Python client code runs in Asia (e.g. Taiwan).
> >
> > I find that transferring 178 MB of data with 1001730 rows from the US to
> > Asia takes 10 s.
> > I expected it to take less than 1 s. Is there anything I am missing?
> >
> > time python client.py get -c 'get'
> >
> > RangeIndex: 1001731 entries, 0 to 1001730
> > Data columns (total 16 columns):
> >  #   Column             Non-Null Count    Dtype
> > ---  ------             --------------    -----
> >  0   cmte_id            1001731 non-null  object
> >  1   cand_id            1001731 non-null  object
> >  2   cand_nm            1001731 non-null  object
> >  3   contbr_nm          1001731 non-null  object
> >  4   contbr_city        1001712 non-null  object
> >  5   contbr_st          1001727 non-null  object
> >  6   contbr_zip         1001731 non-null  int64
> >  7   contbr_employer    988002 non-null   object
> >  8   contbr_occupation  993301 non-null   object
> >  9   contb_receipt_amt  1001731 non-null  float64
> >  10  contb_receipt_dt   1001731 non-null  object
> >  11  receipt_desc       14166 non-null    object
> >  12  memo_cd            92482 non-null    object
> >  13  memo_text          97770 non-null    object
> >  14  form_tp            1001731 non-null  object
> >  15  file_num           1001731 non-null  int64
> > dtypes: float64(1), int64(2), object(13)
> > memory usage: 122.3+ MB
> >
> > real 0m10.405s
> > user 0m0.297s
> > sys 0m0.996s
> >
> > I have this expectation because of the following articles.
> >
> > · https://www.dremio.com/is-time-to-replace-odbc-jdbc
> >
> > With an average size batch size (256K records), the performance of Flight
> > exceeded 20 Gb/s for a single stream running on a single core.
> >
> > · https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
> >
> > As far as absolute speed, in our C++ data throughput benchmarks, we are
> > seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without
> > TLS enabled.
> > This benchmark shows a transfer of ~12 gigabytes of data in
> > about 4 seconds:
> >
> > Many Thanks,
> >
> > Abe
