Hi Micah & Yibo:

I tested the bandwidth between the US and Asia servers today. It is around
50-70 Mbit/s.

175 MB (uncompressed)
US <-> Asia takes around 15 s

1.78 GB (uncompressed)
US <-> Asia takes around 170~180 s

175 MB (compressed)
US <-> Asia takes around 4~8 s

1.78 GB (compressed)
US <-> Asia takes around 50~70 s


Do you think these numbers are reasonable?
If I want to reach a throughput of 2-3 GB/s, how much bandwidth do you think
we need?
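
As a rough sanity check on the units involved, the conversion between link
bandwidth (bits per second) and throughput (bytes per second) can be sketched
as below. This is only a back-of-the-envelope estimate: it assumes decimal
units (1 GB = 1000 MB), 8 bits per byte, and ignores latency, protocol
overhead and compression. The function names are illustrative and not part
of Arrow.

```python
def required_link_gbits(throughput_gbytes_per_s: float) -> float:
    """Link bandwidth in Gbit/s needed to sustain a throughput in GB/s.

    Assumption: no protocol overhead, 8 bits per byte.
    """
    return throughput_gbytes_per_s * 8


def ideal_transfer_seconds(size_mb: float, link_mbits_per_s: float) -> float:
    """Best-case transfer time for size_mb megabytes over a link.

    Assumption: the link is fully utilized and latency is ignored.
    """
    return size_mb * 8 / link_mbits_per_s


if __name__ == "__main__":
    # Target throughput quoted from the Arrow Flight blog post (localhost).
    for target in (2, 3):
        print(f"{target} GB/s needs about {required_link_gbits(target):.0f} Gbit/s")

    # Best-case time for 175 MB at the measured 50-70 Mbit/s.
    for link in (50, 70):
        secs = ideal_transfer_seconds(175, link)
        print(f"175 MB at {link} Mbit/s: at least {secs:.0f} s")
```

Under these assumptions, 2-3 GB/s corresponds to a link of roughly
16-24 Gbit/s, while 50-70 Mbit/s is only about 6-9 MB/s.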

Many Thanks,
Abe



On 2021/08/19 18:07:57, Micah Kornfield <[email protected]> wrote:
> I don't think the performance numbers accounted for high latency links.
> What bandwidth link do you have between the two servers?  You might try
> using compression in Arrow.
>
> On Thu, Aug 19, 2021 at 10:40 AM Abe Hsu 許 育銘 (abehsu) <[email protected]>
> wrote:
>
> > Micron Confidential
> >
> > Hi team:
> >
> > I am Abe from Taiwan. This is my first time sending mail to the Apache
> > community; if I do something wrong, please correct me. I am investigating
> > using Arrow Flight as a data exchange protocol. I am using Python to set up
> > a Flight server. The performance is not quite what I expected, so I would
> > like to ask the team for suggestions. I set up the Flight server in the US,
> > and my Python client code runs in Asia (e.g., Taiwan).
> >
> > I find that transferring 178 MB of data with 1001730 rows from the US to
> > Asia takes about 10 s. I expected it to take less than 1 s.
> > Am I missing something?
> >
> > time python client.py get -c 'get'
> >
> > RangeIndex: 1001731 entries, 0 to 1001730
> > Data columns (total 16 columns):
> >  #   Column             Non-Null Count    Dtype
> > ---  ------             --------------    -----
> >  0   cmte_id            1001731 non-null  object
> >  1   cand_id            1001731 non-null  object
> >  2   cand_nm            1001731 non-null  object
> >  3   contbr_nm          1001731 non-null  object
> >  4   contbr_city        1001712 non-null  object
> >  5   contbr_st          1001727 non-null  object
> >  6   contbr_zip         1001731 non-null  int64
> >  7   contbr_employer    988002 non-null   object
> >  8   contbr_occupation  993301 non-null   object
> >  9   contb_receipt_amt  1001731 non-null  float64
> >  10  contb_receipt_dt   1001731 non-null  object
> >  11  receipt_desc       14166 non-null    object
> >  12  memo_cd            92482 non-null    object
> >  13  memo_text          97770 non-null    object
> >  14  form_tp            1001731 non-null  object
> >  15  file_num           1001731 non-null  int64
> > dtypes: float64(1), int64(2), object(13)
> > memory usage: 122.3+ MB
> >
> > real  0m10.405s
> > user  0m0.297s
> > sys   0m0.996s
> >
> > I have this expectation because of the following articles.
> >
> > - https://www.dremio.com/is-time-to-replace-odbc-jdbc
> >
> > With an average batch size (256K records), the performance of Flight
> > exceeded 20 Gb/s for a single stream running on a single core.
> >
> > - https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
> >
> > As far as absolute speed, in our C++ data throughput benchmarks, we are
> > seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without
> > TLS enabled. This benchmark shows a transfer of ~12 gigabytes of data in
> > about 4 seconds:
> >
> > Many Thanks,
> > Abe
> >
> > Micron Confidential
> >
>
