Hi Micah & Yibo:

I tested the bandwidth between the US and Asia servers today. It is around
50-70 Mbit/s.

175 MB (uncompressed)
US <-> Asia takes around 15 s

1.78 GB (uncompressed)
US <-> Asia takes around 170~180 s

175 MB (compressed)
US <-> Asia takes around 4~8 s

1.78 GB (compressed)
US <-> Asia takes around 50~70 s


Do you think that is reasonable?
If I want to reach 2-3 GB/s, how much bandwidth do you think we need to
have?
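
As a rough sanity check, the timings above can be converted to effective throughput, and the 2-3 GB/s target to a link speed. This is only a sketch of the arithmetic: it assumes decimal units (1 GB = 1000 MB, 8 bits per byte), ignores protocol overhead and latency, and the function names are made up for illustration.

```python
# Rough arithmetic only. Assumes decimal units (1 GB = 1000 MB, 8 bits per
# byte) and ignores protocol overhead; the function names are illustrative.

def throughput_mbit_per_s(size_mb: float, seconds: float) -> float:
    """Effective throughput in Mbit/s for size_mb megabytes sent in seconds."""
    return size_mb * 8 / seconds


def required_gbit_per_s(target_gb_per_s: float) -> float:
    """Link capacity in Gbit/s needed to sustain target_gb_per_s gigabytes/s."""
    return target_gb_per_s * 8


# (label, size in MB before compression, fastest time in s, slowest time in s)
measurements = [
    ("175 MB uncompressed", 175, 15, 15),
    ("1.78 GB uncompressed", 1780, 170, 180),
    ("175 MB compressed", 175, 4, 8),
    ("1.78 GB compressed", 1780, 50, 70),
]

for label, size_mb, fastest, slowest in measurements:
    low = throughput_mbit_per_s(size_mb, slowest)
    high = throughput_mbit_per_s(size_mb, fastest)
    print(f"{label}: {low:.0f}-{high:.0f} Mbit/s effective")

for target in (2, 3):
    print(f"{target} GB/s needs about {required_gbit_per_s(target):.0f} Gbit/s")
```

Under these assumptions, 2-3 GB/s corresponds to roughly 16-24 Gbit/s of sustained link capacity. The compressed rows divide the size before compression by the transfer time, so they describe effective throughput rather than bytes on the wire.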

Many Thanks,
Abe


On 2021/08/19 18:07:57, Micah Kornfield <[email protected]> wrote:
> I don't think the performance numbers accounted for high latency links.
> What bandwidth link do you have between the two servers?  You might try
> using compression in Arrow.
>
> On Thu, Aug 19, 2021 at 10:40 AM Abe Hsu 許 育銘 (abehsu) <[email protected]>
> wrote:
> 
> > Micron Confidential
> >
> > Hi team:
> >
> > I am Abe from Taiwan. This is my first time sending mail to the Apache
> > community, so if I do something wrong, please correct me. I am
> > investigating Arrow Flight as a data exchange protocol. I am using Python
> > to set up a Flight server, and the performance is below my expectation, so
> > I would like to ask the team for suggestions. I set up the Flight server in
> > the US, and my Python client runs in Asia (e.g. Taiwan).
> >
> > I find that transferring 178 MB of data with 1001730 rows from the US to
> > Asia takes 10 s. I expected it to take less than 1 s.
> > Am I missing something?
> >
> > time python client.py get -c 'get'
> >
> > RangeIndex: 1001731 entries, 0 to 1001730
> > Data columns (total 16 columns):
> >  #   Column             Non-Null Count    Dtype
> > ---  ------             --------------    -----
> >  0   cmte_id            1001731 non-null  object
> >  1   cand_id            1001731 non-null  object
> >  2   cand_nm            1001731 non-null  object
> >  3   contbr_nm          1001731 non-null  object
> >  4   contbr_city        1001712 non-null  object
> >  5   contbr_st          1001727 non-null  object
> >  6   contbr_zip         1001731 non-null  int64
> >  7   contbr_employer    988002 non-null   object
> >  8   contbr_occupation  993301 non-null   object
> >  9   contb_receipt_amt  1001731 non-null  float64
> > 10  contb_receipt_dt   1001731 non-null  object
> > 11  receipt_desc       14166 non-null    object
> > 12  memo_cd            92482 non-null    object
> > 13  memo_text          97770 non-null    object
> > 14  form_tp            1001731 non-null  object
> > 15  file_num           1001731 non-null  int64
> > dtypes: float64(1), int64(2), object(13)
> > memory usage: 122.3+ MB
> >
> > real  0m10.405s
> > user  0m0.297s
> > sys   0m0.996s
> >
> > I have this expectation because of the following articles.
> >
> > ·         https://www.dremio.com/is-time-to-replace-odbc-jdbc
> >
> > With an average size batch size (256K records), the performance of Flight
> > exceeded 20 Gb/s for a single stream running on a single core.
> >
> > ·         https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
> >
> > As far as absolute speed, in our C++ data throughput benchmarks, we are
> > seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without
> > TLS enabled. This benchmark shows a transfer of ~12 gigabytes of data in
> > about 4 seconds:
> >
> > Many Thanks,
> > Abe
> >
> 
