Micron Confidential

Hi team,

I am Abe from Taiwan. This is my first time sending mail to the Apache 
community, so please correct me if I do something wrong. I am investigating 
Arrow Flight as a data exchange protocol and am using Python to run a Flight 
server. The performance is below my expectation, so I would like to ask the 
team for suggestions. The Flight server is set up in the US, and my Python 
client runs in Asia (Taiwan).

Transferring 178 MB of data (1001731 rows) from the US to Asia takes about 
10 s. I expected it to take less than 1 s. Am I missing anything?



time python client.py get -c 'get'





RangeIndex: 1001731 entries, 0 to 1001730
Data columns (total 16 columns):
 #   Column             Non-Null Count    Dtype
---  ------             --------------    -----
 0   cmte_id            1001731 non-null  object
 1   cand_id            1001731 non-null  object
 2   cand_nm            1001731 non-null  object
 3   contbr_nm          1001731 non-null  object
 4   contbr_city        1001712 non-null  object
 5   contbr_st          1001727 non-null  object
 6   contbr_zip         1001731 non-null  int64
 7   contbr_employer    988002 non-null   object
 8   contbr_occupation  993301 non-null   object
 9   contb_receipt_amt  1001731 non-null  float64
 10  contb_receipt_dt   1001731 non-null  object
 11  receipt_desc       14166 non-null    object
 12  memo_cd            92482 non-null    object
 13  memo_text          97770 non-null    object
 14  form_tp            1001731 non-null  object
 15  file_num           1001731 non-null  int64
dtypes: float64(1), int64(2), object(13)
memory usage: 122.3+ MB





real  0m10.405s
user  0m0.297s
sys   0m0.996s
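
For reference, the effective throughput implied by this run can be estimated 
as below. This is only a rough sketch: it assumes 1 MB = 10^6 bytes and counts 
the whole wall-clock time as transfer time, although that figure also includes 
Python startup and the conversion to a pandas DataFrame.

```python
# Rough estimate of the effective throughput of the run above.
# Assumptions (not measured): 1 MB = 1e6 bytes, and the full "real" time
# reported by `time` is treated as transfer time.
payload_mb = 178      # size of the transferred data, as stated in this mail
elapsed_s = 10.405    # "real" time reported by `time`
target_s = 1.0        # the transfer time I was hoping for

throughput_mb_s = payload_mb / elapsed_s      # megabytes per second
throughput_mbit_s = throughput_mb_s * 8       # megabits per second
required_mbit_s = payload_mb * 8 / target_s   # rate needed to hit the target

print(f"observed: {throughput_mb_s:.1f} MB/s ({throughput_mbit_s:.0f} Mbit/s)")
print(f"needed for {target_s:.0f} s: {required_mbit_s:.0f} Mbit/s")
```

So the run achieved roughly 17 MB/s, and finishing in 1 s would need the link 
between the US and Taiwan to sustain about 1.4 Gbit/s.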



My expectation comes from the following articles:

·         https://www.dremio.com/is-time-to-replace-odbc-jdbc

With an average size batch size (256K records), the performance of Flight 
exceeded 20 Gb/s for a single stream running on a single core.

·         https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/

As far as absolute speed, in our C++ data throughput benchmarks, we are seeing 
end-to-end TCP throughput in excess of 2-3GB/s on localhost without TLS 
enabled. This benchmark shows a transfer of ~12 gigabytes of data in about 4 
seconds:
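
Applying the rates quoted in these two articles to my 178 MB payload gives the 
transfer times I had in mind. This is a rough sketch, assuming 1 MB = 10^6 
bytes and that the quoted rates would hold end to end, which may not be the 
case between the US and Asia since the Arrow figure was measured on localhost.

```python
# Expected transfer time for my payload at the rates quoted above.
# Assumptions: 1 MB = 1e6 bytes; the quoted rates apply end to end.
payload_bytes = 178e6

# Quoted rates converted to bytes per second.
quoted_rates = {
    "Dremio article, 20 Gb/s": 20e9 / 8,  # gigabits -> bytes
    "Arrow blog, 2 GB/s": 2e9,
    "Arrow blog, 3 GB/s": 3e9,
}

expected_s = {label: payload_bytes / rate for label, rate in quoted_rates.items()}

for label, seconds in expected_s.items():
    print(f"{label}: {seconds * 1000:.0f} ms")
```

At any of these rates the transfer would take well under 1 s, which is why the 
10 s result surprised me.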









Many Thanks,

Abe



