I don't think the performance numbers accounted for high-latency links.
What is the bandwidth of the link between the two servers? You might try
using compression in Arrow.

On Thu, Aug 19, 2021 at 10:40 AM Abe Hsu 許 育銘 (abehsu) <[email protected]>
wrote:

> Micron Confidential
>
>
> Hi team:
>
> I am Abe from Taiwan. This is my first time sending mail to the Apache
> community, so if I do something wrong, please correct me. I am investigating
> Arrow Flight as a data exchange protocol, and I am using Python to set up a
> Flight server. The performance is not quite what I expected, so I would
> like to ask the team for suggestions. I set up the Flight server in the US,
> and my Python client runs in Asia (Taiwan).
>
> I find that transferring 178 MB of data (1001730 rows) from the US to Asia
> takes about 10 s. I expected it to take less than 1 s.
> Is there anything I am missing?
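
As a back-of-the-envelope check on these numbers, the arithmetic below works
out what 178 MB in about 10 s means as link throughput, and what the
1-second expectation would require. The round-trip time is an assumed
placeholder, not a measurement of this link, and the elapsed time also
includes Python startup and the pandas conversion, so the observed figure is
a lower bound on the actual transfer rate.

```python
# Figures taken from the report.
payload_mb = 178        # transfer size in megabytes
elapsed_s = 10.405      # "real" time reported by `time`

# Effective throughput actually observed.
observed_mb_per_s = payload_mb / elapsed_s
observed_mbit_per_s = observed_mb_per_s * 8

# Throughput needed to finish the same transfer in 1 second.
target_s = 1.0
required_mbit_per_s = payload_mb / target_s * 8

# ASSUMPTION: 150 ms is an illustrative US <-> Asia round-trip time,
# not a measured value for this link.
assumed_rtt_s = 0.150

# A single TCP stream cannot move more than one window of data per round
# trip, so this is the amount of data that would have to be in flight
# (the bandwidth-delay product) to sustain the target rate.
required_window_mb = (payload_mb / target_s) * assumed_rtt_s

print(f"observed: {observed_mb_per_s:.1f} MB/s ({observed_mbit_per_s:.0f} Mbit/s)")
print(f"needed for {target_s:.0f} s: {required_mbit_per_s:.0f} Mbit/s")
print(f"data in flight at {assumed_rtt_s * 1000:.0f} ms RTT: {required_window_mb:.1f} MB")
```

So the 1-second target needs well over 1 Gbit/s end to end, plus a TCP window
in the tens of megabytes under the assumed round-trip time.
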
>
>
>
> time python client.py get -c 'get'
>
> RangeIndex: 1001731 entries, 0 to 1001730
> Data columns (total 16 columns):
>  #   Column             Non-Null Count    Dtype
> ---  ------             --------------    -----
>  0   cmte_id            1001731 non-null  object
>  1   cand_id            1001731 non-null  object
>  2   cand_nm            1001731 non-null  object
>  3   contbr_nm          1001731 non-null  object
>  4   contbr_city        1001712 non-null  object
>  5   contbr_st          1001727 non-null  object
>  6   contbr_zip         1001731 non-null  int64
>  7   contbr_employer    988002 non-null   object
>  8   contbr_occupation  993301 non-null   object
>  9   contb_receipt_amt  1001731 non-null  float64
>  10  contb_receipt_dt   1001731 non-null  object
>  11  receipt_desc       14166 non-null    object
>  12  memo_cd            92482 non-null    object
>  13  memo_text          97770 non-null    object
>  14  form_tp            1001731 non-null  object
>  15  file_num           1001731 non-null  int64
> dtypes: float64(1), int64(2), object(13)
> memory usage: 122.3+ MB
>
> real  0m10.405s
> user  0m0.297s
> sys   0m0.996s
>
>
>
> I have this expectation because of the following articles:
>
> · https://www.dremio.com/is-time-to-replace-odbc-jdbc
>
> With an average size batch size (256K records), the performance of Flight
> exceeded 20 Gb/s for a single stream running on a single core.
>
> · https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
>
> As far as absolute speed, in our C++ data throughput benchmarks, we are
> seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost without
> TLS enabled. This benchmark shows a transfer of ~12 gigabytes of data in
> about 4 seconds:
>
> Many Thanks,
>
> Abe
>
