[ https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147676#comment-17147676 ]

Antoine Pitrou edited comment on ARROW-6776 at 6/29/20, 10:47 AM:
------------------------------------------------------------------

The latest PyArrow wheels are much lighter (57 MB installed, versus the 186 MB reported in the description):
{code}
$ du -hs venv-3.7/lib/python3.7/site-packages/pyarrow/
57M     venv-3.7/lib/python3.7/site-packages/pyarrow/
{code}

PS: see here for nightly PyArrow wheels:
https://arrow.apache.org/docs/python/install.html#installing-nightly-packages




> [Python] Need a lite version of pyarrow
> ---------------------------------------
>
>                 Key: ARROW-6776
>                 URL: https://issues.apache.org/jira/browse/ARROW-6776
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Haowei Yu
>            Priority: Major
>
> Currently I am building a library package on top of pyarrow, so I include
> pyarrow as a dependency and ship it to our customers. However, when our
> customers install our package, pip also installs pyarrow and pyarrow's
> dependency (numpy), and these dependencies are huge.
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/
> total 186M
> {code}
> numpy is around 80 MB more, so the total is over 250 MB.
> Our customers want to bundle all dependencies and run the code inside AWS
> Lambda, but they hit the package size limit and the code fails to run.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and
> without a version suffix. Could you remove one copy of each (either the
> suffixed or the unsuffixed one)? That alone would reduce the package size by about half.
> Further, our library only needs IPC to read data as record batches. I
> don't need Arrow Flight at all, and it is the biggest .so file at
> around 100 MB. Could you publish a lite version of pyarrow so
> that I can specify it as the dependency? Alternatively, I could build
> my own lite version and push it to PyPI. However, that approach causes a further
> problem if our customer also uses the "fat" version of pyarrow, unless
> the lite version is published under a different namespace.
> Another alternative is to bundle pyarrow with our library (copy the
> whole directory into a vendored namespace) and ship it to our customers without
> specifying pyarrow as a dependency. The advantage is that I can
> build pyarrow with whatever options, sub-modules, and libraries I need. However,
> I tried many times and failed, because pyarrow uses absolute imports, so its
> modules fail to import from the new location.
> Any insight into how I should resolve this issue?
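
You can check the two size observations in the description (duplicated versioned/unversioned .so files, and one very large library) on an installed copy. What follows is a minimal sketch under stated assumptions. The functions `find_dup_libs` and `largest_libs` are hypothetical helpers, not part of pyarrow or the Arrow project. The sketch also assumes the libraries follow the `libfoo.so` / `libfoo.so.N` naming pattern.

{code:bash}
#!/usr/bin/env bash
# Hypothetical helper (not part of pyarrow): print pairs of shared libraries
# that exist both without and with a version suffix, e.g. libfoo.so and
# libfoo.so.14, where both are regular files rather than symlinks.
find_dup_libs() {
  local dir="${1:-.}" versioned name base
  for versioned in "$dir"/*.so.*; do
    [ -e "$versioned" ] || continue        # glob matched nothing
    name="${versioned##*/}"                # drop the directory part
    base="${name%%.so.*}.so"               # libfoo.so.14.1 -> libfoo.so
    # Symlinks take almost no space, so only real copies are reported.
    if [ -f "$dir/$base" ] && [ ! -L "$dir/$base" ] && [ ! -L "$versioned" ]; then
      echo "$base $name"
    fi
  done
}

# Hypothetical helper: list shared libraries by disk usage in KiB,
# largest first, keeping the top N entries (default 10).
largest_libs() {
  local dir="${1:-.}"
  du -k "$dir"/*.so* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
{code}

For example, `find_dup_libs venv-3.7/lib/python3.7/site-packages/pyarrow` prints each pair that is stored as two separate files. `largest_libs venv-3.7/lib/python3.7/site-packages/pyarrow 5` lists the five largest libraries.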



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
