[ https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-6776.
---------------------------------
    Fix Version/s: 1.0.0
         Assignee: Wes McKinney
       Resolution: Fixed

Yes indeed. I'm closing this as resolved.

> [Python] Need a lite version of pyarrow
> ---------------------------------------
>
>                 Key: ARROW-6776
>                 URL: https://issues.apache.org/jira/browse/ARROW-6776
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Haowei Yu
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Currently I am building a library package on top of pyarrow, so I include
> pyarrow as a dependency and ship it to our customers. However, when our
> customers install our package, it also installs pyarrow and pyarrow's
> dependency (numpy), and the dependency size is huge.
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/
> total 186M
> {code}
> And numpy is around 80 MB. The total is more than 250 MB.
> Our customers want to bundle all dependencies and run the code inside AWS
> Lambda, but they hit the size limit and failed to run the code.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and
> without a version suffix. I wonder if you can remove one of them (either
> with or without the suffix); that would reduce the package size by at least
> half.
> Further, our library only uses IPC and reads data as record batches. I
> don't need Arrow Flight at all (which is the biggest .so file and takes
> around 100 MB). I wonder if you can publish a lite version of pyarrow so
> that I can specify the lite version as the dependency. Or maybe I need to
> build my own lite version and push it to PyPI. However, this approach causes
> further problems if our customers are using the "fat" version of pyarrow,
> unless you change the namespace of the lite version of pyarrow.
> Another alternative is that I bundle pyarrow with our library (copy the
> whole directory into a vendored namespace) and ship it to our customers
> without specifying pyarrow as a dependency. The advantage of this approach
> is that I can build pyarrow with whatever options/sub-modules/libraries I
> need. However, I tried many times but failed, because pyarrow uses absolute
> imports and fails to import its modules from the new location.
> Any insight into how I should resolve this issue?
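
The observation in the report that .so files are shipped both with and without a version suffix can be checked with a short script. The following is a minimal sketch and not part of the original report: the helper names `file_digest`, `find_duplicate_files`, and `wasted_bytes` are hypothetical, and it assumes the duplicates are byte-identical copies rather than symlinks (symlinks are skipped because they take no extra space).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Group regular files under `root` by content.

    Returns a list of groups (sorted lists of paths), one per set of
    byte-identical files with more than one member. Symlinks are skipped.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def wasted_bytes(groups):
    """Bytes that would be saved by keeping a single copy per group."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Calling `find_duplicate_files()` on the installed pyarrow directory (for example the site-packages path shown in the report) lists any identical library copies, and `wasted_bytes()` gives a rough estimate of how much space deduplicating them would save.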