Hi! I'm new to pyarrow and plan to use it to join, group by and filter data. The idea is to get better performance and memory utilisation (Apache Arrow columnar compression) compared to pandas. It seems that pyarrow has no support for joining two Tables / Datasets by key, so I have to fall back to pandas. I don’t really follow how the pyarrow <-> pandas integration works. Will pandas rely on the Apache Arrow data structures? I’m fine with using only these flat column types to avoid "corner cases": string, int, long, decimal.
I have a feeling that pandas will copy all the data from Apache Arrow and double the memory usage (according to the docs). Did I get it right? What is the right way to join, group by and filter several "Tables" / "Datasets" while making use of pyarrow (and the underlying Apache Arrow)? Thank you!
