Hi Artem, I don't believe this is currently possible, but it would be a great addition to PySpark, since it would offer a convenient and efficient way to parallelize nested column data. I created the JIRA https://issues.apache.org/jira/browse/SPARK-29040 to track this.
On Tue, Aug 27, 2019 at 7:55 PM Artem Kozhevnikov <kozhevnikov.ar...@gmail.com> wrote:

> I wonder if there's a recommended method to convert an in-memory
> pyarrow.Table (or pyarrow.RecordBatch) to a pyspark DataFrame without using
> pandas?
> My motivation is converting nested data (like List[int]) that has
> an efficient representation in pyarrow, which is not possible with pandas (I
> don't want to go through Python lists of ints ...).
>
> Thanks in advance!
> Artem