To add a bit more to what Robert suggested: in general, we can't read a Spark RDD directly with Beam (the Spark runner uses RDDs under the hood, but that's a different story). However, you can write the results to any storage, in any data format that Beam supports, and then read them back with the corresponding Beam IO connector.
— Alexey

> On 23 May 2022, at 20:35, Robert Bradshaw <rober...@google.com> wrote:
>
> The easiest way to do this would be to write the RDD somewhere then
> read it from Beam.
>
> On Mon, May 23, 2022 at 9:39 AM Yushu Yao <yao.yu...@gmail.com> wrote:
>>
>> Hi Folks,
>>
>> I know this is not the optimal way to use beam :-) But assume I only use the
>> spark runner.
>>
>> I have a spark library (very complex) that emits a spark dataframe (or RDD).
>> I also have an existing complex beam pipeline that can do post processing on
>> the data inside the dataframe.
>>
>> However, the beam part needs a pcollection to start with. The question is,
>> how can I convert a spark RDD into a pcollection?
>>
>> Thanks
>> -Yushu