To add a bit more to what Robert suggested: in general, we can't read a Spark RDD directly with Beam (the Spark runner uses RDDs under the hood, but that's a different story). However, you can write the results to any storage, in any data format that Beam supports, and then read them back with the corresponding Beam IO connector.
— Alexey

> On 23 May 2022, at 20:35, Robert Bradshaw <rober...@google.com> wrote:
>
> The easiest way to do this would be to write the RDD somewhere then
> read it from Beam.
>
> On Mon, May 23, 2022 at 9:39 AM Yushu Yao <yao.yu...@gmail.com> wrote:
>>
>> Hi Folks,
>>
>> I know this is not the optimal way to use beam :-) But assume I only use the
>> spark runner.
>>
>> I have a spark library (very complex) that emits a spark dataframe (or RDD).
>> I also have an existing complex beam pipeline that can do post processing on
>> the data inside the dataframe.
>>
>> However, the beam part needs a pcollection to start with. The question is,
>> how can I convert a spark RDD into a pcollection?
>>
>> Thanks
>> -Yushu