Thanks for bringing this issue to the mailing list.
In addition, I would ask the same questions about the DStreams and
Structured Streaming APIs.
Structured Streaming is high-level, which makes it difficult to express all
business logic in it, although Databricks is pushing it and recommending it
for use.
Moreover, there is ongoing work on continuous processing.
So what is Spark's future vision: support all of these, or concentrate on
one, given that the paradigms have distinct processing semantics?
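To make the contrast concrete, here is a minimal word-count sketch in both
APIs. This is just an illustration, assuming a text socket source on
localhost:9999; the app names and variable names are my own, not anything
prescribed by Spark.

// DStreams: functional transformations over micro-batched RDDs
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("wordcount").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .print()
ssc.start()
ssc.awaitTermination()

// Structured Streaming: the same logic as an incremental query
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("wordcount").master("local[2]").getOrCreate()
import spark.implicits._
spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()
  .writeStream
  .outputMode("complete")
  .format("console")
  .start()
  .awaitTermination()

As far as I understand, continuous processing (Spark 2.3+) is not a third
API: it reuses the same Structured Streaming query and only changes the
trigger (Trigger.Continuous), so the semantic split is really DStreams
versus Structured Streaming.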


Cheers,
Adrienne

On Sun, Oct 28, 2018 at 3:50 PM Soheil Pourbafrani <soheil.i...@gmail.com>
wrote:

> Hi,
> Functions like map, flatMap, reduce, and so on form the basic data
> processing operations in big data (and Apache Spark). But in its new
> versions, Spark introduces the high-level Dataframe API and recommends
> using it, even though the Dataframe API has no such functions and offers
> only many built-in functions plus UDFs. This is very inflexible (at least
> to me), and at many points I have to convert Dataframes to RDDs and vice
> versa. My questions are:
> Is the RDD API going to become obsolete, and if so, what is the correct
> roadmap for processing with Apache Spark when the Dataframe API doesn't
> support functions like map and reduce? How do UDFs process the data? Are
> they applied to every row, like map functions? Does converting a
> Dataframe to an RDD come with significant costs?
>
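Regarding the UDF question in the quoted message: as far as I know, a UDF
is invoked once per row, much like a map over a column, and the typed
Dataset API still exposes map/flatMap/reduce directly. A rough sketch (the
data and column names here are made up purely for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("udf-vs-rdd").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("alpha", "beta", "gamma").toDF("word")

// A UDF runs once per row, like a map over one column, but stays
// inside the Dataframe world (opaque to the optimizer, no conversion).
val wordLen = udf((s: String) => s.length)
df.withColumn("len", wordLen($"word")).show()

// The typed Dataset API still has map/flatMap/reduce:
val total = df.select($"word".as[String]).map(_.length).reduce(_ + _)

// Dropping to the RDD is one call; the main cost is deserializing every
// row into JVM objects and losing Catalyst/Tungsten optimizations past
// this boundary (not a shuffle):
val lengths = df.rdd.map(row => row.getString(0).length)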
