I think both of these ways are fine for writing a streaming job. `transform` 
is the more general way to convert one DStream into another when there is no 
corresponding DStream API (but a related RDD API exists). Using `map` may be 
more straightforward and easier to understand.
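
To illustrate when `transform` is needed, here is a hedged sketch. It assumes a running StreamingContext and an input DStream named `lines` (both hypothetical names, not from the thread): the RDD API has `sortBy`, but DStream does not, so `transform` lets you apply it to each batch's underlying RDD.

  // Sketch only: assumes `lines: DStream[String]` already exists.
  val counts = lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // sortBy exists on RDD but not on DStream; transform bridges the gap
  // by exposing the RDD of each micro-batch.
  val sorted = counts.transform(rdd => rdd.sortBy(_._2, ascending = false))

For operations like `map` that do exist on DStream, either style works; `transform` is the escape hatch for everything else.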

Thanks
Jerry

From: madhu phatak [mailto:phatak....@gmail.com]
Sent: Monday, March 16, 2015 4:32 PM
To: user@spark.apache.org
Subject: MappedStream vs Transform API

Hi,
  The current implementation of the map function in Spark Streaming looks as below:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
    new MappedDStream(this, context.sparkContext.clean(mapFunc))
  }

It creates an instance of MappedDStream, which is a subclass of DStream.

The same function can also be implemented using the transform API:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] =
    this.transform(rdd => {
      rdd.map(mapFunc)
    })

Both implementations look the same. If they are equivalent, is there any advantage 
to having a subclass of DStream? Why can't we just use the transform API?


Regards,
Madhukara Phatak
http://datamantra.io/
