I think both of these ways are fine for writing a streaming job. `transform` 
is the more general way to convert one DStream into another when there is no 
corresponding DStream API (but a related RDD API exists). Using `map` may be 
more straightforward and easier to understand.
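
To illustrate when `transform` is needed, here is a hedged sketch. It assumes a running StreamingContext and an input DStream named `lines` (both hypothetical names, not from the thread): the RDD API has `sortBy`, but DStream does not, so `transform` lets you apply it to each batch's underlying RDD.

  // Sketch only: assumes `lines: DStream[String]` already exists.
  val counts = lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // sortBy exists on RDD but not on DStream; transform bridges the gap
  // by exposing the RDD of each micro-batch.
  val sorted = counts.transform(rdd => rdd.sortBy(_._2, ascending = false))

For operations like `map` that do exist on DStream, either style works; `transform` is the escape hatch for everything else.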

Thanks
Jerry

From: madhu phatak [mailto:phatak....@gmail.com]
Sent: Monday, March 16, 2015 4:32 PM
To: user@spark.apache.org
Subject: MappedStream vs Transform API

Hi,
  The current implementation of the map function in Spark Streaming looks as below:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
    new MappedDStream(this, context.sparkContext.clean(mapFunc))
  }

It creates an instance of MappedDStream, which is a subclass of DStream.

The same function can also be implemented using the transform API:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] =
    this.transform(rdd => {
      rdd.map(mapFunc)
    })

Both implementations look the same. If they are equivalent, is there any advantage 
to having a subclass of DStream? Why can't we just use the transform API?


Regards,
Madhukara Phatak
http://datamantra.io/
