I think both ways are fine for writing a streaming job. `transform` is the more general way to turn one DStream into another when there is no matching DStream API for the operation (but there is a matching RDD API). Using `map` may be more straightforward and easier to understand.
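A minimal sketch of that point, using a toy `Batch` wrapper in place of DStream/RDD (all names here are illustrative, not Spark's API): the wrapper exposes no sorting method of its own, but a general `transform` reaches the underlying collection's method, just as `DStream.transform` reaches RDD-only operations.

```scala
// Toy stand-in for a DStream: wraps one batch (a Seq standing in for an RDD).
// Batch and its methods are illustrative only, not Spark's API.
case class Batch[T](data: Seq[T]) {
  // General escape hatch: apply any Seq-to-Seq function, like DStream.transform.
  def transform[U](f: Seq[T] => Seq[U]): Batch[U] = Batch(f(data))
}

// Batch has no sortBy of its own, but transform reaches Seq.sortBy,
// mirroring how transform bridges to RDD operations with no DStream counterpart.
val counts = Batch(Seq(("b", 2), ("a", 5), ("c", 1)))
val sorted = counts.transform(_.sortBy(_._2))
println(sorted.data)  // List((c,1), (b,2), (a,5))
```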
Thanks
Jerry

From: madhu phatak [mailto:phatak....@gmail.com]
Sent: Monday, March 16, 2015 4:32 PM
To: user@spark.apache.org
Subject: MappedStream vs Transform API

Hi,

The current implementation of the map function in Spark Streaming looks like this:

    def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
      new MappedDStream(this, context.sparkContext.clean(mapFunc))
    }

It creates an instance of MappedDStream, which is a subclass of DStream. The same function could also be implemented using the transform API:

    def map[U: ClassTag](mapFunc: T => U): DStream[U] =
      this.transform(rdd => rdd.map(mapFunc))

Both implementations look the same. If they are the same, is there any advantage to having a dedicated subclass of DStream? Why can't we just use the transform API?

Regards,
Madhukara Phatak
http://datamantra.io/
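To check the equivalence claim in the question concretely, here is the same kind of toy model (a `Batch` standing in for DStream, a Seq standing in for an RDD; the names are illustrative, not Spark's API) with `map` written both ways, directly and via `transform`:

```scala
// Toy stand-in for a DStream; names are illustrative, not Spark's API.
case class Batch[T](data: Seq[T]) {
  // Direct map, analogous to the dedicated MappedDStream implementation.
  def map[U](f: T => U): Batch[U] = Batch(data.map(f))
  // General transform, analogous to DStream.transform.
  def transform[U](f: Seq[T] => Seq[U]): Batch[U] = Batch(f(data))
  // map expressed through transform, as proposed in the question.
  def mapViaTransform[U](f: T => U): Batch[U] = transform(_.map(f))
}

val b = Batch(Seq(1, 2, 3))
println(b.map(_ * 2).data)             // List(2, 4, 6)
println(b.mapViaTransform(_ * 2).data) // List(2, 4, 6)
```

In this model the two paths produce identical results; the remaining differences in Spark are internal (how the DStream graph is represented), not observable in the output.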