When running Spark from spark-shell, each time a variable is defined the shell prints the type signature of that variable along with the toString of the instance.
How can I programmatically generate the same signature without using the shell (for debugging purposes), from a Spark script or class?

Example code run in spark-shell (see the bold output below):

------------------------------------------------------------------------------------
code:
------------------------------------------------------------------------------------

val data = Array("one", "two", "three", "two", "three", "three")
val dataRdd = sc.parallelize(data)
val dataTupleRdd = dataRdd.map(word => (word, 1))
val countsRdd = dataTupleRdd.reduceByKey(_ + _)
countsRdd.foreach(println)

------------------------------------------------------------------------------------
The same code run in spark-shell (see the bold output below: I want to generate that from the API):
------------------------------------------------------------------------------------

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val data = Array("one", "two", "three", "two", "three", "three")
*data: Array[String]* = Array(one, two, three, two, three, three)

scala> val dataRdd = sc.parallelize(data)
*dataRdd: org.apache.spark.rdd.RDD[String]* = ParallelCollectionRDD[0] at parallelize at <console>:26

scala> val dataTupleRdd = dataRdd.map(word => (word, 1))
*dataTupleRdd: org.apache.spark.rdd.RDD[(String, Int)]* = MapPartitionsRDD[1] at map at <console>:28

scala> val countsRdd = dataTupleRdd.reduceByKey(_ + _)
*countsRdd: org.apache.spark.rdd.RDD[(String, Int)]* = ShuffledRDD[2] at reduceByKey at <console>:30

scala> countsRdd.foreach(println)
(two,2)
(one,1)
(three,3)
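For reference, below is a minimal sketch of one way this might be approached from compiled code, assuming scala-reflect (which ships with Spark) is on the classpath. The `show` helper and its explicit `name` argument are hypothetical, not part of any Spark API: a `TypeTag` recovers the static type of the expression, and `ScalaRunTime.stringOf` (the same pretty-printer the REPL uses for values) renders the instance, so arrays print as `Array(...)` rather than `[Ljava.lang.String;@...`.

------------------------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import scala.reflect.runtime.universe._
import scala.runtime.ScalaRunTime

object ShellStyleOutput {
  // Hypothetical helper (not a Spark API): prints "name: Type = value"
  // the way spark-shell does. The TypeTag captures the static type;
  // ScalaRunTime.stringOf mimics the REPL's value rendering.
  def show[T: TypeTag](name: String, value: T): Unit =
    println(s"$name: ${typeOf[T]} = ${ScalaRunTime.stringOf(value)}")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shell-style-output").setMaster("local[*]"))

    val data = Array("one", "two", "three", "two", "three", "three")
    show("data", data)
    // data: Array[String] = Array(one, two, three, two, three, three)

    val dataRdd = sc.parallelize(data)
    show("dataRdd", dataRdd)
    // dataRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at ...

    val countsRdd = dataRdd.map(word => (word, 1)).reduceByKey(_ + _)
    show("countsRdd", countsRdd)
    // countsRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at ...

    sc.stop()
  }
}
------------------------------------------------------------------------------------

One difference would remain: the `<console>:26`-style call sites in the shell output come from Spark's call-site tracking, so a compiled job would show the script's own file and line there instead.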