You need to factor your program so that it's not just a main(). This is not a 
Spark-specific issue; it's how you'd make any program unit-testable. Right now 
your main() creates the SparkContext itself, so you can't pass one in from 
outside, and the code has to read its input from a file and write its result to 
a file. It would be better to move the code that transforms the data into a new function:

def processData(lines: RDD[String]): RDD[String] = {
  // build and return your "res" variable here
}
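
For example, adapting the transformation from the main() in your code below (just a sketch; I'm assuming the json4s jackson parser and that only the max TimeStamp per ID matters for the output, so adjust as needed), GetInfo could look something like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD operations like groupByKey/sortByKey
import org.apache.spark.rdd.RDD
import org.json4s.jackson.JsonMethods.parse

object GetInfo {

  // Pure transformation: no SparkContext creation and no file I/O,
  // so a test can call it with any RDD[String].
  def processData(lines: RDD[String]): RDD[String] = {
    lines
      .map(line => parse(line))
      .map { json =>
        implicit lazy val formats = org.json4s.DefaultFormats
        val ts  = (json \ "d" \ "TimeStamp").extract[Long]
        val gid = (json \ "d" \ "ID").extract[String]
        (gid, ts)
      }
      .groupByKey()
      .map { case (gid, timestamps) => (gid, timestamps.max) }
      .sortByKey()
      .map { case (gid, ts) => "%s, %d".format(gid, ts) }
  }

  // main() is now just a thin wrapper that wires up I/O around the testable function.
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("GetInfo"))
    processData(sc.textFile(args(0))).saveAsTextFile(args(1))
    sc.stop()
  }
}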

Then you can unit-test this function directly on data you create in the test:

val myLines = sc.parallelize(Seq("line 1", "line 2"))
val result = GetInfo.processData(myLines).collect()
assert(result.toSet === Set("res 1", "res 2"))
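
A fuller, self-contained version of that test might look like the following (a rough sketch using ScalaTest's FunSuite; the suite name and local master are just placeholders, and specs2 works the same way: create the SparkContext before the examples and stop it afterwards):

import org.apache.spark.SparkContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class GetInfoSuite extends FunSuite with BeforeAndAfterAll {

  // One local SparkContext shared by the whole suite.
  private var sc: SparkContext = _

  override def beforeAll() {
    sc = new SparkContext("local", "GetInfoTest")
  }

  override def afterAll() {
    sc.stop()   // stop the context so later suites can create their own
  }

  test("keeps the max TimeStamp per ID") {
    val input = Seq(
      """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
      """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
      """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
      """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
    )
    val result = GetInfo.processData(sc.parallelize(input)).collect()
    assert(result.toSet === Set("ID1, 5678", "ID2, 2468"))
  }
}

The important points are to run against a local master and to stop the context when the suite finishes, so test runs don't leak SparkContexts.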

Matei

On Jun 13, 2014, at 2:42 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
> 
> I have looked through some of the test examples and also the brief
> documentation on unit testing at
> http://spark.apache.org/docs/latest/programming-guide.html#unit-testing, but
> still don't have a good understanding of writing unit tests using the Spark
> framework. Previously, I have written unit tests using the specs2 framework and
> got them to work with Scalding. I tried to use the specs2 framework with
> Spark, but could not find any simple examples to follow. I am open to
> specs2 or FunSuite, whichever works best with Spark. I would like some
> additional guidance, or some simple sample code using specs2 or FunSuite. My
> code is provided below.
> 
> 
> I have the following code in src/main/scala/GetInfo.scala. It reads a JSON
> file and extracts some data. It takes the input file (args(0)) and output
> file (args(1)) as arguments.
> 
> object GetInfo {
> 
>   def main(args: Array[String]) {
>     val inp_file = args(0)
>     val conf = new SparkConf().setAppName("GetInfo")
>     val sc = new SparkContext(conf)
>     val res = sc.textFile(inp_file)
>               .map(line => parse(line))
>               .map(json => {
>                  implicit lazy val formats = org.json4s.DefaultFormats
>                  val aid = (json \ "d" \ "TypeID").extract[Int]
>                  val ts = (json \ "d" \ "TimeStamp").extract[Long]
>                  val gid = (json \ "d" \ "ID").extract[String]
>                  (aid, ts, gid)
>                })
>               .groupBy(tup => tup._3)
>               .sortByKey(true)
>               .map(g => (g._1, g._2.map(_._2).max))
>     res.map(tuple => "%s, %d".format(tuple._1, tuple._2)).saveAsTextFile(args(1))
>   }
> }
> 
> 
> I would like to test the above code. My unit test is in src/test/scala. The
> code I have so far for the unit test appears below:
> 
> import org.apache.spark._
> import org.specs2.mutable._
> 
> class GetInfoTest extends Specification with java.io.Serializable {
> 
>     val data = List(
>       """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
>       """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
>       """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
>       """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
>     )
> 
>     val expected_out = List(
>       ("ID1", 5678),
>       ("ID2", 2468)
>     )
> 
>     "A GetInfo job" should {
>         //***** How do I pass "data" defined above as the input and output
>         // that GetInfo expects as arguments? ******
>         val sc = new SparkContext("local", "GetInfo")
> 
>         //*** how do I get the output? ***
> 
>         // assuming out_buffer has the output, I want to match it to the
>         // expected output
>         "match expected output" in {
>             (out_buffer == expected_out) must beTrue
>         }
>     }
> 
> }
> 
> I would like some help with the tasks marked with "****" in the unit test
> code above. If specs2 is not the right way to go, I am also open to
> FunSuite. I would like to know how to pass the input while calling my
> program from the unit test and get the output.
> 
> Thanks for your help.
> 
