I also started the streaming context by calling ssc.start(), but apart
from the logs, still nothing from g gets printed.

---------- Forwarded message ----------
From: Animesh Baranawal <animeshbarana...@gmail.com>
Date: Thu, May 28, 2015 at 6:57 PM
Subject: SPARK STREAMING PROBLEM
To: user@spark.apache.org


Hi,

I am trying to extract the names of the files from which a DStream is
generated by parsing the output of the toDebugString method on each RDD.
I am running the following code in spark-shell:

import org.apache.spark.streaming.{StreamingContext, Seconds}
val ssc = new StreamingContext(sc,Seconds(10))
val lines = ssc.textFileStream(// directory //)

def g : List[String] = {
   var res = List[String]()
   lines.foreachRDD { rdd =>
      if (rdd.count > 0) {
         val files = rdd.toDebugString.split("\n").filter(_.contains(":/"))
         files.foreach { ms =>
            res = ms.split(" ")(2) :: res
         }
      }
   }
   res
}

g.foreach(x => {println(x); println("************")})

However, when I run the code, nothing gets printed on the console apart
from the logs. Am I doing something wrong?
Also, is there a better way to extract the file names from a DStream?

Thanks in advance


Animesh
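
A note on the likely cause, offered as a sketch rather than a confirmed fix: foreachRDD only registers its function, which then runs on the driver once per batch after ssc.start() is called. In the code above, g therefore returns the still-empty res immediately, and the later updates to res are never printed. One possible rearrangement is to do the printing inside foreachRDD itself. The helper name extractPaths and the inputDir value below are hypothetical, introduced only for illustration. The sketch assumes spark-shell (where sc is predefined), Spark 1.3 or later for rdd.isEmpty(), and, as in the original code, that the file paths show up in the toDebugString output as tokens containing ":/".

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical helper: collect whitespace-separated tokens that look like
// URIs (contain ":/") from a lineage string such as rdd.toDebugString.
def extractPaths(debug: String): List[String] =
  debug.split("\n").toList
    .flatMap(_.trim.split("\\s+").toList)
    .filter(_.contains(":/"))

// Hypothetical placeholder; replace with the directory being monitored.
val inputDir = "/path/to/input"

// `sc` is the SparkContext that spark-shell provides.
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream(inputDir)

// This only registers the function. It is invoked on the driver for each
// batch once the context has been started.
lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) { // on older versions, rdd.count > 0 as in the original
    extractPaths(rdd.toDebugString).foreach { path =>
      println(path)
      println("************")
    }
  }
}

// Start processing after all output operations have been registered.
// A standalone application would also call ssc.awaitTermination().
ssc.start()
```

With this arrangement the output should appear every 10 seconds for batches that picked up new files, interleaved with the log lines. Whether the paths really appear in toDebugString in this form depends on the Spark version, so the parsing step is worth checking against the actual debug output.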
