I am processing a log file. From each line I want to extract the 0th and 4th fields (plus an integer 1 for counting) into a tuple. I had hoped to index the Array for elements 0 and 4 in one step, but Arrays do not appear to support indexing by a collection of positions. I have not found a way to extract and combine the elements properly, perhaps because I am a Spark Streaming/Scala newbie.
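For what it's worth, plain Scala can pick several positions out of an Array by mapping over the indices; a small sketch, independent of Spark (the sample line and field values here are made up for illustration):

```scala
object PickFields extends App {
  // Illustrative tab-separated line; not from my actual logs.
  val sample = "t0\tf1\tf2\tf3\thost4"
  val fields = sample.split("\t")            // Array[String]
  // Map over the wanted indices to pull out several elements at once.
  val picked = Seq(0, 4).map(i => fields(i)) // Seq("t0", "host4")
  println(picked)
}
```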
My code so far looks like:

    1]  var lines = ssc.textFileStream(dirArg)
    2]  var linesArray = lines.map( line => line.split("\t") )
    3]  var respH = linesArray.map( lineArray => lineArray(4) )
    4a] var time = linesArray.map( lineArray => lineArray(0) )
    4b] var time = linesArray.map( lineArray => (lineArray(0), 1) )
    5]  var newState = respH.union(time)

If I use line 4a and not 4b, it compiles properly. (I still have issues getting my update function for updateStateByKey working, so I don't know whether it _works_ properly.) If I use line 4b and not 4a, it fails at compile time with:

    [error] foo.scala:82: type mismatch;
    [error]  found   : org.apache.spark.streaming.dstream.DStream[(String, Int)]
    [error]  required: org.apache.spark.streaming.dstream.DStream[String]
    [error]         var newState = respH.union(time)

This implies that the DStreams being union()ed have to be of identical per-element type. Can anyone confirm that is true? If so, is there a way to extract the needed elements and build the new DStream?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-extract-combine-elements-of-an-Array-in-DStream-element-tp17676.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
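From the DStream API, union is declared as union(that: DStream[T]): DStream[T], so both streams must indeed carry the same element type, which matches the compile error above. One way to sidestep union entirely is to build the whole tuple in a single map over the raw lines; a sketch, where parseLine is an illustrative helper name and the field positions (time at index 0, responding host at index 4) are assumed from the description above:

```scala
object ParseSketch extends App {
  // Hypothetical helper: turn one tab-separated log line into
  // ((time, respH), 1), keyed so it could feed reduceByKey or
  // updateStateByKey later.
  def parseLine(line: String): ((String, String), Int) = {
    val fields = line.split("\t")
    ((fields(0), fields(4)), 1)
  }

  // With the stream it would be applied as (not run here):
  //   val newState = lines.map(parseLine)  // DStream[((String, String), Int)]
  println(parseLine("t0\tf1\tf2\tf3\thost4"))
}
```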