Try this out:

JavaStreamingContext jssc = new JavaStreamingContext(...);
// textFileStream monitors a directory and returns a JavaDStream<String>
// of the lines of new files (plain fileStream needs key/value/format type params)
JavaDStream<String> lines = jssc.textFileStream("whatever");
JavaDStream<String> words = lines.flatMap(
    new FlatMapFunction<String, String>() {
      public Iterable<String> call(String s) {
        return Arrays.asList(s.split(" "));
      }
    });
JavaPairDStream<String, Integer> ones = words.map(
    new PairFunction<String, String, Integer>() {
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

JavaPairDStream<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
      }
    });

Adapted from https://spark.apache.org/docs/0.9.1/java-programming-guide.html#example

On your second question: the file stream keeps polling the directory every batch interval and the application keeps running until you stop it, but note that it only picks up *new* files created in (or moved into) the directory; appends to an existing file such as mylog.log are not detected.

Thanks
Best Regards

On Wed, Jul 9, 2014 at 6:03 AM, Aravind <aravindb...@gmail.com> wrote:
> Hi all,
>
> I am trying to run the NetworkWordCount.java file in the streaming
> examples. The example shows how to read from a network socket. But my
> use case is that I have a local log file which is a stream and is
> continuously updated (say /Users/.../Desktop/mylog.log).
>
> I would like to write the same NetworkWordCount.java using this file stream:
>
> jssc.fileStream(dataDirectory);
>
> Questions:
> 1. How do I write a map-reduce function for the above to measure word
> counts (in Java, not Scala)?
>
> 2. Also, does the streaming application stop if the file is not updating,
> or does it continuously poll for file updates?
>
> I am a new user of Apache Spark Streaming. Kindly help me as I am totally
> stuck.
>
> Thanks in advance.
>
> Regards
> Aravind
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-using-File-Stream-in-Java-tp9115.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
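P.S. If it helps to see what the pipeline computes, here is what the flatMap / map / reduceByKey chain above does to each batch of lines, written as plain Java with no Spark dependency (a sketch for illustration only; the class and method names are my own, not part of any Spark API):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {

    // Equivalent of: flatMap(split on " ") -> map(word -> (word, 1))
    //                -> reduceByKey(i1 + i2), applied to one batch of lines.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // merge plays the role of reduceByKey for a single key
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countWords(Arrays.asList("to be or", "not to be"));
        System.out.println(counts.get("to"));  // 2
        System.out.println(counts.get("be"));  // 2
        System.out.println(counts.get("or"));  // 1
    }
}
```

In the streaming version, Spark applies this same logic to every batch of new lines that arrives, so counts.print() would show per-batch word counts rather than a running total.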