Try this out:

JavaStreamingContext jssc = new JavaStreamingContext(...);
JavaDStream<String> lines = jssc.textFileStream("whatever");
JavaDStream<String> words = lines.flatMap(
  new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) {
      return Arrays.asList(s.split(" "));
    }
  });

JavaPairDStream<String, Integer> ones = words.map(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  });

JavaPairDStream<String, Integer> counts = ones.reduceByKey(
  new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) {
      return i1 + i2;
    }
  });


Adapted from the example in the Java programming guide:
https://spark.apache.org/docs/0.9.1/java-programming-guide.html#example
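
For a complete program, here is a minimal sketch. It assumes Spark 1.0's
Java API, where the map-with-PairFunction overload used above was renamed
to mapToPair (on 0.9.x keep map as in the snippet); the class name, master,
batch interval, and directory path are placeholders to adjust:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class FileStreamWordCount {
  public static void main(String[] args) {
    // Placeholder master and app name.
    SparkConf conf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("FileStreamWordCount");
    // The batch interval (10s here) is also how often the directory is
    // checked for new files.
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, new Duration(10000));

    // Streams the contents of text files newly added to the directory.
    JavaDStream<String> lines = jssc.textFileStream("/path/to/logdir");

    // Split each line into words.
    JavaDStream<String> words = lines.flatMap(
        new FlatMapFunction<String, String>() {
          public Iterable<String> call(String s) {
            return Arrays.asList(s.split(" "));
          }
        });

    // Pair each word with a count of 1, then sum the counts per word.
    JavaPairDStream<String, Integer> counts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
          }
        }).reduceByKey(
        new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer i1, Integer i2) {
            return i1 + i2;
          }
        });

    counts.print();           // show a few counts from each batch
    jssc.start();             // start the streaming computation
    jssc.awaitTermination();  // run until the job is stopped
  }
}

On your second question: the application does not stop when there is no new
data. Every batch interval, Spark lists the directory and processes any
files that have newly appeared. Note that the file stream only sees complete
new files moved or renamed into the directory; it does not pick up data
appended to an existing log file, so you would typically rotate finished log
files into the watched directory.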

Thanks
Best Regards


On Wed, Jul 9, 2014 at 6:03 AM, Aravind <aravindb...@gmail.com> wrote:

> Hi all,
>
> I am trying to run the NetworkWordCount.java file in the streaming
> examples.
> The example shows how to read from a network socket, but my use case is
> different: I have a local log file that is continuously updated (say
> /Users/.../Desktop/mylog.log).
>
> I would like to write the same NetworkWordCount.java using this filestream
>
> jssc.fileStream(dataDirectory);
>
> Question:
> 1. How do I write a map/reduce function for the above to compute word
> counts (in Java, not Scala)?
>
> 2. Also, does the streaming application stop if the file is not updating,
> or does it continuously poll for file updates?
>
> I am a new user of Apache Spark Streaming. Kindly help me as I am totally
> stuck....
>
> Thanks in advance.
>
> Regards
> Aravind
>
