I think accumulators do exactly what you want. (Scala syntax below; I'm just not familiar with the Java equivalent.)
    val f1counts = sc.accumulator(0)
    val f2counts = sc.accumulator(0)
    val f3counts = sc.accumulator(0)

    textfile.foreach { s =>
      if (f1matches) f1counts += 1
      if (f2matches) f2counts += 1
      if (f3matches) f3counts += 1
    }
    // after the action finishes, read the totals on the driver, e.g. f1counts.value

Note that you could also do a normal map-reduce even though a record might match more than one filter. In the Scala API you can use flatMap to output zero or more records per input line:

    textfile.flatMap { s =>
      Seq(
        if (f1matches) Some("f1" -> 1) else None,
        if (f2matches) Some("f2" -> 1) else None,
        if (f3matches) Some("f3" -> 1) else None
      ).flatten
    }.reduceByKey(_ + _)

On Dec 16, 2014 2:07 AM, "zkidkid" <zkid...@gmail.com> wrote:

> Hi,
> Currently I am trying to count over a document with multiple filters.
> Let's say this is my document:
>
> // user field1 field2 field3
> user1 0 0 1
> user2 0 1 0
> user3 0 0 0
>
> I want to count on user.log with some filters like these:
>
> Filter1: field1 == 0 && field2 == 0
> Filter2: field1 == 0 && field3 == 1
> Filter3: field1 == 0 && field3 == 0
> ...
> plus the total line count.
>
> I have tried and found that I couldn't use "group by" or "map then
> reduce" because a line could match two or more filters.
>
> My idea now is to "foreach" over every line and maintain an outside
> counter service. For example:
>
>     JavaRDD<String> textFile = sc.textFile(hdfs, 10);
>     long start = System.currentTimeMillis();
>
>     textFile.foreach(new VoidFunction<String>() {
>         public void call(String s) {
>             for (MyFilter filter : MyFilters) {
>                 if (filter.match(s)) filter.increaseOwnCounter();
>             }
>         }
>     });
>
> I would be happy if there is another way to do it; any help is
> appreciated. Thanks in advance.
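For completeness, here is a fuller (untested) sketch of the flatMap version wired to your exact filters. The HDFS path, the whitespace split, and the app scaffolding are placeholder assumptions on my part; I also emit a "total" record for every line so the overall line count falls out of the same reduceByKey:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // brings in reduceByKey on pair RDDs

    object FilterCounts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("FilterCounts"))
        // placeholder path -- point this at your user.log
        val lines = sc.textFile("hdfs:///path/to/user.log")

        val counts = lines.flatMap { line =>
          // assumes whitespace-separated fields: user field1 field2 field3
          val p = line.split("\\s+")
          val (f1, f2, f3) = (p(1), p(2), p(3))
          Seq(
            Some("total" -> 1),                        // every line counts once
            if (f1 == "0" && f2 == "0") Some("filter1" -> 1) else None,
            if (f1 == "0" && f3 == "1") Some("filter2" -> 1) else None,
            if (f1 == "0" && f3 == "0") Some("filter3" -> 1) else None
          ).flatten
        }.reduceByKey(_ + _).collectAsMap()

        counts.foreach { case (key, count) => println(s"$key = $count") }
        sc.stop()
      }
    }

A line that matches several filters just contributes one (key, 1) pair per matching filter, so the overlap problem you hit with "group by" goes away.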