Hi All.
I need to create a lot of RDDs starting from a set of "roots" and count the
rows in each. Something like this:
final JavaSparkContext sc = new JavaSparkContext(conf);
List<String> roots = ...
Map<String, Object> res = sc.parallelize(roots)
    .mapToPair(new PairFunction<String, String, Long>() {
        public Tuple2<String, Long> call(String root) throws Exception {
            // ... create an RDD based on root from sc somehow ...
            return new Tuple2<String, Long>(root, rdd.count());
        }
    })
    .countByKey();
This fails with a message about JavaSparkContext not being serializable.
Is there a way to get at the context inside of the map function, or should I
be doing something else entirely?
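
For what it's worth, a plain loop on the driver does work, since the context is
only ever used on the driver side; it's just sequential. Roughly like this
(sketch only: sc.textFile(root) is a stand-in for however the RDD actually
gets built from each root):

Map<String, Long> counts = new HashMap<String, Long>();
for (String root : roots) {
    // Build and count each RDD on the driver; textFile is just a placeholder.
    JavaRDD<String> rdd = sc.textFile(root);
    counts.put(root, rdd.count());
}

That sequential loop is basically what I was trying to distribute with the
parallelize call above.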
Thanks,
David