Re: Working with many RDDs in parallel?

2014-08-18 Thread David Tinker
r.
> 1000 RDD count()s at once isn't a good idea for example.
>
> It may be the case that you don't really need a bunch of RDDs at all,
> but can operate on an RDD of pairs of Strings (roots) and
> something-elses, all at once.
>
>
> On Mon, Aug 18, 2014 at 2:31 P

Working with many RDDs in parallel?

2014-08-18 Thread David Tinker
Hi All.

I need to create a lot of RDDs starting from a set of "roots" and count the rows in each. Something like this:

    final JavaSparkContext sc = new JavaSparkContext(conf);
    List roots = ...
    Map res = sc.parallelize(roots).mapToPair(new PairFunction(){
        public Tuple2 call(String root) throws