Hi,

I wonder if someone can suggest a solution to my problem. I had a simple
process working using Strings and now want to convert it to RDD[Char]. The
problem is that I end up with a nested call, as follows:


1) Load a text file into an RDD[Char]

        val inputRDD = sc.textFile("myFile.txt").flatMap(_.toIterator)


2) I have a method that takes two parameters:

        object Foo {
                def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char] = ...
        }

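(The body of myFunction isn't important to the question; it's essentially a
per-character shift, roughly like this simplified sketch:)

        // Simplified, illustrative sketch only - the real myFunction has more to it.
        import org.apache.spark.rdd.RDD

        object Foo {
                def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char] =
                        inputRDD.map { c =>
                                // shift lower-case letters, leave everything else alone
                                if (c.isLower) ((c - 'a' + shift) % 26 + 'a').toChar
                                else c
                        }
        }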

3) I have a method 'bar' that the driver process calls once it has loaded the
inputRDD, as follows:

def bar(inputRDD: RDD[Char]): Int = {

         val solutionSet = sc.parallelize((1 to alphabetLength).toList)
                 .map(shift => (shift, Foo.myFunction(inputRDD, shift)))
         ...



What I'm trying to do is take the list 1..26 and generate a set of tuples
{ (1, RDD(1)), …, (26, RDD(26)) }, where each entry is the inputRDD passed
through the function above with a different shift parameter.
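(To make the shape I'm after concrete, here is the same idea with plain Scala
collections, purely for illustration, with a dummy shift standing in for
myFunction:)

        // Illustration only: the structure I want, without RDDs.
        val input: List[Char] = "sampletext".toList

        // dummy stand-in for myFunction's per-character shift
        def dummyShift(cs: List[Char], shift: Int): List[Char] =
                cs.map(c => if (c.isLower) ((c - 'a' + shift) % 26 + 'a').toChar else c)

        // one (shift, shifted copy of the input) pair per shift value
        val target: List[(Int, List[Char])] =
                (1 to 26).toList.map(shift => (shift, dummyShift(input, shift)))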

In my original version I could parallelise the algorithm fine, but the input
had to be in a plain 'String' variable; I'd rather it be an RDD, since the
string could be large. I think the way I'm trying to do it above won't work
because it's a nested RDD call.
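(For comparison, the String-based version that parallelised fine was roughly
along these lines, simplified, with a stand-in for the shifting logic:)

        // Simplified sketch of the String-based version: the whole input is an
        // ordinary String captured in the closure, so each shift is computed
        // independently inside the map.
        import org.apache.spark.SparkContext

        def bar(sc: SparkContext, input: String, alphabetLength: Int) =
                sc.parallelize((1 to alphabetLength).toList).map { shift =>
                        // stand-in for the real shifting logic
                        val shifted = input.map(c =>
                                if (c.isLower) ((c - 'a' + shift) % 26 + 'a').toChar else c)
                        (shift, shifted)
                }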

Can anybody suggest a solution?

Regards,
Mike Lewis




