Hi Felix, I just ran your code and it prints "Pi is roughly 4.0".
Here is the code that I used; as you didn't show what "random" is, I used nextInt():

val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
val count = context.sparkContext.parallelize(1 until n, slices).map { i =>
  val random = new scala.util.Random(1000).nextInt() // a fresh Random with the same seed for every element
  val x = random * 2 - 1 // (breakpoint-1)
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / (n - 1))
context.sparkContext.stop()
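By the way, the 4.0 is what you'd expect from this snippet: the Random is built with the same fixed seed inside the map, so every element draws exactly the same value, and since nextInt() returns an arbitrary Int, x and y are not doubles in [-1, 1]; the estimate can only collapse to 0.0 or 4.0. Just as a rough sketch (not meant as the definitive fix; it assumes the same n, slices and context as above, and the names piCount/rng are only illustrative), independent draws per point would look like:

val piCount = context.sparkContext.parallelize(1 until n, slices).map { _ =>
  val rng = new scala.util.Random() // unseeded, so each element draws different values
  val x = rng.nextDouble() * 2 - 1  // uniform in [-1, 1)
  val y = rng.nextDouble() * 2 - 1
  if (x * x + y * y < 1) 1 else 0   // 1 if the point falls inside the unit circle
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * piCount / (n - 1))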
Also, when debugging, it does stop inside the map (breakpoint 1) before going on to the println.
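Since the thread below is about breakpoints inside the anonymous function only being hit once an action runs, here is a minimal sketch of the "temporary count" trick discussed further down (purely illustrative; it assumes the same imports, n, slices and the SparkContext named spark from the SparkPi example quoted below):

val samples = spark.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1 // a breakpoint here is not hit yet: map is lazy and only records the transformation
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}
samples.count() // temporary action, just to force the map body to run so the breakpoint is hit; remove after testing
val count = samples.reduce(_ + _) // the real action; this also executes the map body
println("Pi is roughly " + 4.0 * count / (n - 1))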
2016-09-18 6:47 GMT-03:00 chen yong <cy...@hotmail.com>:

> Dear Dirceu,
>
> Below is our testing code. As you can see, we have used the "reduce" action
> to trigger evaluation. However, it still did not stop at breakpoint-1 (as
> shown in the code snippet) when debugging.
>
> We are using IDEA version 14.0.3 to debug. It is very, very strange to us.
> Please help us (me and my colleagues).
>
> // scalastyle:off println
> package org.apache.spark.examples
>
> import scala.math.random
>
> import org.apache.spark._
> import scala.util.logging.Logged
>
> /** Computes an approximation to pi */
> object SparkPi {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
>     val spark = new SparkContext(conf)
>     val slices = if (args.length > 0) args(0).toInt else 2
>     val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
>     val count = spark.parallelize(1 until n, slices).map { i =>
>       val x = random * 2 - 1 // (breakpoint-1)
>       val y = random * 2 - 1
>       if (x*x + y*y < 1) 1 else 0
>     }.reduce(_ + _)
>     println("Pi is roughly " + 4.0 * count / (n - 1))
>     spark.stop()
>   }
> }
>
> ------------------------------
> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
> *Sent:* 2016-09-16 22:27
> *To:* chen yong
> *Cc:* user@spark.apache.org
> *Subject:* Re: Re: it does not stop at breakpoints which is in an anonymous
> function
>
> No, that's not the right way of doing it.
> Remember that RDD operations are lazy, for performance reasons.
> Whenever you call one of those operation methods (count, reduce, collect,
> ...) they will execute all the functions that you have applied to create
> that RDD.
> It would help if you could post your code here, along with the way you
> are executing it and trying to debug.
>
> 2016-09-16 11:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>
>> Also, I wonder what the right way to debug a Spark program is. If I use
>> ten anonymous functions in one Spark program, then to debug each of them I
>> have to place a COUNT action in advance and then remove it after debugging.
>> Is that the right way?
>>
>> ------------------------------
>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>> *Sent:* 2016-09-16 21:07
>> *To:* chen yong
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Re: Re: Re: Re: t it does not stop at breakpoints which is
>> in an anonymous function
>>
>> Hello Felix,
>> No, this line isn't the one that is triggering the execution of the
>> function; the count does that, unless your count val is a lazy val.
>> The count method is the one that retrieves the information of the RDD; it
>> has to go through all of its data to determine how many records the RDD
>> has.
>>
>> Regards,
>>
>> 2016-09-15 22:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>>
>>> Dear Dirceu,
>>>
>>> Thanks for your kind help.
>>> I cannot see any code line corresponding to "..... retrieve the data
>>> from your DataFrame/RDDs....",
>>> which you suggested in the previous replies.
>>>
>>> Later, I guess the line
>>>
>>> val test = count
>>>
>>> is the key point. Without it, it would not stop at breakpoint-1,
>>> right?
>>>
>>> ------------------------------
>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>> *Sent:* 2016-09-16 0:39
>>> *To:* chen yong
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Re: Re: Re: t it does not stop at breakpoints which is in
>>> an anonymous function
>>>
>>> Hi Felix,
>>> Are you sure your n is greater than 0?
>>> Here it stops first at breakpoint 1, image attached.
>>> Have you checked the count to see if it's also greater than 0?
>>>
>>> 2016-09-15 11:41 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>
>>>> Dear Dirceu,
>>>>
>>>> Thank you for your help.
>>>>
>>>> Actually, I use IntelliJ IDEA to debug the Spark code.
>>>>
>>>> Let me use the following code snippet to illustrate my problem. In the
>>>> code lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2.
>>>> When I debugged the code, it did not stop at breakpoint-1; it seems that
>>>> the map function was skipped and it directly reached and stopped at
>>>> breakpoint-2.
>>>>
>>>> Additionally, I found the following two posts:
>>>> (1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
>>>> (2) https://www.mail-archive.com/user@spark.apache.org/msg29010.html
>>>>
>>>> I am wondering whether logging is an alternative approach to debugging
>>>> Spark anonymous functions.
>>>>
>>>> val count = spark.parallelize(1 to n, slices).map { i =>
>>>>   val x = random * 2 - 1
>>>>   val y = random * 2 - 1 // (breakpoint-1 set in this line)
>>>>   if (x*x + y*y < 1) 1 else 0
>>>> }.reduce(_ + _)
>>>> val test = x // (breakpoint-2 set in this line)
>>>>
>>>> ------------------------------
>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>> *Sent:* 2016-09-14 23:32
>>>> *To:* chen yong
>>>> *Subject:* Re: Re: Re: t it does not stop at breakpoints which is in an
>>>> anonymous function
>>>>
>>>> I don't know which IDE you use. I use IntelliJ, and it has an
>>>> Evaluate Expression dialog where I can execute code whenever it has
>>>> stopped at a breakpoint.
>>>> In Eclipse you have watch and inspect, where you can do the same.
>>>> Probably you are not seeing the debugger stop in your functions because
>>>> you never retrieve the data from your DataFrame/RDDs.
>>>> What are you doing with this function? Are you getting the result of
>>>> this RDD/DataFrame at some place?
>>>> You can add a count after the function that you want to debug, just for
>>>> debugging, but don't forget to remove it after testing.
>>>>
>>>> 2016-09-14 12:20 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>
>>>>> Dear Dirceu,
>>>>>
>>>>> Thank you again.
>>>>>
>>>>> Actually, I never saw it stop at the breakpoints no matter how long
>>>>> I waited. It just skipped the whole anonymous function and directly
>>>>> reached the first breakpoint immediately after the anonymous function
>>>>> body. Is that normal? I suspect something is wrong in my debugging
>>>>> operations or settings. I am very new to Spark and Scala.
>>>>>
>>>>> Additionally, please give me some detailed instructions about "....Some
>>>>> ides provide you a place where you can execute the code to see it's
>>>>> results....". Where is the PLACE?
>>>>>
>>>>> Your help is badly needed!
>>>>>
>>>>> ------------------------------
>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>> *Sent:* 2016-09-14 23:07
>>>>> *To:* chen yong
>>>>> *Subject:* Re: Re: t it does not stop at breakpoints which is in an
>>>>> anonymous function
>>>>>
>>>>> You can call a count in the IDE just to debug, or you can wait until
>>>>> it reaches the code, so you can debug.
>>>>> Some IDEs provide a place where you can execute code and see its
>>>>> results.
>>>>> Be careful not to add these operations to your production code,
>>>>> because they can slow down the execution of your code.
>>>>>
>>>>> 2016-09-14 11:43 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> You mean I have to insert some code, such as xxxxx.count or
>>>>>> xxxxx.collect, between the original Spark code lines to invoke some
>>>>>> operations, right?
>>>>>> But where are the right places to put these code lines?
>>>>>>
>>>>>> Felix
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>>> *Sent:* 2016-09-14 22:33
>>>>>> *To:* chen yong
>>>>>> *Cc:* user@spark.apache.org
>>>>>> *Subject:* Re: t it does not stop at breakpoints which is in an
>>>>>> anonymous function
>>>>>>
>>>>>> Hello Felix,
>>>>>> Spark functions run lazily, and that's why it doesn't stop at those
>>>>>> breakpoints.
>>>>>> They will be executed only when you call certain methods on your
>>>>>> DataFrame/RDD, like count, collect, ...
>>>>>>
>>>>>> Regards,
>>>>>> Dirceu
>>>>>>
>>>>>> 2016-09-14 11:26 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am a newbie to Spark. I am learning Spark by debugging the Spark
>>>>>>> code. It is strange to me that it does not stop at breakpoints that
>>>>>>> are inside an anonymous function; it is normal in an ordinary
>>>>>>> function, though. Is that normal? How can I observe variables in an
>>>>>>> anonymous function?
>>>>>>>
>>>>>>> Please help me. Thanks in advance!
>>>>>>>
>>>>>>> Felix