Hi Felix, I just ran your code and it prints "Pi is roughly 4.0".
Here is the code that I used; as you didn't show what "random" is, I used nextInt():

val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
val count = context.sparkContext.parallelize(1 until n, slices).map { i =>
  val random = new scala.util.Random(1000).nextInt() // a fresh Random with the same seed for every element
  val x = random * 2 - 1 // (breakpoint-1)
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / (n - 1))
context.sparkContext.stop()
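By the way, the 4.0 is what you'd expect from this snippet: the Random is built with the same fixed seed inside the map, so every element draws exactly the same value, and since nextInt() returns an arbitrary Int, x and y are not doubles in [-1, 1]; the estimate can only collapse to 0.0 or 4.0. Just as a rough sketch (not meant as the definitive fix; it assumes the same n, slices and context as above, and the names piCount/rng are only illustrative), independent draws per point would look like:

val piCount = context.sparkContext.parallelize(1 until n, slices).map { _ =>
  val rng = new scala.util.Random() // unseeded, so each element draws different values
  val x = rng.nextDouble() * 2 - 1  // uniform in [-1, 1)
  val y = rng.nextDouble() * 2 - 1
  if (x * x + y * y < 1) 1 else 0   // 1 if the point falls inside the unit circle
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * piCount / (n - 1))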
Also, when debugging, it does stop inside the map (breakpoint 1) before going on to the println.
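Since the thread below is about breakpoints inside the anonymous function only being hit once an action runs, here is a minimal sketch of the "temporary count" trick discussed further down (purely illustrative; it assumes the same imports, n, slices and the SparkContext named spark from the SparkPi example quoted below):

val samples = spark.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1 // a breakpoint here is not hit yet: map is lazy and only records the transformation
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}
samples.count() // temporary action, just to force the map body to run so the breakpoint is hit; remove after testing
val count = samples.reduce(_ + _) // the real action; this also executes the map body
println("Pi is roughly " + 4.0 * count / (n - 1))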
2016-09-18 6:47 GMT-03:00 chen yong <cy...@hotmail.com>:

> Dear Dirceu,
>
> Below is our testing code. As you can see, we have used the "reduce" action
> to trigger evaluation. However, it still did not stop at breakpoint-1 (as
> shown in the code snippet) when debugging.
>
> We are using IDEA version 14.0.3 to debug. It is very, very strange to us.
> Please help us (me and my colleagues).
>
> // scalastyle:off println
> package org.apache.spark.examples
>
> import scala.math.random
>
> import org.apache.spark._
> import scala.util.logging.Logged
>
> /** Computes an approximation to pi */
> object SparkPi {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
>     val spark = new SparkContext(conf)
>     val slices = if (args.length > 0) args(0).toInt else 2
>     val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
>     val count = spark.parallelize(1 until n, slices).map { i =>
>       val x = random * 2 - 1 // (breakpoint-1)
>       val y = random * 2 - 1
>       if (x*x + y*y < 1) 1 else 0
>     }.reduce(_ + _)
>     println("Pi is roughly " + 4.0 * count / (n - 1))
>     spark.stop()
>   }
> }
>
> ------------------------------
> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
> *Sent:* 2016-09-16 22:27
> *To:* chen yong
> *Cc:* user@spark.apache.org
> *Subject:* Re: Re: it does not stop at breakpoints which is in an anonymous
> function
>
> No, that's not the right way of doing it.
> Remember that RDD operations are lazy, for performance reasons.
> Whenever you call one of those operation methods (count, reduce, collect,
> ...) they will execute all the functions that you have applied to create
> that RDD.
> It would help if you could post your code here, along with the way you
> are executing it and trying to debug.
>
> 2016-09-16 11:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>
>> Also, I wonder what the right way to debug a Spark program is. If I use
>> ten anonymous functions in one Spark program, then to debug each of them I
>> have to place a COUNT action in advance and then remove it after debugging.
>> Is that the right way?
>>
>> ------------------------------
>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>> *Sent:* 2016-09-16 21:07
>> *To:* chen yong
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Re: Re: Re: Re: t it does not stop at breakpoints which is
>> in an anonymous function
>>
>> Hello Felix,
>> No, this line isn't the one that is triggering the execution of the
>> function; the count does that, unless your count val is a lazy val.
>> The count method is the one that retrieves the information of the RDD; it
>> has to go through all of its data to determine how many records the RDD
>> has.
>>
>> Regards,
>>
>> 2016-09-15 22:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>>
>>> Dear Dirceu,
>>>
>>> Thanks for your kind help.
>>> I cannot see any code line corresponding to "..... retrieve the data
>>> from your DataFrame/RDDs....",
>>> which you suggested in the previous replies.
>>>
>>> Later, I guess the line
>>>
>>> val test = count
>>>
>>> is the key point. Without it, it would not stop at breakpoint-1,
>>> right?
>>>
>>> ------------------------------
>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>> *Sent:* 2016-09-16 0:39
>>> *To:* chen yong
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Re: Re: Re: t it does not stop at breakpoints which is in
>>> an anonymous function
>>>
>>> Hi Felix,
>>> Are you sure your n is greater than 0?
>>> Here it stops first at breakpoint 1, image attached.
>>> Have you checked the count to see if it's also greater than 0?
>>>
>>> 2016-09-15 11:41 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>
>>>> Dear Dirceu,
>>>>
>>>> Thank you for your help.
>>>>
>>>> Actually, I use IntelliJ IDEA to debug the Spark code.
>>>>
>>>> Let me use the following code snippet to illustrate my problem. In the
>>>> code lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2.
>>>> When I debugged the code, it did not stop at breakpoint-1; it seems that
>>>> the map function was skipped and it directly reached and stopped at
>>>> breakpoint-2.
>>>>
>>>> Additionally, I found the following two posts:
>>>> (1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
>>>> (2) https://www.mail-archive.com/user@spark.apache.org/msg29010.html
>>>>
>>>> I am wondering whether logging is an alternative approach to debugging
>>>> Spark anonymous functions.
>>>>
>>>> val count = spark.parallelize(1 to n, slices).map { i =>
>>>>   val x = random * 2 - 1
>>>>   val y = random * 2 - 1 // (breakpoint-1 set in this line)
>>>>   if (x*x + y*y < 1) 1 else 0
>>>> }.reduce(_ + _)
>>>> val test = x // (breakpoint-2 set in this line)
>>>>
>>>> ------------------------------
>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>> *Sent:* 2016-09-14 23:32
>>>> *To:* chen yong
>>>> *Subject:* Re: Re: Re: t it does not stop at breakpoints which is in an
>>>> anonymous function
>>>>
>>>> I don't know which IDE you use. I use IntelliJ, and it has an
>>>> Evaluate Expression dialog where I can execute code whenever it has
>>>> stopped at a breakpoint.
>>>> In Eclipse you have watch and inspect, where you can do the same.
>>>> Probably you are not seeing the debugger stop in your functions because
>>>> you never retrieve the data from your DataFrame/RDDs.
>>>> What are you doing with this function? Are you getting the result of
>>>> this RDD/DataFrame at some place?
>>>> You can add a count after the function that you want to debug, just for
>>>> debugging, but don't forget to remove it after testing.
>>>>
>>>> 2016-09-14 12:20 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>
>>>>> Dear Dirceu,
>>>>>
>>>>> Thank you again.
>>>>>
>>>>> Actually, I never saw it stop at the breakpoints no matter how long
>>>>> I waited. It just skipped the whole anonymous function and directly
>>>>> reached the first breakpoint immediately after the anonymous function
>>>>> body. Is that normal? I suspect something is wrong in my debugging
>>>>> operations or settings. I am very new to Spark and Scala.
>>>>>
>>>>> Additionally, please give me some detailed instructions about "....Some
>>>>> ides provide you a place where you can execute the code to see it's
>>>>> results....". Where is the PLACE?
>>>>>
>>>>> Your help is badly needed!
>>>>>
>>>>> ------------------------------
>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>> *Sent:* 2016-09-14 23:07
>>>>> *To:* chen yong
>>>>> *Subject:* Re: Re: t it does not stop at breakpoints which is in an
>>>>> anonymous function
>>>>>
>>>>> You can call a count in the IDE just to debug, or you can wait until
>>>>> it reaches the code, so you can debug.
>>>>> Some IDEs provide a place where you can execute code and see its
>>>>> results.
>>>>> Be careful not to add these operations to your production code,
>>>>> because they can slow down the execution of your code.
>>>>>
>>>>> 2016-09-14 11:43 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> You mean I have to insert some code, such as xxxxx.count or
>>>>>> xxxxx.collect, between the original Spark code lines to invoke some
>>>>>> operations, right?
>>>>>> But where are the right places to put these code lines?
>>>>>>
>>>>>> Felix
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>>> *Sent:* 2016-09-14 22:33
>>>>>> *To:* chen yong
>>>>>> *Cc:* user@spark.apache.org
>>>>>> *Subject:* Re: t it does not stop at breakpoints which is in an
>>>>>> anonymous function
>>>>>>
>>>>>> Hello Felix,
>>>>>> Spark functions run lazily, and that's why it doesn't stop at those
>>>>>> breakpoints.
>>>>>> They will be executed only when you call certain methods on your
>>>>>> DataFrame/RDD, like count, collect, ...
>>>>>>
>>>>>> Regards,
>>>>>> Dirceu
>>>>>>
>>>>>> 2016-09-14 11:26 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am a newbie to Spark. I am learning Spark by debugging the Spark
>>>>>>> code. It is strange to me that it does not stop at breakpoints that
>>>>>>> are inside an anonymous function; it is normal in an ordinary
>>>>>>> function, though. Is that normal? How can I observe variables in an
>>>>>>> anonymous function?
>>>>>>>
>>>>>>> Please help me. Thanks in advance!
>>>>>>>
>>>>>>> Felix