Su, which Spark version did you use? -Xiangrui

On Thu, Mar 19, 2015 at 3:49 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> To get these metrics, you need to open the driver UI running on port
> 4040. There you will see the Stages information, and for each stage you
> can see how much time it is spending on GC etc. In your case the
> parallelism seems to be 4; the higher the parallelism, the more tasks
> you will see.
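>
> If you want to see or change that number from the shell, roughly the
> following should work (just a sketch; the value 12 is only an example):
>
>   sc.defaultParallelism                              // current default for your SparkContext
>   spark-shell --conf spark.default.parallelism=12    // or start the shell with a higher default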
>
> Thanks
> Best Regards
>
> On Thu, Mar 19, 2015 at 1:15 PM, Su She <suhsheka...@gmail.com> wrote:
>>
>> Hi Akhil,
>>
>> 1) How could I see how much time it is spending on stage 1? Or what if,
>> like above, it doesn't get past stage 1?
>>
>> 2) How could I check if it's GC time? And where would I increase the
>> parallelism for the model? I have a Spark Master and 2 Workers running on
>> CDH 5.3... what would the default spark-shell level of parallelism be? I
>> thought it would be 3?
>>
>> Thank you for the help!
>>
>> -Su
>>
>>
>> On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>>
>>> Can you see where exactly it is spending time? Since you said it gets to
>>> Stage 2, you should be able to see how much time it spent on Stage 1. If
>>> it's GC time, then try increasing the level of parallelism or
>>> repartitioning, e.g. to sc.defaultParallelism * 3, as sketched below.
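>>>
>>> Something along these lines in the shell (just a sketch; "trainingData"
>>> stands for whatever LabeledPoint RDD you feed to the model):
>>>
>>>   // spread the input over 3x the default parallelism before training
>>>   val repartitioned = trainingData.repartition(sc.defaultParallelism * 3)
>>>   val model = new LogisticRegressionWithSGD().run(repartitioned)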
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> I am trying to run this MLlib example from Learning Spark:
>>>>
>>>> https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
>>>>
>>>> Things I'm doing differently:
>>>>
>>>> 1) Using spark-shell instead of a standalone application
>>>>
>>>> 2) Instead of their spam.txt and normal.txt, I have text files with 3700
>>>> and 2700 words... nothing huge at all, just plain text
>>>>
>>>> 3) I've used numFeatures = 100, 1000 and 10,000
>>>>
>>>> Error: I keep getting stuck when I try to run the model:
>>>>
>>>> val model = new LogisticRegressionWithSGD().run(trainingData)
>>>>
>>>> It will freeze on something like this:
>>>>
>>>> [Stage 1:==============>                              (1 + 0) / 4]
>>>>
>>>> Sometimes it's Stage 1, 2, or 3.
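>>>>
>>>> For reference, what I'm running in the shell is essentially the example
>>>> from the link above (roughly reconstructed here; the file names and
>>>> numFeatures are just the ones I described):
>>>>
>>>>   import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
>>>>   import org.apache.spark.mllib.feature.HashingTF
>>>>   import org.apache.spark.mllib.regression.LabeledPoint
>>>>
>>>>   val spam = sc.textFile("spam.txt")       // my ~3700-word file
>>>>   val normal = sc.textFile("normal.txt")   // my ~2700-word file
>>>>
>>>>   val tf = new HashingTF(numFeatures = 10000)
>>>>   val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
>>>>   val normalFeatures = normal.map(email => tf.transform(email.split(" ")))
>>>>
>>>>   val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
>>>>   val negativeExamples = normalFeatures.map(features => LabeledPoint(0, features))
>>>>   val trainingData = positiveExamples.union(negativeExamples)
>>>>   trainingData.cache()
>>>>
>>>>   val model = new LogisticRegressionWithSGD().run(trainingData)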
>>>>
>>>> I am not sure what I am doing wrong... any help is much appreciated.
>>>> Thank you!
>>>>
>>>> -Su
>>>>
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
