Hello Xiangrui, I use Spark 1.2.0 on CDH 5.3. Thanks!
-Su

On Fri, Mar 20, 2015 at 2:27 PM Xiangrui Meng <men...@gmail.com> wrote:
> Su, which Spark version did you use? -Xiangrui
>
> On Thu, Mar 19, 2015 at 3:49 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> > To get these metrics out, you need to open the driver UI running on port
> > 4040. There you will see the Stages information, and for each stage you
> > can see how much time it is spending on GC, etc. In your case, the
> > parallelism seems to be 4; the higher the parallelism, the more tasks you
> > will see.
> >
> > Thanks
> > Best Regards
> >
> > On Thu, Mar 19, 2015 at 1:15 PM, Su She <suhsheka...@gmail.com> wrote:
> >> Hi Akhil,
> >>
> >> 1) How could I see how much time it is spending on Stage 1? Or what if,
> >> like above, it doesn't get past Stage 1?
> >>
> >> 2) How could I check if it's GC time? And where would I increase the
> >> parallelism for the model? I have a Spark master and 2 workers running on
> >> CDH 5.3... what would the default spark-shell level of parallelism be? I
> >> thought it would be 3.
> >>
> >> Thank you for the help!
> >>
> >> -Su
> >>
> >> On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >>> Can you see where exactly it is spending time? Like you said, if it gets
> >>> to Stage 2, then you will be able to see how much time it spent on
> >>> Stage 1. See if it's GC time; if so, try increasing the level of
> >>> parallelism, or repartition to something like sc.defaultParallelism * 3.
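[Editor's note: in the spark-shell, Akhil's repartitioning suggestion would look roughly like the sketch below. The RDD name `trainingData` is taken from the example later in the thread; the 3x multiplier is just the illustrative factor from his reply, not a recommended setting.]

```scala
// Sketch only: spread the training data over more partitions before
// running SGD, so each iteration gets more (smaller) tasks.
val numPartitions = sc.defaultParallelism * 3
val repartitioned = trainingData.repartition(numPartitions)
repartitioned.cache() // SGD is iterative, so keep the data in memory

val model = new LogisticRegressionWithSGD().run(repartitioned)
```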
> >>> > >>> Thanks > >>> Best Regards > >>> > >>> On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> > wrote: > >>>> > >>>> Hello Everyone, > >>>> > >>>> I am trying to run this MLlib example from Learning Spark: > >>>> > >>>> https://github.com/databricks/learning-spark/blob/master/src > /main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48 > >>>> > >>>> Things I'm doing differently: > >>>> > >>>> 1) Using spark shell instead of an application > >>>> > >>>> 2) instead of their spam.txt and normal.txt I have text files with > 3700 > >>>> and 2700 words...nothing huge at all and just plain text > >>>> > >>>> 3) I've used numFeatures = 100, 1000 and 10,000 > >>>> > >>>> Error: I keep getting stuck when I try to run the model: > >>>> > >>>> val model = new LogisticRegressionWithSGD().run(trainingData) > >>>> > >>>> It will freeze on something like this: > >>>> > >>>> [Stage 1:==============> > (1 + > >>>> 0) / 4] > >>>> > >>>> Sometimes its Stage 1, 2 or 3. > >>>> > >>>> I am not sure what I am doing wrong...any help is much appreciated, > >>>> thank you! > >>>> > >>>> -Su > >>>> > >>>> > >>> > >> > > >