Hello Xiangrui, I use Spark 1.2.0 on CDH 5.3. Thanks!
-Su

On Fri, Mar 20, 2015 at 2:27 PM Xiangrui Meng <men...@gmail.com> wrote:
> Su, which Spark version did you use? -Xiangrui
>
> On Thu, Mar 19, 2015 at 3:49 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> > To get these metrics out, you need to open the driver UI running on port
> > 4040. There you will see the Stages information, and for each stage you
> > can see how much time it is spending on GC, etc. In your case, the
> > parallelism seems to be 4; the higher the parallelism, the more tasks you
> > will see.
> >
> > Thanks
> > Best Regards
> >
> > On Thu, Mar 19, 2015 at 1:15 PM, Su She <suhsheka...@gmail.com> wrote:
> >> Hi Akhil,
> >>
> >> 1) How could I see how much time it is spending on Stage 1? Or what if,
> >> like above, it doesn't get past Stage 1?
> >>
> >> 2) How could I check if it's GC time? And where would I increase the
> >> parallelism for the model? I have a Spark master and 2 workers running on
> >> CDH 5.3... what would the default spark-shell level of parallelism be? I
> >> thought it would be 3.
> >>
> >> Thank you for the help!
> >>
> >> -Su
> >>
> >> On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> >>> Can you see where exactly it is spending time? Like you said, if it gets
> >>> to Stage 2, then you will be able to see how much time it spent on
> >>> Stage 1. See if it's GC time; if so, try increasing the level of
> >>> parallelism, or repartition to something like sc.defaultParallelism * 3.
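[Editor's note: in the spark-shell, Akhil's repartitioning suggestion would look roughly like the sketch below. The RDD name `trainingData` is taken from the example later in the thread; the 3x multiplier is just the illustrative factor from his reply, not a recommended setting.]

```scala
// Sketch only: spread the training data over more partitions before
// running SGD, so each iteration gets more (smaller) tasks.
val numPartitions = sc.defaultParallelism * 3
val repartitioned = trainingData.repartition(numPartitions)
repartitioned.cache() // SGD is iterative, so keep the data in memory

val model = new LogisticRegressionWithSGD().run(repartitioned)
```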
> >>> > >>> Thanks > >>> Best Regards > >>> > >>> On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> > wrote: > >>>> > >>>> Hello Everyone, > >>>> > >>>> I am trying to run this MLlib example from Learning Spark: > >>>> > >>>> https://github.com/databricks/learning-spark/blob/master/src > /main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48 > >>>> > >>>> Things I'm doing differently: > >>>> > >>>> 1) Using spark shell instead of an application > >>>> > >>>> 2) instead of their spam.txt and normal.txt I have text files with > 3700 > >>>> and 2700 words...nothing huge at all and just plain text > >>>> > >>>> 3) I've used numFeatures = 100, 1000 and 10,000 > >>>> > >>>> Error: I keep getting stuck when I try to run the model: > >>>> > >>>> val model = new LogisticRegressionWithSGD().run(trainingData) > >>>> > >>>> It will freeze on something like this: > >>>> > >>>> [Stage 1:==============> > (1 + > >>>> 0) / 4] > >>>> > >>>> Sometimes its Stage 1, 2 or 3. > >>>> > >>>> I am not sure what I am doing wrong...any help is much appreciated, > >>>> thank you! > >>>> > >>>> -Su > >>>> > >>>> > >>> > >> > > >