Can you see where exactly it is spending time? Since you said it gets to
Stage 2, you should be able to see in the Spark UI how much time it spent on
Stage 1. Check whether that time is GC; if so, try increasing the level of
parallelism or repartitioning the data, e.g. to sc.defaultParallelism * 3.
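
A rough sketch of what that could look like in the spark-shell (assuming
trainingData is the RDD[LabeledPoint] you built following the Learning Spark
example; the variable names here are just placeholders):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Assumes trainingData was built as in the Learning Spark example.
// Spread the data over more partitions so each task is smaller and
// GC pressure per executor goes down.
val numPartitions = sc.defaultParallelism * 3
val repartitioned: RDD[LabeledPoint] = trainingData.repartition(numPartitions).cache()

// Train on the repartitioned RDD instead of the original one.
val model = new LogisticRegressionWithSGD().run(repartitioned)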

Thanks
Best Regards

On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> wrote:

> Hello Everyone,
>
> I am trying to run this MLlib example from Learning Spark:
>
> https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
>
> Things I'm doing differently:
>
> 1) Using spark shell instead of an application
>
> 2) Instead of their spam.txt and normal.txt, I have text files with 3700
> and 2700 words...nothing huge at all, just plain text
>
> 3) I've used numFeatures = 100, 1000 and 10,000
>
> *Error:* I keep getting stuck when I try to run the model:
>
> val model = new LogisticRegressionWithSGD().run(trainingData)
>
> It will freeze on something like this:
>
> [Stage 1:==============>                                            (1 + 0) / 4]
>
> Sometimes it's Stage 1, 2, or 3.
>
> I am not sure what I am doing wrong...any help is much appreciated, thank
> you!
>
> -Su
