Hi Akhil,

1) How could I see how much time it is spending on Stage 1? Or what if,
like above, it doesn't get past Stage 1?

2) How could I check if it's GC time? And where would I increase the
parallelism for the model? I have a Spark Master and 2 Workers running on
CDH 5.3...what would the default spark-shell level of parallelism be? I
thought it would be 3.
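
For reference, a minimal spark-shell sketch of checking the default
parallelism and repartitioning before fitting. This assumes the
shell-provided `sc` and the `trainingData` RDD built as in the example;
it is a sketch of the suggested approach, not a verified fix. (Note: on a
standalone cluster, the default parallelism is normally the total number
of cores across all workers, or the value of spark.default.parallelism if
set, not the number of worker processes, so it may not be 3.)

```scala
// In spark-shell; sc is the SparkContext the shell provides.
// Shows the default level of parallelism for this cluster.
println(sc.defaultParallelism)

// Assumption: trainingData is the RDD[LabeledPoint] from the example.
// Repartition it to spread work across more tasks before training.
val repartitioned = trainingData.repartition(sc.defaultParallelism * 3)
val model = new LogisticRegressionWithSGD().run(repartitioned)
```

Per-task GC time is visible in the Spark web UI (port 4040 on the driver)
under the stage's task table, in the "GC Time" column.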

Thank you for the help!

-Su


On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Can you see where exactly it is spending time? Since, as you said, it gets
> to Stage 2, you should be able to see in the web UI how much time it spent
> on Stage 1. Check whether it's GC time; if so, try increasing the level of
> parallelism, or repartition the data, e.g. to sc.defaultParallelism * 3.
>
> Thanks
> Best Regards
>
> On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I am trying to run this MLlib example from Learning Spark:
>>
>> https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
>>
>> Things I'm doing differently:
>>
>> 1) Using spark shell instead of an application
>>
>> 2) Instead of their spam.txt and normal.txt, I have plain-text files with
>> 3,700 and 2,700 words...nothing huge at all
>>
>> 3) I've used numFeatures = 100, 1000 and 10,000
>>
>> *Error: *I keep getting stuck when I try to run the model:
>>
>> val model = new LogisticRegressionWithSGD().run(trainingData)
>>
>> It will freeze on something like this:
>>
>> [Stage 1:==============>                                            (1 +
>> 0) / 4]
>>
>> Sometimes it's Stage 1, 2 or 3.
>>
>> I am not sure what I am doing wrong...any help is much appreciated, thank
>> you!
>>
>> -Su
>>
>>
>>
>