-dev@

Yep, high GC activity means '(almost) out of memory'. I don't see that
you've checked heap usage - is it nearly full? The GC logs you're already
printing will show the occupancy after each collection.
The answer isn't tuning but more heap.
(Sometimes with really big heaps the problem is long pauses instead, but
that's not the case here.)

On Mon, Jul 29, 2019 at 1:26 AM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:
>
> Hi,
>
> We were running Logistic Regression on Spark 2.2.X and then tried to see 
> how it does on Spark 2.3.X. Now we are facing an issue while running a 
> Logistic Regression model on Spark 2.3.X on top of YARN (GCP Dataproc). The 
> treeAggregate step takes a huge amount of time due to very high GC activity. I 
> have tuned the GC, created different-sized clusters, tried a higher Spark 
> version (2.4.X) and smaller data, but nothing helps. On average the GC time is 
> 100-1000 times the processing time per iteration.
>
> The strange part is that in Spark 2.2 this doesn't happen at all: same code, 
> same cluster sizing, same data in both cases.
>
> I was wondering if someone could explain this behaviour and help me resolve 
> it. How can the same code behave so differently on two Spark versions, 
> especially the newer ones?
>
> Here are the configs I used:
>
>
> spark.serializer=org.apache.spark.serializer.KryoSerializer
>
> #GC Tuning
>
> spark.executor.extraJavaOptions= -XX:+UseG1GC -XX:+PrintFlagsFinal 
> -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
> -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions 
> -XX:+G1SummarizeConcMark -Xms9000m -XX:ParallelGCThreads=20 
> -XX:ConcGCThreads=5
>
>
> spark.executor.instances=20
>
> spark.executor.cores=1
>
> spark.executor.memory=9010m
>
>
>
> Regards,
> Dhrub
>
