Hi,

I have a 3-node Hortonworks Hadoop cluster on EC2 with:

*Hadoop* version 2.7.x
*Spark* version 1.5.2
*Phoenix* version 4.4
*HBase* version 1.1.x
*Cluster Statistics*

Data Node 1
OS: redhat7 (x86_64)
Cores (CPU): 2 (2)
Disk: 20.69GB/99.99GB (20.69% used)
Memory: 7.39GB

Data Node 2
Cores (CPU): 2 (2)
Disk: 20.73GB/99.99GB (20.73% used)
Memory: 7.39GB
Load Avg: 0.00
Heartbeat: a moment ago
Current Version: 2.3.4.0-3485

*NameNode*
Rack: /default-rack
OS: redhat7 (x86_64)
Cores (CPU): 4 (4)
Disk: 32.4GB/99.99GB (32.4% used)
Memory: 15.26GB
Load Avg: 0.78
Heartbeat: a moment ago
Current Version: 2.3.4.0-3485

*Spark Queue Statistics*
> Queue State: RUNNING
> Used Capacity: 0.0%
> Configured Capacity: 100.0%
> Configured Max Capacity: 100.0%
> Absolute Used Capacity: 0.0%
> Absolute Configured Capacity: 100.0%
> Absolute Configured Max Capacity: 100.0%
> Used Resources: <memory:0, vCores:0>
> Num Schedulable Applications: 0
> Num Non-Schedulable Applications: 0
> Num Containers: 0
> Max Applications: 10000
> Max Applications Per User: 10000
> Max Application Master Resources: <memory:3072, vCores:1>
> Used Application Master Resources: <memory:0, vCores:0>
> Max Application Master Resources Per User: <memory:3072, vCores:1>
> Configured Minimum User Limit Percent: 100%
> Configured User Limit Factor: 1.0
> Accessible Node Labels: *
> Ordering Policy: FifoOrderingPolicy
> Preemption: disabled

I have a Spark Scala script which performs many operations: reading from the DB (Phoenix), joins (inner and left-outer), unionAll, and finally a groupBy, then saves the result set to Phoenix/HDFS. I have created 20+ DataFrames for the operations mentioned above.
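A minimal sketch of the shape of my script, assuming the phoenix-spark DataSource for Spark 1.5 / Phoenix 4.4; the table names, column names, and zkUrl are placeholders, not my real schema:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object PhoenixPipeline {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PhoenixPipeline"))
    val sqlContext = new SQLContext(sc)

    // Read two tables from Phoenix via the phoenix-spark DataSource.
    // "TABLE_A", "TABLE_B", and the zkUrl below are placeholders.
    def readTable(name: String) = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", name)
      .option("zkUrl", "namenode.internal:2181")
      .load()

    val dfA = readTable("TABLE_A")
    val dfB = readTable("TABLE_B")

    // Inner and left-outer joins on a shared key column ("ID" is a placeholder).
    val inner = dfA.join(dfB, dfA("ID") === dfB("ID"))
    val left  = dfA.join(dfB, dfA("ID") === dfB("ID"), "left_outer")

    // unionAll (Spark 1.5 API) of the two join results, then a groupBy.
    val combined = inner.unionAll(left)
    val result   = combined.groupBy(dfA("ID")).count()

    // Save the result set back to Phoenix ("RESULT_TABLE" is a placeholder);
    // the phoenix-spark writer requires SaveMode.Overwrite.
    result.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "RESULT_TABLE")
      .option("zkUrl", "namenode.internal:2181")
      .save()
  }
}
```

My actual script does this across 20+ intermediate DataFrames before the final groupBy and save.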
Stack trace:

> 16/04/01 10:11:49 WARN TaskSetManager: Lost task 3.0 in stage 132.4 (TID 18401, ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal):
> java.lang.OutOfMemoryError: PermGen space
>     at sun.misc.Unsafe.defineClass(Native Method)
>     at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
>     at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
>     at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
>     at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
>     at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
>     at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
>     at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
>     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
>     at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
>     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)

For Phoenix, I am getting an error similar to the one below in my stack trace:

> $provider.DefaultSource does not allow user-specified schemas

The whole job takes almost 3-4 minutes, and the save alone takes 3-4 minutes, whether I write to Phoenix or HDFS.

Could somebody help me resolve the above-mentioned issues? I would really appreciate the help.

Thanks,
Divya