Hi,

I need help figuring out and solving a heap space problem.

I have a query that joins 15+ tables, and when I try to print the result (just 23 rows) it throws a heap space error. I am running in standalone mode on my Mac (8 cores, 15 GB RAM). This is what I tried:

    spark.conf().set("spark.sql.shuffle.partitions", 20);

    ./spark-submit --master spark://selva:7077 \
      --executor-memory 2g \
      --total-executor-cores 4 \
      --class MemIssue \
      --conf 'spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark' \
      /Users/rs/Desktop/test.jar

This is the query (executed with spark.sql(query).show):

    select concat(sf1.scode, '-', m.mcode, '-', rf.rnum),
           sf1.scode,
           concat(p.lname, ', ', ci.pyear),
           at.atext Alias,
           m.mcode Method,
           mt.mcode,
           v.vname,
           nd.vmeas
      from result r
      join var v     on v.vnum = r.vnum
      join numa nd   on nd.rnum = r.rnum
      join feat fa   on fa.fnum = r.fnum
      join samp sf1  on sf1.snum = fa.snum
      join spe sp    on sf1.snum = sp.snum and sp.mnum not in (1, 2)
      join act a     on a.anum = fa.anum
      join met m     on m.mnum = a.mnum
      join sampl sfa on sfa.snum = sf1.snum
      join ann at    on at.anum = sfa.anum and at.atypenum = 11
      join data dr   on r.rnum = dr.rnum
      join cit cd    on dr.dnum = cd.dnum
      join cit ci    on cd.cnum = ci.cnum
      join aut al    on ci.cnum = al.cnum and al.aorder = 1
      join per p     on al.pnum = p.pnum
      left join rel rf   on sf1.snum = rf.snum
      left join samp sf2 on rf.rnum = sf2.snum
      left join spe s    on s.snum = sf1.snum
      left join mat mt   on mt.mnum = s.mnum
     where sf1.sampling_feature_code = '1234test'
     order by 1, 2

When I checked the WholeStageCodegen plan, I saw that Spark first reads all of the data from every table. Why is it scanning the full tables and doing sort-merge joins for 3 or 4 of them? Why is it not pushing the filter value down? Even though I have given the executors a large amount of memory, it still throws the same error. And when Spark SQL performs the joins, how does it use memory and cores?
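In case it helps, this is roughly how I am inspecting the plan (a minimal Java sketch; the class name MemIssue is from my spark-submit command above, the rest of the layout is illustrative):

    import org.apache.spark.sql.SparkSession;

    public class MemIssue {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("MemIssue")
                    .getOrCreate();

            // Fewer shuffle partitions, since the final result is tiny (23 rows).
            spark.conf().set("spark.sql.shuffle.partitions", 20);

            String query = "select concat(sf1.scode, '-', m.mcode, '-', rf.rnum) ..."; // full query as above

            // explain(true) prints the parsed, analyzed, optimized, and physical
            // plans, which shows where the sampling_feature_code filter sits and
            // which joins are planned as SortMergeJoin vs BroadcastHashJoin.
            spark.sql(query).explain(true);
            spark.sql(query).show();
        }
    }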
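I also wonder whether broadcasting the small lookup tables would avoid some of the sort-merge joins. A sketch of what I could try (the 50 MB threshold is an assumption on my side and would have to be tuned to the actual table sizes):

    // Raise the auto-broadcast threshold so that tables smaller than
    // ~50 MB are broadcast-joined instead of sort-merge joined.
    spark.conf().set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024);

    // Alternatively, hint individual tables in the SQL itself
    // (supported in Spark 2.2+), e.g.:
    //   select /*+ BROADCAST(v, m) */ ... from result r join var v on ...

If those lookup tables really are small, broadcasting them should keep the large fact table from being shuffled for those joins.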
Any guidance would be very welcome.

--
Selvam Raman
"Shun bribery; hold your head high."