Hi,

I need help figuring out and solving a heap space problem.

I have a query that joins 15+ tables, and when I try to print the
result (just 23 rows) it throws a heap space error.

I tried the following in standalone mode
(my Mac has 8 cores and 15 GB of RAM):

spark.conf().set("spark.sql.shuffle.partitions", 20);

./spark-submit --master spark://selva:7077 --executor-memory 2g
--total-executor-cores 4 --class MemIssue --conf
'spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark'
/Users/rs/Desktop/test.jar
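
For context, the driver is wired up roughly like this (a trimmed sketch,
not the full class):

    import org.apache.spark.sql.SparkSession;

    public class MemIssue {
        public static void main(String[] args) {
            // Master, executor memory, and cores come from spark-submit above.
            SparkSession spark = SparkSession.builder()
                    .appName("MemIssue")
                    .getOrCreate();

            // Cut shuffle partitions down from the default 200,
            // since the final result is only 23 rows.
            spark.conf().set("spark.sql.shuffle.partitions", 20);

            String query = "..."; // the SQL shown below
            spark.sql(query).show();
        }
    }

Also, since show() collects the result to the driver, would a larger
driver heap help? e.g.

    ./spark-submit --master spark://selva:7077 --driver-memory 4g \
      --executor-memory 2g --total-executor-cores 4 --class MemIssue \
      /Users/rs/Desktop/test.jar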

This is my query:

select concat(sf1.scode, '-', m.mcode, '-', rf.rnum), sf1.scode,
       concat(p.lname, ', ', ci.pyear), at.atext Alias, m.mcode Method,
       mt.mcode, v.vname, nd.vmeas
from result r
join var v on v.vnum = r.vnum
join numa nd on nd.rnum = r.rnum
join feat fa on fa.fnum = r.fnum
join samp sf1 on sf1.snum = fa.snum
join spe sp on sf1.snum = sp.snum and sp.mnum not in (1, 2)
join act a on a.anum = fa.anum
join met m on m.mnum = a.mnum
join sampl sfa on sfa.snum = sf1.snum
join ann at on at.anum = sfa.anum and at.atypenum = 11
join data dr on r.rnum = dr.rnum
join cit cd on dr.dnum = cd.dnum
join cit ci on cd.cnum = ci.cnum
join aut al on ci.cnum = al.cnum and al.aorder = 1
join per p on al.pnum = p.pnum
left join rel rf on sf1.snum = rf.snum
left join samp sf2 on rf.rnum = sf2.snum
left join spe s on s.snum = sf1.snum
left join mat mt on mt.mnum = s.mnum
where sf1.sampling_feature_code = '1234test'
order by 1, 2


spark.sql(query).show


When I checked the whole-stage codegen plan, it first reads all the data
from the tables. Why is it reading all the data and doing sort-merge
joins for 3 or 4 of the tables? Why is it not applying the filter first?
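
For reference, the plan can be printed like this (same query and session
as above):

    spark.sql(query).explain(true);
    // Look for PushedFilters on the samp scan and for
    // SortMergeJoin vs. BroadcastHashJoin operators in the physical plan.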


Even though I have given the executors a large amount of memory, it
still throws the same error. When Spark SQL performs the joins, how does
it utilize memory and cores?
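
One thing I could still try is raising the broadcast threshold so the
smaller dimension tables are broadcast instead of sort-merge joined
(the 50 MB below is just an example value):

    // Tables whose estimated size is under this many bytes should be
    // broadcast to executors rather than shuffled for a sort-merge join.
    spark.conf().set("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024);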

Any guidance would be greatly appreciated.
-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
