EBernhardson added a comment.

  Unfortunately the above patch doesn't seem to have worked. Spark turned the 
input into three tasks, all assigned to the same executor. The first two 
finished, and the third caused the container to die after another ~45s due to 
memory constraints. Spark then spun up a new executor, which was only ever 
assigned that one task, and it failed for the same reason.
  
  My next guess was to tune spark.sql.files.maxPartitionBytes, documented as 
`The maximum number of bytes to pack into a single partition when reading 
files.` Unfortunately, while Spark did make some extra partitions, 12 instead 
of 3, all of the extra partitions were empty. I glanced over the other Spark 
configuration related to reads and partitioning, but I'm not seeing any other 
knobs we can turn in that direction.
  
  The next guess is brute force: add memory overhead until it stops 
complaining. The actual JVM heap doesn't seem to be overloaded (at least the 
GC times prior to the container being killed don't look concerning), so we 
should be able to leave the heap at its current size. With 4g of overhead it 
still failed. With 8g of overhead a task was still killed, but with retries 
the job managed to finish. I'm not too thrilled to run everything with the 8g 
overhead, but we could go that way if we have to.
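  For the record, the brute-force setting would look something like this. The 
8g value is the one from the test runs above; the script name and the 
executor-memory value are placeholders:

```shell
# Keep the executor heap at its current size, but give the YARN container
# extra off-heap headroom so it stops being killed for exceeding its limit.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=8g \
  job.py
```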

TASK DETAIL
  https://phabricator.wikimedia.org/T347333


_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org