[ https://issues.apache.org/jira/browse/SPARK-29767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro resolved SPARK-29767.
--------------------------------------
    Resolution: Cannot Reproduce

> Core dump happening on executors while doing simple union of Data Frames
> ------------------------------------------------------------------------
>
>                 Key: SPARK-29767
>                 URL: https://issues.apache.org/jira/browse/SPARK-29767
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 2.4.4
>         Environment: AWS EMR 5.27.0, Spark 2.4.4
>            Reporter: Udit Mehrotra
>            Priority: Major
>         Attachments: coredump.zip, hs_err_pid13885.log, part-00000-0189b5c2-7f7b-4d0e-bdb8-506380253597-c000.snappy.parquet, test.py
>
> Running a union operation on two DataFrames, through both the Scala Spark shell and PySpark, results in the executor containers doing a *core dump* and exiting with exit code 134.
> The trace from the *driver*:
> {noformat}
> Container exited with a non-zero exit code 134
> .
> 19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_000077 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch.
> Container id: container_1572981097605_0021_01_000077
> Exit code: 134
> Exception message: /bin/bash: line 1: 12611 Aborted                 LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stderr
> Stack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted                 LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stderr
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:869)
> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Container exited with a non-zero exit code 134{noformat}
> From the *stdout* logs of the exiting container we see:
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f825e3b0e92, pid=12611, tid=0x00007f822b5fb700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
> # Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0xa9ae92]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/hs_err_pid12611.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #{noformat}
> Also, I am unable to get a *core dump* even though *ulimit -c* is set to *unlimited*. Can you advise how to go about this issue, and also how to obtain the *core dump*?
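> One likely reason the shell-level setting has no effect: with YARN, executor containers inherit their resource limits from the NodeManager daemon that forks them, not from the login shell where *ulimit -c unlimited* was run. A minimal diagnostic sketch (an editor's addition, assuming the `sparkSession` defined in the repro steps below) to print the core-file limit the executors actually see:
> {code:python}
> # Diagnostic sketch, not from the original report: report the RLIMIT_CORE
> # each executor inherited. A soft limit of 0 means the kernel will not
> # write a core file, regardless of ulimit in the submitting shell.
> def core_limits(_):
>     import resource, socket
>     soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
>     yield (socket.gethostname(), soft, hard)
>
> print(sparkSession.sparkContext
>       .parallelize(range(4), 4)
>       .mapPartitions(core_limits)
>       .collect())
> {code}
> If the soft limit prints as 0 inside the containers, it has to be raised where the NodeManager process is started (e.g. in the node's limits configuration), not in the interactive shell.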
> Steps to reproduce the issue:
> * Upload the attached parquet data file to S3 at *s3://<bucket>/tables/spark_29767_parquet_table/inserted_at=201910/*
> * Create a partitioned Hive table:
> {code:sql}
> CREATE EXTERNAL TABLE `spark_29767_parquet_table`(
>   `hour` bigint,
>   `title` string,
>   `__deleted` string,
>   `status` string,
>   `transformationid` string,
>   `roomid` string,
>   `day` bigint,
>   `notes` string,
>   `nunitsfromaudit` bigint,
>   `ts_ms` bigint,
>   `liability` string,
>   `_class` string,
>   `month` bigint,
>   `updatedate` struct<`date`:bigint>,
>   `_id` struct<oid:string>,
>   `year` bigint,
>   `item` struct<name:string,brandname:string,perunitpricefromaudit:struct<currency:string,amount:string>,actualPerUnitPrice:struct<currency:string,amount:string>,category:string,itemType:string,roomAmenityId:bigint>,
>   `createddate` struct<`date`:bigint>,
>   `actualunits` bigint,
>   `description` string)
> PARTITIONED BY (
>   `inserted_at` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   's3://<bucket>/tables/spark_29767_parquet_table'
> {code}
> * Sync the partition:
> {code:sql}
> ALTER TABLE spark_29767_parquet_table ADD PARTITION (inserted_at='201910')
> LOCATION 's3://<bucket>/tables/spark_29767_parquet_table/inserted_at=201910/'
> {code}
> * In PySpark, run the following (a diagnostic variant follows after this list):
> {code:python}
> # Read the base data frame
> from pyspark.sql import SparkSession
> from pyspark.sql.types import *
> from pyspark.sql import Row
>
> sparkSession = (SparkSession
>                 .builder
>                 .appName('example-pyspark-read-and-write-from-hive')
>                 .enableHiveSupport()
>                 .getOrCreate())
> base_df = sparkSession.table("spark_29767_parquet_table")
> base_df = base_df.select("_id", "_class", "roomid", "item", "inserted_at")
>
> # Create a new data frame with one row for the union
> schema = StructType([
>     StructField("_id", StructType([StructField("oid", StringType(), True)]), True),
>     StructField("_class", StringType(), True),
>     StructField("roomid", StringType(), True),
>     StructField("item", StructType([
>         StructField("name", StringType(), True),
>         StructField("brandname", StringType(), True),
>         StructField("perunitpricefromaudit", StructType([
>             StructField("currency", StringType(), True),
>             StructField("amount", StringType(), True)]), True),
>         StructField("actualperunitprice", StructType([
>             StructField("currency", StringType(), True),
>             StructField("amount", StringType(), True)]), True),
>         StructField("category", StringType(), True),
>         StructField("itemtype", StringType(), True),
>         StructField("roomamenityid", LongType(), True)]), True),
>     StructField("inserted_at", StringType(), True)])
>
> data = [
>     Row(Row("5daff5ca43b8a36756c23b0f"),
>         "com.oyo.transformations.tasks.model.implementations.AuditItemTaskImpl",
>         None,
>         Row("Geyser Installation(with accessories)", None, Row("INR", "425.0"),
>             None, "INFRASTRUCTURE", "PMC", None),
>         "201910")
> ]
>
> inc_df = sparkSession.createDataFrame(
>     sparkSession.sparkContext.parallelize(data),
>     schema)
>
> inc_df.union(base_df).show()
> {code}
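> Not part of the original report, but one cheap way to narrow down a SIGSEGV in libjvm during a query is to check whether generated code is involved, by retrying the union with whole-stage code generation disabled via the standard `spark.sql.codegen.wholeStage` conf. A hedged sketch, reusing `base_df` and `inc_df` from the steps above:
> {code:python}
> # Diagnostic sketch, not a fix: rerun the union with whole-stage codegen
> # disabled. If the crash disappears, the JIT-compiled generated-code path
> # is likely implicated rather than the data itself.
> sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
> inc_df.union(base_df).show()
> {code}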
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org