Hi Jeremy,

This error concerns me:

"23/08/23 20:01:03 ERROR LevelDBProvider: error opening leveldb file
file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb.
Creating new file, will not be able to recover state for existing
applications org.fusesource.leveldbjni.internal.NativeDB$DBException: IO
error: /opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_
ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK: No such file or
directory"

Are you running standalone mode on your EC2 instances? Has this run OK
before? I have not used standalone mode for a long time :(

The path mentioned in the error,
/opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK,
looks dodgy, so to speak, especially the /sbin/file: segment, which points
to a possible misconfiguration or incorrect file-path concatenation.
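As a quick sanity check on that theory: a string starting with a file: scheme is not an absolute path, so the OS resolves it relative to the current working directory, which would have been $SPARK_HOME/sbin when the worker was started. A minimal shell reproduction (using /tmp as a stand-in for the sbin directory):

```shell
# "file:/mnt/..." does not start with "/", so it is treated as a
# relative path and resolved against the current working directory.
cd /tmp   # stand-in for /opt/spark/spark-3.3.1-bin-hadoop3/sbin
realpath -m 'file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK'
# -> /tmp/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK
```

This is exactly the shape of the doubled path in your stack trace, which is why I suspect a file: URI crept into the directory setting somewhere.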

First aid:

   - Check the path configuration: make sure the path to the LevelDB file
   is configured correctly in your settings. It should point to a plain,
   valid directory path, with no unexpected prefix such as file:.
   - Check permissions: does the user running Spark have sufficient
   permissions to access and modify files in that directory?
   - Check whether the directory
   /mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb exists and
   contains the required LevelDB files. If the directory or files are
   missing, that could be the cause of the error.
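If the directory really is set somewhere with a file: URI (an assumption on my part — check spark-env.sh, spark-defaults.conf, or wherever SPARK_LOCAL_DIRS or your worker/shuffle directory is defined in your setup), stripping the scheme should be the fix. A throwaway demo of the edit, run on a temp file so no real config is touched:

```shell
# Throwaway demo: fix a file: URI where a plain path is expected.
# SPARK_LOCAL_DIRS here is only an example setting name.
tmpconf=$(mktemp)
echo 'SPARK_LOCAL_DIRS=file:/mnt/data_ebs/infrastructure/spark/tmp' > "$tmpconf"

# 1. Spot the offending scheme
grep -n 'file:' "$tmpconf"

# 2. Strip it so the value is a plain absolute path
sed -i 's|=file:/|=/|' "$tmpconf"
cat "$tmpconf"   # SPARK_LOCAL_DIRS=/mnt/data_ebs/infrastructure/spark/tmp

# 3. On the real box, also verify the directory exists and is writable
#    by the user running the worker:
# ls -ld /mnt/data_ebs/infrastructure/spark/tmp
rm -f "$tmpconf"
```

After fixing the config, restart the worker with start-worker.sh as before and see whether it registers with the master.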


HTH for now


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 23 Aug 2023 at 21:39, Jeremy Brent <j.br...@ieee.org.invalid> wrote:

> Hi Spark Community,
>
> We have a cluster running with Spark 3.3.1. All nodes are AWS EC2’s with
> an Ubuntu OS version 22.04.
>
> One of the workers disconnected from the main node. When we run
> $SPARK_HOME/sbin/start-worker.sh spark://{main_host}:{cluster_port} it
> appears to run successfully; there is no stderr when we run the
> aforementioned command. Stdout returns the following:
> starting org.apache.spark.deploy.worker.Worker, logging to
> /opt/spark/spark-3.3.1-bin-hadoop3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ri-worker-1.out
>
> However, when we look at the log file defined in stdout, we see the
> following:
> Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp
> /opt/spark/spark-3.3.1-bin-hadoop3/conf/:/opt/spark/spark-3.3.1-bin-hadoop3/jars/*
> -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://
> 10.113.62.58:7077
> ========================================
> Using Spark's default log4j profile:
> org/apache/spark/log4j2-defaults.properties
> 23/08/23 20:01:02 INFO Worker: Started daemon with process name:
> 940835@ri-worker-1
> 23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for TERM
> 23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for HUP
> 23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for INT
> 23/08/23 20:01:03 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 23/08/23 20:01:03 INFO SecurityManager: Changing view acls to: root
> 23/08/23 20:01:03 INFO SecurityManager: Changing modify acls to: root
> 23/08/23 20:01:03 INFO SecurityManager: Changing view acls groups to:
> 23/08/23 20:01:03 INFO SecurityManager: Changing modify acls groups to:
> 23/08/23 20:01:03 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users  with view permissions: Set(root); groups
> with view permissions: Set(); users  with modify permissions: Set(root);
> groups with modify permissions: Set()
> 23/08/23 20:01:03 INFO Utils: Successfully started service 'sparkWorker'
> on port 43757.
> 23/08/23 20:01:03 INFO Worker: Worker decommissioning not enabled.
> 23/08/23 20:01:03 ERROR LevelDBProvider: error opening leveldb file
> file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb.
> Creating new file, will not be able to recover state for existing
> applications
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error:
> /opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK:
> No such file or directory
>         at
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at
> org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at
> org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:48)
>         at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:126)
>         at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:99)
>         at
> org.apache.spark.network.shuffle.ExternalBlockHandler.<init>(ExternalBlockHandler.java:81)
>         at
> org.apache.spark.deploy.ExternalShuffleService.newShuffleBlockHandler(ExternalShuffleService.scala:82)
>         at
> org.apache.spark.deploy.ExternalShuffleService.<init>(ExternalShuffleService.scala:56)
>         at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:183)
>         at
> org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:966)
>         at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:934)
>         at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
> 23/08/23 20:01:03 WARN LevelDBProvider: error deleting
> file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb
> 23/08/23 20:01:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[main,5,main]
> java.io.IOException: Unable to create state store
>         at
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:77)
>         at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:126)
>         at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:99)
>         at
> org.apache.spark.network.shuffle.ExternalBlockHandler.<init>(ExternalBlockHandler.java:81)
>         at
> org.apache.spark.deploy.ExternalShuffleService.newShuffleBlockHandler(ExternalShuffleService.scala:82)
>         at
> org.apache.spark.deploy.ExternalShuffleService.<init>(ExternalShuffleService.scala:56)
>         at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:183)
>         at
> org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:966)
>         at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:934)
>         at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO
> error:
> /opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK:
> No such file or directory
>         at
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at
> org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at
> org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:75)
>         ... 9 more
> 23/08/23 20:01:03 INFO ShutdownHookManager: Shutdown hook called
>
> When we check the Spark Master Server, the worker is not in our worker
> list.
>
> Has anyone seen the error shown in the stacktrace above / know a solution
> to fix this problem?
>
> All the best,
> --
> Jeremy Brent
> Product Engineering Data Scientist
> Data Intelligence & Machine Learning
> Office: 732-562-6030
> Cell:  732-336-0499
>
