This is becoming a serious pain.
Using PowerShell, I am running spark-submit as follows:
PS C:\Users\admin> spark-submit.cmd
C:\Users\admin\PycharmProjects\pythonProject\main.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/D:/temp/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further
illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
20/12/03 23:13:59 INFO SparkContext: Running Spark version 3.0.1
20/12/03 23:13:59 INFO ResourceUtils:
==============================================================
20/12/03 23:13:59 INFO ResourceUtils: Resources for spark.driver:
20/12/03 23:13:59 INFO ResourceUtils:
==============================================================
20/12/03 23:13:59 INFO SparkContext: Submitted application: App1
20/12/03 23:13:59 INFO SecurityManager: Changing view acls to: admin
20/12/03 23:13:59 INFO SecurityManager: Changing modify acls to: admin
20/12/03 23:13:59 INFO SecurityManager: Changing view acls groups to:
20/12/03 23:13:59 INFO SecurityManager: Changing modify acls groups to:
20/12/03 23:13:59 INFO SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(admin); groups with view permissions: Set(); users
with modify permissions: Set(admin); groups with modify permissions: Set()
20/12/03 23:14:00 INFO Utils: Successfully started service
'sparkDriver' on port 62327.
20/12/03 23:14:00 INFO SparkEnv: Registering MapOutputTracker
20/12/03 23:14:00 INFO SparkEnv: Registering BlockManagerMaster
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
20/12/03 23:14:01 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/03 23:14:01 INFO DiskBlockManager: Created local directory at
C:\Users\admin\AppData\Local\Temp\blockmgr-30e2019a-af60-44da-86e7-8a162d1e29da
20/12/03 23:14:01 INFO MemoryStore: MemoryStore started with capacity
434.4 MiB
20/12/03 23:14:01 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/03 23:14:01 INFO Utils: Successfully started service 'SparkUI'
on port 4040.
20/12/03 23:14:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started
at http://w7:4040
20/12/03 23:14:01 INFO Executor: Starting executor ID driver on host w7
20/12/03 23:14:01 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 62373.
20/12/03 23:14:01 INFO NettyBlockTransferService: Server created on
w7:62373
20/12/03 23:14:01 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block
replication policy
20/12/03 23:14:01 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, w7, 62373, None)
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint: Registering block
manager w7:62373 with 434.4 MiB RAM, BlockManagerId(driver, w7, 62373,
None)
20/12/03 23:14:01 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, w7, 62373, None)
20/12/03 23:14:01 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, w7, 62373, None)
D:\temp\spark\python\lib\pyspark.zip\pyspark\context.py:225:
DeprecationWarning: Support for Python 2 and Python 3 prior to version
3.6 is deprecated as of Spark 3.0. See also the plan for dropping
Python 2 support at
https://spark.apache.org/news/plan-for-dropping-python-2-support.html.
DeprecationWarning)
20/12/03 23:14:02 INFO SharedState: loading hive config file:
file:/D:/temp/spark/conf/hive-site.xml
20/12/03 23:14:02 INFO SharedState: spark.sql.warehouse.dir is not
set, but hive.metastore.warehouse.dir is set. Setting
spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir
('C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse').
20/12/03 23:14:02 INFO SharedState: Warehouse path is
'C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse'.
20/12/03 23:14:04 INFO HiveConf: Found configuration file
file:/D:/temp/spark/conf/hive-site.xml
20/12/03 23:14:04 INFO HiveUtils: Initializing
HiveMetastoreConnection version 2.3.7 using Spark classes.
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject/main.py", line 79, in <module>
    spark.sql("CREATE DATABASE IF NOT EXISTS test")
  File "D:\temp\spark\python\lib\pyspark.zip\pyspark\sql\session.py", line 649, in sql
  File "D:\temp\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "D:\temp\spark\python\lib\pyspark.zip\pyspark\sql\utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;
20/12/03 23:14:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/03 23:14:04 INFO SparkUI: Stopped Spark web UI at http://w7:4040
20/12/03 23:14:04 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
20/12/03 23:14:04 INFO MemoryStore: MemoryStore cleared
20/12/03 23:14:04 INFO BlockManager: BlockManager stopped
20/12/03 23:14:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/03 23:14:04 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
20/12/03 23:14:04 INFO SparkContext: Successfully stopped SparkContext
20/12/03 23:14:04 INFO ShutdownHookManager: Shutdown hook called
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-2ccc7f91-3970-42e4-b564-6621215dd446
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-8015fc12-eff7-4d2e-b4c3-f864bf4b00ce\pyspark-12b6b74c-09a3-447f-be8b-b5aa26fa274d
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-8015fc12-eff7-4d2e-b4c3-f864bf4b00ce
So basically it finds hive-site.xml under the %SPARK_HOME%\conf directory
and tries to initialise the HiveMetastoreConnection, but fails with the error:

pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;

winutils.exe is placed under the %SPARK_HOME%\bin directory:

where winutils.exe
D:\temp\spark\bin\winutils.exe

and permissions have been set with chmod -R 777.
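For completeness, createDirectoryWithMode0 is a native method backed by
hadoop.dll, so this UnsatisfiedLinkError normally means the JVM could not
load hadoop.dll at all (or loaded one built for a different Hadoop version);
winutils.exe on its own is not enough. Below is a minimal pre-flight check,
assuming HADOOP_HOME points at the directory holding bin\winutils.exe and
bin\hadoop.dll (the D:\temp\spark fallback is just the path from this setup):

import ctypes
import os

# Pre-flight check (Windows only). Assumes HADOOP_HOME points at a Hadoop
# distribution whose bin\ directory holds both winutils.exe and hadoop.dll;
# the D:\temp\spark fallback is hypothetical.
hadoop_bin = os.path.join(os.environ.get("HADOOP_HOME", r"D:\temp\spark"), "bin")

for native in ("winutils.exe", "hadoop.dll"):
    path = os.path.join(hadoop_bin, native)
    print(path, "found" if os.path.isfile(path) else "MISSING")

# hadoop.dll must also be loadable, not just present on disk; this fails
# the same way the JVM's System.loadLibrary("hadoop") would.
try:
    ctypes.WinDLL(os.path.join(hadoop_bin, "hadoop.dll"))
    print("hadoop.dll loaded OK")
except OSError as e:
    print("hadoop.dll failed to load:", e)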
Also this is hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\hive-localscratchdir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\hive-scratchdir</value>
    <description>HDFS root scratch dir for Hive jobs which gets created
    with write all (733) permission. For each connecting user, an HDFS
    scratch dir: ${hive.exec.scratchdir}/<username> is created, with
    ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>spark.sql.warehouse.dir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>d:\temp\hive\</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:C:\Users\admin\PycharmProjects\pythonProject\metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>
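One quick sanity check, sketched below on the assumption that a SparkSession
bound to the name spark is already up, is to ask the running session which
warehouse setting it actually resolved, since the SharedState log above shows
spark.sql.warehouse.dir being derived from hive.metastore.warehouse.dir
rather than read directly:

# Sketch: print the settings the live session actually resolved
# (assumes an active SparkSession bound to the name `spark`).
print(spark.conf.get("spark.sql.warehouse.dir"))
print(spark.conf.get("spark.sql.catalogImplementation"))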
On Wed, 2 Dec 2020 at 23:11, Artemis User <arte...@dtechspace.com
<mailto:arte...@dtechspace.com>> wrote:
Apparently this is an OS dynamic lib link error. Make sure you
have the LD_LIBRARY_PATH (on Linux) or PATH (on Windows) set up
properly for the right .so or .dll file...
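If it helps, that can be done from Python before the JVM starts; a
minimal sketch (the D:\hadoop location is hypothetical, and the
distribution there must match the Hadoop version Spark was built against):

import os

# Must run before the SparkSession (and hence the JVM) is created so the
# Windows loader can resolve hadoop.dll when NativeIO initialises.
os.environ["HADOOP_HOME"] = r"D:\hadoop"  # hypothetical location
os.environ["PATH"] = (os.path.join(os.environ["HADOOP_HOME"], "bin")
                      + os.pathsep + os.environ["PATH"])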
On 12/2/20 5:31 PM, Mich Talebzadeh wrote:
Hi,
I have a simple piece of code that tries to create a Hive Derby
database as follows:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import StringType, ArrayType
from pyspark.sql.functions import udf, col, max as max, to_date, date_add, \
    add_months
from datetime import datetime, timedelta
import os
from os.path import join, abspath
from typing import Optional
import logging
import random
import string
import math

warehouseLocation = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\spark-warehouse'
local_scrtatchdir = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\hive-localscratchdir'
scrtatchdir = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\hive-scratchdir'
tmp_dir = 'd:\\temp\\hive'
metastore_db = 'jdbc:derby:C:\\Users\\admin\\PycharmProjects\\pythonProject\\metastore_db;create=true'
ConnectionDriverName = 'org.apache.derby.EmbeddedDriver'

spark = SparkSession \
    .builder \
    .appName("App1") \
    .config("hive.exec.local.scratchdir", local_scrtatchdir) \
    .config("hive.exec.scratchdir", scrtatchdir) \
    .config("spark.sql.warehouse.dir", warehouseLocation) \
    .config("hadoop.tmp.dir", tmp_dir) \
    .config("javax.jdo.option.ConnectionURL", metastore_db) \
    .config("javax.jdo.option.ConnectionDriverName", ConnectionDriverName) \
    .enableHiveSupport() \
    .getOrCreate()

print(os.listdir(warehouseLocation))
print(os.listdir(local_scrtatchdir))
print(os.listdir(scrtatchdir))
print(os.listdir(tmp_dir))

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
HiveContext = HiveContext(sc)

spark.sql("CREATE DATABASE IF NOT EXISTS test")
Now this comes back with the following:
C:\Users\admin\PycharmProjects\pythonProject\venv\Scripts\python.exe
C:/Users/admin/PycharmProjects/pythonProject/main.py
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR,
use setLogLevel(newLevel).
[]
[]
[]
['hive-localscratchdir', 'hive-scratchdir', 'hive-warehouse']
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject/main.py", line 76, in <module>
    spark.sql("CREATE DATABASE IF NOT EXISTS test")
  File "D:\temp\spark\python\pyspark\sql\session.py", line 649, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "D:\temp\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "D:\temp\spark\python\pyspark\sql\utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;

Process finished with exit code 1
Also, under %SPARK_HOME%\conf I have the hive-site.xml file. It is
not obvious to me why it is throwing this error.
Thanks