This is becoming a serious pain.
Using PowerShell, I am running spark-submit as follows:
PS C:\Users\admin> spark-submit.cmd
C:\Users\admin\PycharmProjects\pythonProject\main.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/D:/temp/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further
illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
20/12/03 23:13:59 INFO SparkContext: Running Spark version 3.0.1
20/12/03 23:13:59 INFO ResourceUtils:
==============================================================
20/12/03 23:13:59 INFO ResourceUtils: Resources for spark.driver:
20/12/03 23:13:59 INFO ResourceUtils:
==============================================================
20/12/03 23:13:59 INFO SparkContext: Submitted application: App1
20/12/03 23:13:59 INFO SecurityManager: Changing view acls to: admin
20/12/03 23:13:59 INFO SecurityManager: Changing modify acls to: admin
20/12/03 23:13:59 INFO SecurityManager: Changing view acls groups to:
20/12/03 23:13:59 INFO SecurityManager: Changing modify acls groups to:
20/12/03 23:13:59 INFO SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(admin); groups with view permissions: Set(); users
with modify permissions: Set(admin); groups with modify permissions: Set()
20/12/03 23:14:00 INFO Utils: Successfully started service
'sparkDriver' on port 62327.
20/12/03 23:14:00 INFO SparkEnv: Registering MapOutputTracker
20/12/03 23:14:00 INFO SparkEnv: Registering BlockManagerMaster
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
20/12/03 23:14:01 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/03 23:14:01 INFO DiskBlockManager: Created local directory at
C:\Users\admin\AppData\Local\Temp\blockmgr-30e2019a-af60-44da-86e7-8a162d1e29da
20/12/03 23:14:01 INFO MemoryStore: MemoryStore started with capacity
434.4 MiB
20/12/03 23:14:01 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/03 23:14:01 INFO Utils: Successfully started service 'SparkUI'
on port 4040.
20/12/03 23:14:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started
at http://w7:4040
20/12/03 23:14:01 INFO Executor: Starting executor ID driver on host w7
20/12/03 23:14:01 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 62373.
20/12/03 23:14:01 INFO NettyBlockTransferService: Server created on
w7:62373
20/12/03 23:14:01 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block
replication policy
20/12/03 23:14:01 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, w7, 62373, None)
20/12/03 23:14:01 INFO BlockManagerMasterEndpoint: Registering block
manager w7:62373 with 434.4 MiB RAM, BlockManagerId(driver, w7, 62373,
None)
20/12/03 23:14:01 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, w7, 62373, None)
20/12/03 23:14:01 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, w7, 62373, None)
D:\temp\spark\python\lib\pyspark.zip\pyspark\context.py:225:
DeprecationWarning: Support for Python 2 and Python 3 prior to version
3.6 is deprecated as of Spark 3.0. See also the plan for dropping
Python 2 support at
https://spark.apache.org/news/plan-for-dropping-python-2-support.html.
DeprecationWarning)
20/12/03 23:14:02 INFO SharedState: loading hive config file:
file:/D:/temp/spark/conf/hive-site.xml
20/12/03 23:14:02 INFO SharedState: spark.sql.warehouse.dir is not
set, but hive.metastore.warehouse.dir is set. Setting
spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir
('C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse').
20/12/03 23:14:02 INFO SharedState: Warehouse path is
'C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse'.
20/12/03 23:14:04 INFO HiveConf: Found configuration file
file:/D:/temp/spark/conf/hive-site.xml
20/12/03 23:14:04 INFO HiveUtils: Initializing
HiveMetastoreConnection version 2.3.7 using Spark classes.
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject/main.py", line 79, in <module>
    spark.sql("CREATE DATABASE IF NOT EXISTS test")
  File "D:\temp\spark\python\lib\pyspark.zip\pyspark\sql\session.py", line 649, in sql
  File "D:\temp\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "D:\temp\spark\python\lib\pyspark.zip\pyspark\sql\utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;
20/12/03 23:14:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/03 23:14:04 INFO SparkUI: Stopped Spark web UI at http://w7:4040
20/12/03 23:14:04 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
20/12/03 23:14:04 INFO MemoryStore: MemoryStore cleared
20/12/03 23:14:04 INFO BlockManager: BlockManager stopped
20/12/03 23:14:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/03 23:14:04 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
20/12/03 23:14:04 INFO SparkContext: Successfully stopped SparkContext
20/12/03 23:14:04 INFO ShutdownHookManager: Shutdown hook called
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-2ccc7f91-3970-42e4-b564-6621215dd446
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-8015fc12-eff7-4d2e-b4c3-f864bf4b00ce\pyspark-12b6b74c-09a3-447f-be8b-b5aa26fa274d
20/12/03 23:14:04 INFO ShutdownHookManager: Deleting directory
C:\Users\admin\AppData\Local\Temp\spark-8015fc12-eff7-4d2e-b4c3-f864bf4b00ce
So basically it finds hive-site.xml under the %SPARK_HOME%\conf directory
and tries to initialise the HiveMetastoreConnection, but fails with the error:

pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;

winutils.exe is placed under the %SPARK_HOME%\bin directory:

where winutils.exe
D:\temp\spark\bin\winutils.exe

and permissions have been set with chmod -R 777.
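For completeness, createDirectoryWithMode0 is a native method backed by
hadoop.dll, so this UnsatisfiedLinkError normally means the JVM could not
load hadoop.dll at all (or loaded one built for a different Hadoop version);
winutils.exe on its own is not enough. Below is a minimal pre-flight check,
assuming HADOOP_HOME points at the directory holding bin\winutils.exe and
bin\hadoop.dll (the D:\temp\spark fallback is just the path from this setup):

import ctypes
import os

# Pre-flight check (Windows only). Assumes HADOOP_HOME points at a Hadoop
# distribution whose bin\ directory holds both winutils.exe and hadoop.dll;
# the D:\temp\spark fallback is hypothetical.
hadoop_bin = os.path.join(os.environ.get("HADOOP_HOME", r"D:\temp\spark"), "bin")

for native in ("winutils.exe", "hadoop.dll"):
    path = os.path.join(hadoop_bin, native)
    print(path, "found" if os.path.isfile(path) else "MISSING")

# hadoop.dll must also be loadable, not just present on disk; this fails
# the same way the JVM's System.loadLibrary("hadoop") would.
try:
    ctypes.WinDLL(os.path.join(hadoop_bin, "hadoop.dll"))
    print("hadoop.dll loaded OK")
except OSError as e:
    print("hadoop.dll failed to load:", e)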
Also this is hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\hive-localscratchdir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\hive-scratchdir</value>
    <description>HDFS root scratch dir for Hive jobs which gets created
    with write all (733) permission. For each connecting user, an HDFS
    scratch dir: ${hive.exec.scratchdir}/<username> is created, with
    ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>spark.sql.warehouse.dir</name>
    <value>C:\Users\admin\PycharmProjects\pythonProject\spark-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>d:\temp\hive\</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:C:\Users\admin\PycharmProjects\pythonProject\metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>
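One quick sanity check, sketched below on the assumption that a SparkSession
bound to the name spark is already up, is to ask the running session which
warehouse setting it actually resolved, since the SharedState log above shows
spark.sql.warehouse.dir being derived from hive.metastore.warehouse.dir
rather than read directly:

# Sketch: print the settings the live session actually resolved
# (assumes an active SparkSession bound to the name `spark`).
print(spark.conf.get("spark.sql.warehouse.dir"))
print(spark.conf.get("spark.sql.catalogImplementation"))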
On Wed, 2 Dec 2020 at 23:11, Artemis User <arte...@dtechspace.com
<mailto:arte...@dtechspace.com>> wrote:
Apparently this is an OS dynamic lib link error. Make sure you
have the LD_LIBRARY_PATH (on Linux) or PATH (on Windows) set up
properly for the right .so or .dll file...
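If it helps, that can be done from Python before the JVM starts; a
minimal sketch (the D:\hadoop location is hypothetical, and the
distribution there must match the Hadoop version Spark was built against):

import os

# Must run before the SparkSession (and hence the JVM) is created so the
# Windows loader can resolve hadoop.dll when NativeIO initialises.
os.environ["HADOOP_HOME"] = r"D:\hadoop"  # hypothetical location
os.environ["PATH"] = (os.path.join(os.environ["HADOOP_HOME"], "bin")
                      + os.pathsep + os.environ["PATH"])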
On 12/2/20 5:31 PM, Mich Talebzadeh wrote:
Hi,
I have a simple piece of code that tries to create a Hive Derby
database as follows:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import StringType, ArrayType
from pyspark.sql.functions import udf, col, max as max, to_date, date_add, \
    add_months
from datetime import datetime, timedelta
import os
from os.path import join, abspath
from typing import Optional
import logging
import random
import string
import math

warehouseLocation = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\spark-warehouse'
local_scrtatchdir = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\hive-localscratchdir'
scrtatchdir = 'c:\\Users\\admin\\PycharmProjects\\pythonProject\\hive-scratchdir'
tmp_dir = 'd:\\temp\\hive'
metastore_db = 'jdbc:derby:C:\\Users\\admin\\PycharmProjects\\pythonProject\\metastore_db;create=true'
ConnectionDriverName = 'org.apache.derby.EmbeddedDriver'

spark = SparkSession \
    .builder \
    .appName("App1") \
    .config("hive.exec.local.scratchdir", local_scrtatchdir) \
    .config("hive.exec.scratchdir", scrtatchdir) \
    .config("spark.sql.warehouse.dir", warehouseLocation) \
    .config("hadoop.tmp.dir", tmp_dir) \
    .config("javax.jdo.option.ConnectionURL", metastore_db) \
    .config("javax.jdo.option.ConnectionDriverName", ConnectionDriverName) \
    .enableHiveSupport() \
    .getOrCreate()

print(os.listdir(warehouseLocation))
print(os.listdir(local_scrtatchdir))
print(os.listdir(scrtatchdir))
print(os.listdir(tmp_dir))

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
HiveContext = HiveContext(sc)

spark.sql("CREATE DATABASE IF NOT EXISTS test")
Now this comes back with the following:
C:\Users\admin\PycharmProjects\pythonProject\venv\Scripts\python.exe
C:/Users/admin/PycharmProjects/pythonProject/main.py
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR,
use setLogLevel(newLevel).
[]
[]
[]
['hive-localscratchdir', 'hive-scratchdir', 'hive-warehouse']
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/pythonProject/main.py", line 76, in <module>
    spark.sql("CREATE DATABASE IF NOT EXISTS test")
  File "D:\temp\spark\python\pyspark\sql\session.py", line 649, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "D:\temp\spark\python\lib\py4j-0.10.9-src.zip\py4j\java_gateway.py", line 1305, in __call__
  File "D:\temp\spark\python\pyspark\sql\utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V;

Process finished with exit code 1
Also, under %SPARK_HOME%\conf I have the hive-site.xml file. It is
not obvious to me why it is throwing this error.
Thanks