Sukesh Pabolu created ARROW-12399:
-------------------------------------

             Summary: Unable to load libhdfs
                 Key: ARROW-12399
                 URL: https://issues.apache.org/jira/browse/ARROW-12399
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 3.0.0
            Reporter: Sukesh Pabolu
             Fix For: 3.0.0
I am using pyarrow 3.0.0 with Python 3.7 and pyspark 3.1.1, and I am not able to save a DataFrame to HDFS. With pyspark 3.0.0 I was able to save the DataFrame to HDFS. Connecting to HDFS from pyarrow fails with the following error. *Please help:*

{code:python}
import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9001)
{code}

{noformat}
__main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
    extra_conf=extra_conf
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
    extra_conf=extra_conf)
  File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in __init__
    self._connect(host, port, user, kerb_ticket, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: The specified module could not be found.
{noformat}

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
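The DeprecationWarning in the traceback points to pyarrow.fs.HadoopFileSystem. That class and the deprecated pa.hdfs.connect both load libhdfs at runtime through JNI, so switching APIs alone is unlikely to clear this OSError. According to the pyarrow HDFS documentation, the library is looked up in ARROW_LIBHDFS_DIR (if set) or in $HADOOP_HOME/lib/native, and JAVA_HOME must point at the Java installation. The sketch below is a hypothetical pre-flight check: libhdfs_filename, libhdfs_candidates, diagnose and connect are illustrative names, not pyarrow APIs. It only approximates pyarrow's documented lookup and is not a copy of pyarrow's actual search logic. It also assumes the library file is named hdfs.dll on Windows.

```python
import os
import sys
from pathlib import Path


def libhdfs_filename(platform: str = sys.platform) -> str:
    # Assumption: per-platform file names of the libhdfs shared library.
    if platform.startswith("win"):
        return "hdfs.dll"
    if platform == "darwin":
        return "libhdfs.dylib"
    return "libhdfs.so"


def libhdfs_candidates(env=None, platform: str = sys.platform) -> list:
    """Candidate libhdfs paths built from the documented environment variables.

    ARROW_LIBHDFS_DIR comes first, then $HADOOP_HOME/lib/native. This is an
    approximation of where pyarrow looks, not pyarrow's own search code.
    """
    env = os.environ if env is None else env
    name = libhdfs_filename(platform)
    candidates = []
    if env.get("ARROW_LIBHDFS_DIR"):
        candidates.append(Path(env["ARROW_LIBHDFS_DIR"]) / name)
    if env.get("HADOOP_HOME"):
        candidates.append(Path(env["HADOOP_HOME"]) / "lib" / "native" / name)
    return candidates


def diagnose(env=None, platform: str = sys.platform) -> list:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    # Keep only candidates that actually exist on disk.
    found = [p for p in libhdfs_candidates(env, platform) if p.is_file()]
    if not found:
        problems.append(
            f"{libhdfs_filename(platform)} not found in "
            "ARROW_LIBHDFS_DIR or HADOOP_HOME/lib/native"
        )
    return problems


def connect(host: str = "localhost", port: int = 9001):
    # Non-deprecated API named by the DeprecationWarning. Imported lazily so
    # the checks above can run even where pyarrow is not installed.
    from pyarrow import fs
    return fs.HadoopFileSystem(host, port)
```

In the reporter's setup, one would run `diagnose()` first and call `connect()` only when it returns an empty list. If the library lives outside `$HADOOP_HOME/lib/native`, setting `ARROW_LIBHDFS_DIR` to the directory that contains it is the documented way to point pyarrow at it.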