________________________________
From: Kelum Perera <kelum0...@hotmail.com>
Sent: Thursday, October 12, 2023 11:40 AM
To: user@spark.apache.org <user@spark.apache.org>; Kelum Perera <kelum0...@hotmail.com>; Kelum Gmail <kelum0...@gmail.com>
Subject: Cannot complete the read CSV task

Dear friends,

I'm trying to get a fresh start with Spark. I tried to read a few CSV files in a folder, but the task got stuck and never completed, as shown in the output copied from the terminal below.

Can someone help me understand what is going wrong?

Versions:
java version "11.0.16" 2022-07-19 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.16+11-LTS-199)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.16+11-LTS-199, mixed mode)

Python 3.9.13
Windows 10

Copied from the terminal:
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.9.13 (main, Aug 25 2022 23:51:50)
Spark context Web UI available at http://LK510FIDSLW4.ey.net:4041
Spark context available as 'sc' (master = local[*], app id = local-1697089858181).
SparkSession available as 'spark'.
>>> merged_spark_data = spark.read.csv(r"C:\Users\Kelum.Perera\Downloads\data-master\nyse_all\nyse_data\*", header=False)
Exception in thread "globPath-ForkJoinPool-1-worker-115" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
        at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
        at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
        at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
        at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124)
        at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:238)
        at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:737)
        at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:380)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)



Nothing happens afterwards. I would appreciate your kind input on solving this.
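
In case it helps with reproducing this outside the shell, below is a minimal standalone script version of the same read. It targets the same local folder as above; the app name is just a placeholder I chose.

    from pyspark.sql import SparkSession

    # Build a local session the same way the pyspark shell does
    # ("read-nyse-csv" is only a placeholder app name).
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("read-nyse-csv")
        .getOrCreate()
    )

    # Same call that hangs in the shell: glob every file under the
    # folder, treating the first line of each file as data, not a header.
    merged_spark_data = spark.read.csv(
        r"C:\Users\Kelum.Perera\Downloads\data-master\nyse_all\nyse_data\*",
        header=False,
    )

    merged_spark_data.show(5)
    spark.stop()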

Best Regards,
Kelum Perera


