This should work.

Check your PATH; it should pick up pyspark from your Spark installation:

which pyspark
/opt/spark/bin/pyspark

And your installation should contain:

cd $SPARK_HOME
/opt/spark> ls
LICENSE  NOTICE  R  README.md  RELEASE  bin  conf  data  examples  jars
kubernetes  licenses  logs  python  sbin  yarn

In your Python code you should use

from pyspark import SparkConf, SparkContext
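
For example, a minimal sketch of a standalone script (the app name "test" and the local master are placeholders, not taken from your setup):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("test").setMaster("local[*]")
sc = SparkContext(conf=conf)  # in the pyspark shell a context already exists as 'sc', so skip this there
print(sc.version)
sc.stop()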

And this is your problem: import org.apache.spark.SparkContext is Scala syntax, not Python. Python reads it as an import of a module named 'org', which does not exist, hence the error below.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Python version 3.9.16 (main, Apr 22 2023 14:16:13)
Spark context Web UI available at http://rhes76:4040
Spark context available as 'sc' (master = local[*], app id =
local-1692606989942).
SparkSession available as 'spark'.
>>> import org.apache.spark.SparkContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'org'
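
As an aside, in Spark 3.x the usual entry point is SparkSession, with the SparkContext hanging off it. A minimal sketch (the app name "test" is again a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext  # the underlying SparkContext, if you still need it
spark.stop()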

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


On Mon, 21 Aug 2023 at 07:12, Kal Stevens <kalgstev...@gmail.com> wrote:

> Are there installation instructions for Spark 3.4.1?
>
> I defined SPARK_HOME as described here
>
> https://spark.apache.org/docs/latest/api/python/getting_started/install.html
>
> ls $SPARK_HOME/python/lib
> py4j-0.10.9.7-src.zip  PY4J_LICENSE.txt  pyspark.zip
>
>
> I am getting a class not found error:
>     import org.apache.spark.SparkContext
>
> I also unzipped those files just in case but that gives the same error.
>
>
> It sounds like this is because pyspark is not installed, but as far as I
> can tell it is.
> PySpark is installed in the correct Python version:
>
>
> root@namenode:/home/spark/# pip3.10 install pyspark
> Requirement already satisfied: pyspark in
> /usr/local/lib/python3.10/dist-packages (3.4.1)
> Requirement already satisfied: py4j==0.10.9.7 in
> /usr/local/lib/python3.10/dist-packages (from pyspark) (0.10.9.7)
>
>
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
>       /_/
>
> Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
> Spark context Web UI available at http://namenode:4040
> Spark context available as 'sc' (master = yarn, app id =
> application_1692452853354_0008).
> SparkSession available as 'spark'.
> Traceback (most recent call last):
>   File "/home/spark/real-estate/pullhttp/pull_apartments.py", line 11, in
> <module>
>     import org.apache.spark.SparkContext
> ModuleNotFoundError: No module named 'org.apache.spark.SparkContext'
> 2023-08-20T19:45:19,242 INFO  [Thread-5] spark.SparkContext: SparkContext
> is stopping with exitCode 0.
> 2023-08-20T19:45:19,246 INFO  [Thread-5] server.AbstractConnector: Stopped
> Spark@467be156{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2023-08-20T19:45:19,247 INFO  [Thread-5] ui.SparkUI: Stopped Spark web UI
> at http://namenode:4040
> 2023-08-20T19:45:19,251 INFO  [YARN application state monitor]
> cluster.YarnClientSchedulerBackend: Interrupting monitor thread
> 2023-08-20T19:45:19,260 INFO  [Thread-5]
> cluster.YarnClientSchedulerBackend: Shutting down all executors
> 2023-08-20T19:45:19,260 INFO  [dispatcher-CoarseGrainedScheduler]
> cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to
> shut down
> 2023-08-20T19:45:19,263 INFO  [Thread-5]
> cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
> 2023-08-20T19:45:19,267 INFO  [dispatcher-event-loop-29]
> spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint
> stopped!
> 2023-08-20T19:45:19,271 INFO  [Thread-5] memory.MemoryStore: MemoryStore
> cleared
> 2023-08-20T19:45:19,271 INFO  [Thread-5] storage.BlockManager:
> BlockManager stopped
> 2023-08-20T19:45:19,275 INFO  [Thread-5] storage.BlockManagerMaster:
> BlockManagerMaster stopped
> 2023-08-20T19:45:19,276 INFO  [dispatcher-event-loop-8]
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 2023-08-20T19:45:19,279 INFO  [Thread-5] spark.SparkContext: Successfully
> stopped SparkContext
> 2023-08-20T19:45:19,687 INFO  [shutdown-hook-0] util.ShutdownHookManager:
> Shutdown hook called
> 2023-08-20T19:45:19,688 INFO  [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory
> /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034/pyspark-2fcfbc8e-fd40-41f5-bf8d-e4c460332895
> 2023-08-20T19:45:19,689 INFO  [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/spark-bf6cbc46-ad8b-429a-9d7a-7d98b7d7912e
> 2023-08-20T19:45:19,690 INFO  [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034
> 2023-08-20T19:45:19,691 INFO  [shutdown-hook-0] util.ShutdownHookManager:
> Deleting directory /tmp/localPyFiles-6c113b2b-9ac3-45e3-9032-d1c83419aa64
>
>
