Hi Jules, folks,
I have tried “hdfs://<HDFS filepath>” as well as “file://<local Linux
filepath>”, and several variants.
Every time, I get the same message – NoClassDefFoundError. See below. Why
would I get such a message if the problem is simply that Spark cannot find the
text file? Doesn’t the error message point to some other source of the problem?
I may be missing something in the error report; I am a Java person, not a
Python programmer. But doesn’t it look like a call to a Java class (something
associated with “o9.textFile”) is failing? If so, how do I fix this?
Ron
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: [email protected]
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048
From: Jules Damji [mailto:[email protected]]
Sent: Sunday, February 28, 2016 10:07 PM
To: Taylor, Ronald C
Cc: [email protected]; [email protected]
Subject: Re: a basic question on first use of PySpark shell and example, which
is failing
Hello Ronald,
Since you have placed the file under HDFS, you might simply change the path
name to:
val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")
Sent from my iPhone
Pardon the dumb thumb typos :)
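Two caveats on the suggested line, hedged since I cannot test against this cluster: `val` is Scala syntax; in the PySpark shell the equivalent would be plain `lines = sc.textFile(...)`. Also, in a URI of the form `hdfs://user/taylor/...` the first component after the double slash is parsed as the namenode host, not a directory, so an absolute HDFS path needs a third slash (or an explicit host:port). Standard-library parsing shows the difference:

```python
from urllib.parse import urlparse

# In "hdfs://user/taylor/...", "user" is parsed as the host (netloc),
# silently dropping it from the path:
bad = urlparse("hdfs://user/taylor/Spark/Warehouse.java")
print(bad.netloc)  # -> user
print(bad.path)    # -> /taylor/Spark/Warehouse.java

# With a third slash the host is left empty (falling back to the
# cluster's default namenode) and the whole path survives:
good = urlparse("hdfs:///user/rtaylor/Spark/Warehouse.java")
print(good.netloc)  # -> (empty string)
print(good.path)    # -> /user/rtaylor/Spark/Warehouse.java
```

So in PySpark the call would look something like `lines = sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java")`, matching the HDFS listing shown further down.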
On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C
<[email protected]> wrote:
Hello folks,
I am a newbie running Spark on a small Cloudera CDH 5.5.1 cluster at our lab,
and am trying to use the PySpark shell for the first time. I am attempting to
duplicate the documentation example of creating an RDD, which I called
"lines", from a text file.
I placed a text file called Warehouse.java in this HDFS location:
[rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
-rw-r--r-- 3 rtaylor supergroup 1155355 2016-02-28 18:09
/user/rtaylor/Spark/Warehouse.java
[rtaylor@bigdatann ~]$
I then invoked sc.textFile() in the PySpark shell. That did not work. See
below. Apparently a class is not found? I don't know why that would be the
case. Any guidance would be very much appreciated.
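One detail worth flagging, as an observation rather than a confirmed diagnosis: a `file://` URI is resolved against the local Linux filesystem of each node, so a file that exists only in HDFS will not be found there. (Note also that the HDFS listing above shows /user/rtaylor/Spark/Warehouse.java, while the failing call below uses /user/taylor/...) A quick local sanity check before calling sc.textFile() with a `file://` path:

```python
import os

# "file://" URIs name paths on the local Linux filesystem, not HDFS.
# The listing above shows the file in HDFS at /user/rtaylor/Spark/,
# but that path is unlikely to exist locally on the driver node:
local_path = "/user/rtaylor/Spark/Warehouse.java"
print(os.path.exists(local_path))  # typically False: the file lives only in HDFS
```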
The Cloudera Manager for the cluster says that Spark is operating in the
"green", for whatever that is worth.
- Ron Taylor
>>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
>>>