It also needs the Cassandra jar that contains com.datastax.spark.connector.CassandraJavaUtil.
Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar?

On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> Yes, I am sure the system can't find the jar... but how do I fix that? My
> submit command includes the jar:
>
> /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars
> /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path
> /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
>
> and the Spark output seems to indicate it is handling it:
>
> 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>
> I don't really know what else I could try... any suggestions are highly
> appreciated.
>
> Thanks,
> Mohamed.
>
>
> On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <dav...@databricks.com> wrote:
>>
>> It seems that the jar for Cassandra is not loaded; you should have
>> it on the classpath.
>>
>> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi
>> <mohamed.lrh...@georgetown.edu> wrote:
>> > Hello all,
>> >
>> > Trying the example code from this package
>> > (https://github.com/Parsely/pyspark-cassandra), I always get this
>> > error...
>> >
>> > Can you see what I am doing wrong? From googling around, it seems that
>> > the jar is not found somehow... The Spark log shows the JAR was
>> > processed, at least.
>> >
>> > Thank you so much.
>> >
>> > I am using spark-1.2.1-bin-hadoop2.4.tgz.
>> >
>> > test2.py is simply:
>> >
>> > from pyspark.context import SparkConf
>> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
>> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
>> > conf.set("spark.cassandra.connection.host", "devzero")
>> > sc = CassandraSparkContext(conf=conf)
>> >
>> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c
>> > "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars
>> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path
>> > /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
>> > ...
>> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
>> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
>> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
>> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
>> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
>> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
>> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
>> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
>> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
>> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
>> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
>> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
>> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
>> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
>> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
>> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
>> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
>> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
>> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
>> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
>> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
>> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
>> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
>> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
>> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
>> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
>> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
>> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>> > Traceback (most recent call last):
>> >   File "/spark/test2.py", line 5, in <module>
>> >     sc = CassandraSparkContext(conf=conf)
>> >   File "/spark/python/pyspark/context.py", line 105, in __init__
>> >     conf, jsc)
>> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
>> > py4j.protocol.Py4JError: Trying to call a package.
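
One way to answer the question at the top of this reply is to look inside the jar itself. A jar is a zip archive, and a class com.x.Y is stored as the entry com/x/Y.class, so the check is just a lookup in the archive's entry list (the same information `jar tf` prints). The py4j message "Trying to call a package." is, as far as I know, what py4j reports when the name being called could not be resolved to a JVM class, which fits the class simply not being in any jar on the driver's classpath. The sketch below assumes Python is available wherever the jars live; `jar_contains_class` and `find_class` are hypothetical helper names, not part of Spark or pyspark-cassandra.

```python
import zipfile


def jar_contains_class(jar, class_name):
    """Return True if the jar contains a compiled entry for class_name.

    `jar` may be a filesystem path or a binary file-like object, since
    zipfile.ZipFile accepts either. `class_name` is a fully qualified Java
    name such as "com.datastax.spark.connector.CassandraJavaUtil".
    """
    # A jar is a zip archive; class a.b.C is stored as the entry "a/b/C.class".
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


def find_class(jars, class_name):
    """Return the first jar in `jars` that contains class_name, or None."""
    for jar in jars:
        if jar_contains_class(jar, class_name):
            return jar
    return None
```

For example, `find_class("/spark/pyspark-cassandra-0.1-SNAPSHOT.jar".split(","), "com.datastax.spark.connector.CassandraJavaUtil")` splits the same comma-separated string you pass to `--jars`. If it returns None, the jar that actually provides the class would also have to be added to `--jars` (comma-separated) and to `--driver-class-path` (a colon-separated classpath on Linux); which connector jar that is isn't named in this thread.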