Can you try the example in pyspark-cassandra? If that doesn't work, you could create an issue there.
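Before that, a quick way to tell whether the driver JVM can see the connector class at all is to resolve it by its fully qualified name through py4j. This is just a minimal sketch (not from pyspark-cassandra itself), assuming you run it with the same --jars/--driver-class-path flags as your failing submit:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("classpath-check")
    sc = SparkContext(conf=conf)

    # Class.forName fails if the jar never reached the driver classpath;
    # py4j's "Trying to call a package" usually means either that, or a
    # bare class name that was never registered with java_import.
    try:
        sc._jvm.java.lang.Class.forName(
            "com.datastax.spark.connector.japi.CassandraJavaUtil")
        print("connector class is visible to the driver JVM")
    except Exception as e:
        print("connector class NOT found:", e)

If this prints NOT found even though the "Added JAR" lines appear in the log, the jar is being shipped to executors but never put on the driver's own classpath.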
On Mon, Feb 16, 2015 at 4:07 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> So I tried building the connector from:
> https://github.com/datastax/spark-cassandra-connector
>
> which seems to include the Java class referenced in the error message:
>
> [root@devzero spark]# unzip -l spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar | grep CassandraJavaUtil
>     14612  02-16-2015 23:25   com/datastax/spark/connector/japi/CassandraJavaUtil.class
> [root@devzero spark]#
>
> When I try running my Spark test job, I still get the exact same error, even though both my jars seem to have been processed by Spark:
>
> ...
> 15/02/17 00:00:45 INFO SparkUI: Started SparkUI at http://devzero:4040
> 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:36929/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424131245595
> 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://10.212.55.42:36929/jars/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424131245623
> 15/02/17 00:00:45 INFO Utils: Copying /spark/test2.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/test2.py
> 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:36929/files/test2.py with timestamp 1424131245624
> 15/02/17 00:00:45 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/pyspark_cassandra.py
> 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:36929/files/pyspark_cassandra.py with timestamp 1424131245633
> 15/02/17 00:00:45 INFO Executor: Starting executor ID <driver> on host localhost
> 15/
> ....
> 15/02/17 00:00:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> Traceback (most recent call last):
>   File "/spark/test2.py", line 5, in <module>
>     sc = CassandraSparkContext(conf=conf)
>   File "/spark/python/pyspark/context.py", line 105, in __init__
>     conf, jsc)
>   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
> py4j.protocol.Py4JError: Trying to call a package.
>
> Am I building the wrong connector jar? Or using the wrong jar?
>
> Thanks a lot,
> Mohamed.
>
> On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>
>> Oh, I don't know. Thanks a lot, Davies, gonna figure that out now....
>>
>> On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <dav...@databricks.com> wrote:
>>>
>>> It also needs the Cassandra jar: com.datastax.spark.connector.CassandraJavaUtil
>>>
>>> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar?
>>>
>>> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>> > Yes, I'm sure the system can't find the jar... but how do I fix that?
>>> > My submit command includes the jar:
>>> >
>>> > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
>>> >
>>> > and the Spark output seems to indicate it is handling it:
>>> >
>>> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>> >
>>> > I don't really know what else I could try... any suggestions would be highly appreciated.
>>> >
>>> > Thanks,
>>> > Mohamed.
>>> >
>>> > On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <dav...@databricks.com> wrote:
>>> >>
>>> >> It seems that the jar for Cassandra is not loaded; you should have it in the classpath.
>>> >>
>>> >> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
>>> >> > Hello all,
>>> >> >
>>> >> > Trying the example code from this package (https://github.com/Parsely/pyspark-cassandra), I always get this error...
>>> >> >
>>> >> > Can you see what I am doing wrong? From googling around, it seems that the jar is not found somehow... The Spark log shows the JAR was processed, at least.
>>> >> >
>>> >> > Thank you so much.
>>> >> >
>>> >> > I am using spark-1.2.1-bin-hadoop2.4.tgz.
>>> >> >
>>> >> > test2.py is simply:
>>> >> >
>>> >> > from pyspark.context import SparkConf
>>> >> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
>>> >> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
>>> >> > conf.set("spark.cassandra.connection.host", "devzero")
>>> >> > sc = CassandraSparkContext(conf=conf)
>>> >> >
>>> >> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
>>> >> > ...
>>> >> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
>>> >> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
>>> >> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
>>> >> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
>>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
>>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
>>> >> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
>>> >> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
>>> >> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
>>> >> > using builtin-java classes where applicable
>>> >> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
>>> >> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
>>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
>>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>>> >> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
>>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
>>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
>>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
>>> >> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
>>> >> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
>>> >> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
>>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
>>> >> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
>>> >> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
>>> >> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>>> >> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
>>> >> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
>>> >> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
>>> >> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>>> >> > Traceback (most recent call last):
>>> >> >   File "/spark/test2.py", line 5, in <module>
>>> >> >     sc = CassandraSparkContext(conf=conf)
>>> >> >   File "/spark/python/pyspark/context.py", line 105, in __init__
>>> >> >     conf, jsc)
>>> >> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
>>> >> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
>>> >> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
>>> >> > py4j.protocol.Py4JError: Trying to call a package.
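One more thing worth checking: your unzip listing above shows the class under com/datastax/spark/connector/japi/, but the traceback shows pyspark_cassandra.py line 17 looking it up as a bare name (self._jvm.CassandraJavaUtil). py4j only resolves bare class names that have been registered with java_import; anything else it treats as a package, which raises exactly "Trying to call a package." A minimal sketch of the kind of change that would line the two up (hypothetical; the japi package path is taken from your jar listing, and the Parsely build may expect a different package):

    from py4j.java_gateway import java_import

    # Register the japi package on the driver's JVM view so the bare name
    # CassandraJavaUtil resolves; sc is an already-constructed SparkContext.
    java_import(sc._jvm, "com.datastax.spark.connector.japi.*")
    jcsc = sc._jvm.CassandraJavaUtil.javaFunctions(sc._jsc)

If the name still does not resolve after the import, the jar really is missing from the driver classpath rather than merely unimported.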
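Also note that --jars ships the jars to executors but does not necessarily put them on the driver's own classpath, which is where py4j looks while the context is being constructed. Since your --driver-class-path only names the pyspark-cassandra jar, a submit line passing both jars in both places might look like this (a sketch; --jars takes a comma-separated list, --driver-class-path a colon-separated one on Linux):

    /spark/bin/spark-submit \
      --py-files /spark/pyspark_cassandra.py \
      --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar,/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
      --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar \
      /spark/test2.py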