Will do. Thanks a lot.
On Mon, Feb 16, 2015 at 7:20 PM, Davies Liu <dav...@databricks.com> wrote:
> Can you try the example in pyspark-cassandra?
>
> If not, you could create an issue there.
>
> On Mon, Feb 16, 2015 at 4:07 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> > So I tried building the connector from:
> > https://github.com/datastax/spark-cassandra-connector
> >
> > which seems to include the java class referenced in the error message:
> >
> > [root@devzero spark]# unzip -l spark-cassandra-connector/spark-cassandra-connector-java/target/scala-2.10/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar |grep CassandraJavaUtil
> >     14612  02-16-2015 23:25   com/datastax/spark/connector/japi/CassandraJavaUtil.class
> > [root@devzero spark]#
> >
> > When I try running my spark test job, I still get the exact same error, even though both my jars seem to have been processed by spark.
> >
> > ...
> > 15/02/17 00:00:45 INFO SparkUI: Started SparkUI at http://devzero:4040
> > 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:36929/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424131245595
> > 15/02/17 00:00:45 INFO SparkContext: Added JAR file:/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar at http://10.212.55.42:36929/jars/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424131245623
> > 15/02/17 00:00:45 INFO Utils: Copying /spark/test2.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/test2.py
> > 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:36929/files/test2.py with timestamp 1424131245624
> > 15/02/17 00:00:45 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-8588b528-d016-42ac-aa7c-e8cf07c1b659/spark-ae3141dd-ae6c-4e99-b7c8-f97ccb3fd8e5/pyspark_cassandra.py
> > 15/02/17 00:00:45 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:36929/files/pyspark_cassandra.py with timestamp 1424131245633
> > 15/02/17 00:00:45 INFO Executor: Starting executor ID <driver> on host localhost
> > 15/
> > ....
> > 15/02/17 00:00:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> > Traceback (most recent call last):
> >   File "/spark/test2.py", line 5, in <module>
> >     sc = CassandraSparkContext(conf=conf)
> >   File "/spark/python/pyspark/context.py", line 105, in __init__
> >     conf, jsc)
> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
> > py4j.protocol.Py4JError: Trying to call a package.
> >
> > am I building the wrong connector jar? or using the wrong jar?
> >
> > Thanks a lot,
> > Mohamed.
> >
> > On Mon, Feb 16, 2015 at 5:46 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> >> Oh, I don't know. thanks a lot Davies, gonna figure that out now....
> >>
> >> On Mon, Feb 16, 2015 at 5:31 PM, Davies Liu <dav...@databricks.com> wrote:
> >>> It also needs the Cassandra jar: com.datastax.spark.connector.CassandraJavaUtil
> >>>
> >>> Is it included in /spark/pyspark-cassandra-0.1-SNAPSHOT.jar ?
> >>>
> >>> On Mon, Feb 16, 2015 at 1:20 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> >>> > Yes, am sure the system can't find the jar.. but how do I fix that... my submit command includes the jar:
> >>> >
> >>> > /spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py
> >>> >
> >>> > and the spark output seems to indicate it is handling it:
> >>> >
> >>> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
> >>> >
> >>> > I don't really know what else I could try.... any suggestions highly appreciated.
> >>> >
> >>> > Thanks,
> >>> > Mohamed.
> >>> >
> >>> > On Mon, Feb 16, 2015 at 4:04 PM, Davies Liu <dav...@databricks.com> wrote:
> >>> >> It seems that the jar for cassandra is not loaded, you should have them in the classpath.
> >>> >>
> >>> >> On Mon, Feb 16, 2015 at 12:08 PM, Mohamed Lrhazi <mohamed.lrh...@georgetown.edu> wrote:
> >>> >> > Hello all,
> >>> >> >
> >>> >> > Trying the example code from this package (https://github.com/Parsely/pyspark-cassandra), I always get this error...
> >>> >> >
> >>> >> > Can you see what I am doing wrong? from googling around it seems to be that the jar is not found somehow... The spark log shows the JAR was processed at least.
> >>> >> >
> >>> >> > Thank you so much.
> >>> >> >
> >>> >> > am using spark-1.2.1-bin-hadoop2.4.tgz
> >>> >> >
> >>> >> > test2.py is simply:
> >>> >> >
> >>> >> > from pyspark.context import SparkConf
> >>> >> > from pyspark_cassandra import CassandraSparkContext, saveToCassandra
> >>> >> > conf = SparkConf().setAppName("PySpark Cassandra Sample Driver")
> >>> >> > conf.set("spark.cassandra.connection.host", "devzero")
> >>> >> > sc = CassandraSparkContext(conf=conf)
> >>> >> >
> >>> >> > [root@devzero spark]# /usr/local/bin/docker-enter spark-master bash -c "/spark/bin/spark-submit --py-files /spark/pyspark_cassandra.py --jars /spark/pyspark-cassandra-0.1-SNAPSHOT.jar --driver-class-path /spark/pyspark-cassandra-0.1-SNAPSHOT.jar /spark/test2.py"
> >>> >> > ...
> >>> >> > 15/02/16 05:58:45 INFO Slf4jLogger: Slf4jLogger started
> >>> >> > 15/02/16 05:58:45 INFO Remoting: Starting remoting
> >>> >> > 15/02/16 05:58:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@devzero:38917]
> >>> >> > 15/02/16 05:58:45 INFO Utils: Successfully started service 'sparkDriver' on port 38917.
> >>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering MapOutputTracker
> >>> >> > 15/02/16 05:58:45 INFO SparkEnv: Registering BlockManagerMaster
> >>> >> > 15/02/16 05:58:45 INFO DiskBlockManager: Created local directory at /tmp/spark-6cdca68b-edec-4a31-b3c1-a7e9d60191e7/spark-0e977468-6e31-4bba-959a-135d9ebda193
> >>> >> > 15/02/16 05:58:45 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> >>> >> > 15/02/16 05:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> >> > 15/02/16 05:58:46 INFO HttpFileServer: HTTP File server directory is /tmp/spark-af61f7f5-7c0e-412c-8352-263338335fa5/spark-10b3891f-0321-44fe-ba60-1a8c102fd647
> >>> >> > 15/02/16 05:58:46 INFO HttpServer: Starting HTTP Server
> >>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'HTTP file server' on port 56642.
> >>> >> > 15/02/16 05:58:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> >>> >> > 15/02/16 05:58:46 INFO SparkUI: Started SparkUI at http://devzero:4040
> >>> >> > 15/02/16 05:58:46 INFO SparkContext: Added JAR file:/spark/pyspark-cassandra-0.1-SNAPSHOT.jar at http://10.212.55.42:56642/jars/pyspark-cassandra-0.1-SNAPSHOT.jar with timestamp 1424066326632
> >>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/test2.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/test2.py
> >>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/test2.py at http://10.212.55.42:56642/files/test2.py with timestamp 1424066326633
> >>> >> > 15/02/16 05:58:46 INFO Utils: Copying /spark/pyspark_cassandra.py to /tmp/spark-e8cc013e-faae-4208-8bcd-0bb6c00b1b6c/spark-54f2c41d-ae35-4efd-860c-2e5c60979b4c/pyspark_cassandra.py
> >>> >> > 15/02/16 05:58:46 INFO SparkContext: Added file file:/spark/pyspark_cassandra.py at http://10.212.55.42:56642/files/pyspark_cassandra.py with timestamp 1424066326642
> >>> >> > 15/02/16 05:58:46 INFO Executor: Starting executor ID <driver> on host localhost
> >>> >> > 15/02/16 05:58:46 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@devzero:38917/user/HeartbeatReceiver
> >>> >> > 15/02/16 05:58:46 INFO NettyBlockTransferService: Server created on 32895
> >>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Trying to register BlockManager
> >>> >> > 15/02/16 05:58:46 INFO BlockManagerMasterActor: Registering block manager localhost:32895 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 32895)
> >>> >> > 15/02/16 05:58:46 INFO BlockManagerMaster: Registered BlockManager
> >>> >> > 15/02/16 05:58:47 INFO SparkUI: Stopped Spark web UI at http://devzero:4040
> >>> >> > 15/02/16 05:58:47 INFO DAGScheduler: Stopping DAGScheduler
> >>> >> > 15/02/16 05:58:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
> >>> >> > 15/02/16 05:58:48 INFO MemoryStore: MemoryStore cleared
> >>> >> > 15/02/16 05:58:48 INFO BlockManager: BlockManager stopped
> >>> >> > 15/02/16 05:58:48 INFO BlockManagerMaster: BlockManagerMaster stopped
> >>> >> > 15/02/16 05:58:48 INFO SparkContext: Successfully stopped SparkContext
> >>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> >>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> >>> >> > 15/02/16 05:58:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> >>> >> > Traceback (most recent call last):
> >>> >> >   File "/spark/test2.py", line 5, in <module>
> >>> >> >     sc = CassandraSparkContext(conf=conf)
> >>> >> >   File "/spark/python/pyspark/context.py", line 105, in __init__
> >>> >> >     conf, jsc)
> >>> >> >   File "/spark/pyspark_cassandra.py", line 17, in _do_init
> >>> >> >     self._jcsc = self._jvm.CassandraJavaUtil.javaFunctions(self._jsc)
> >>> >> >   File "/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
> >>> >> > py4j.protocol.Py4JError: Trying to call a package.
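For anyone following along: the question Davies raises above ("Is it included in the jar?") can be answered for both jars from plain Python, the same check as the unzip -l |grep earlier in the thread. A small sketch, using only the jar paths already mentioned in this thread:

    # Sketch: confirm CassandraJavaUtil is present in the jars passed to spark-submit
    # (plain-Python equivalent of the unzip -l | grep check above).
    import zipfile

    for jar in ("/spark/pyspark-cassandra-0.1-SNAPSHOT.jar",
                "/spark/spark-cassandra-connector-java-assembly-1.2.0-SNAPSHOT.jar"):
        names = zipfile.ZipFile(jar).namelist()
        hits = [n for n in names if n.endswith("CassandraJavaUtil.class")]
        print("%s: %s" % (jar, hits or "CassandraJavaUtil NOT found"))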
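The "Py4JError: Trying to call a package" at the bottom of both tracebacks is py4j's way of saying the driver JVM cannot resolve CassandraJavaUtil as a class, even when spark-submit logs the jars as "Added JAR". A minimal probe script, assuming both jars are passed to spark-submit (comma-separated in --jars and colon-separated in --driver-class-path, as in the commands above), that attempts the same call pyspark_cassandra.py makes but via the fully qualified japi name:

    # probe.py -- a diagnostic sketch, not part of pyspark-cassandra. Run with:
    #   spark-submit --jars <both jars, comma-separated> \
    #                --driver-class-path <both jars, colon-separated> probe.py
    from py4j.java_gateway import java_import
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("connector classpath probe")
    sc = SparkContext(conf=conf)

    # Make the connector's Java API class visible to py4j under its short name.
    java_import(sc._jvm, "com.datastax.spark.connector.japi.CassandraJavaUtil")

    # If the assembly jar really is on the driver classpath this prints a
    # functions wrapper; if not, it fails with the same "Trying to call a
    # package" error shown in the tracebacks above.
    print(sc._jvm.CassandraJavaUtil.javaFunctions(sc._jsc))

If the probe fails while the zipfile check shows the class is in the jar, the jar is being shipped but is not on the driver's classpath, which is where Davies suggests looking.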