Is sc a SparkContext or a JavaSparkContext?  The compile error seems to
indicate the former, but JdbcRDD.create expects the latter.
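
Something along these lines should match the create signature -- a minimal
sketch, assuming conf is your SparkConf and reusing the Derby URL and query
from the test suite (imports: java.sql.*, org.apache.spark.api.java.*,
org.apache.spark.api.java.function.Function, org.apache.spark.rdd.JdbcRDD):

    // Build a JavaSparkContext (rather than a plain SparkContext) for the Java API.
    JavaSparkContext jsc = new JavaSparkContext(conf);

    JavaRDD<Integer> jdbcRDD = JdbcRDD.create(
        jsc,
        new JdbcRDD.ConnectionFactory() {
          public Connection getConnection() throws SQLException {
            return DriverManager.getConnection("jdbc:derby:target/JavaJdbcRDDSuiteDb");
          }
        },
        "SELECT DATA FROM FOO WHERE ? <= ID AND ID <= ?",
        1, 100, 1,   // lower bound, upper bound, number of partitions
        new Function<ResultSet, Integer>() {
          public Integer call(ResultSet r) throws Exception {
            return r.getInt(1);
          }
        });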

On Wed, Feb 18, 2015 at 12:30 PM, Dmitry Goldenberg <
dgoldenberg...@gmail.com> wrote:

> I have tried that as well; I get a compile error --
>
> [ERROR] ...SparkProto.java:[105,39] error: no suitable method found for
> create(SparkContext,<anonymous ConnectionFactory>,String,int,int,int,<anonymous Function<ResultSet,Integer>>)
>
> The code is a copy and paste:
>
>     JavaRDD<Integer> jdbcRDD = JdbcRDD.create(
>           sc,
>           new JdbcRDD.ConnectionFactory() {
>             public Connection getConnection() throws SQLException {
>               return DriverManager.getConnection("jdbc:derby:target/JavaJdbcRDDSuiteDb");
>             }
>           },
>           "SELECT DATA FROM FOO WHERE ? <= ID AND ID <= ?",
>           1, 100, 1,
>           new Function<ResultSet, Integer>() {
>             public Integer call(ResultSet r) throws Exception {
>               return r.getInt(1);
>             }
>           }
>         );
>
> The other thing I tried was to define a static class locally for
> GetConnection and use the JdbcRDD constructor. This got around the
> compile issues but blew up at runtime with "NoClassDefFoundError:
> scala/runtime/AbstractFunction0"!
>
> JdbcRDD<Row> jdbcRDD = new JdbcRDD<Row>(
>     sc,
>     (AbstractFunction0<Connection>) new DbConn(), // had to cast or a compile error
>     SQL_QUERY,
>     0L,
>     1000L,
>     10,
>     new MapRow(),
>     ROW_CLASS_TAG);
> // DbConn is defined as public static class DbConn extends
> // AbstractFunction0<Connection> implements Serializable
>
> On Wed, Feb 18, 2015 at 1:20 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> That test I linked
>>
>>
>> https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java#L90
>>
>> is calling a static method JdbcRDD.create, not new JdbcRDD.  Is that what
>> you tried doing?
>>
>> On Wed, Feb 18, 2015 at 12:00 PM, Dmitry Goldenberg <
>> dgoldenberg...@gmail.com> wrote:
>>
>>> Thanks, Cody. Yes, I originally started off by looking at that but I get
>>> a compile error if I try to use that approach: constructor JdbcRDD in
>>> class JdbcRDD<T> cannot be applied to given types.  Not to mention that
>>> JavaJdbcRDDSuite somehow manages to not pass in the class tag (the last
>>> argument).
>>>
>>> Wonder if it's a JDK version issue, I'm using 1.7.
>>>
>>> So I've got this, which doesn't compile:
>>>
>>> JdbcRDD<Row> jdbcRDD = new JdbcRDD<Row>(
>>>     new SparkContext(conf),
>>>     new JdbcRDD.ConnectionFactory() {
>>>       public Connection getConnection() throws SQLException {
>>>         Connection conn = null;
>>>         try {
>>>           Class.forName(JDBC_DRIVER);
>>>           conn = DriverManager.getConnection(JDBC_URL, JDBC_USER, JDBC_PASSWORD);
>>>         } catch (ClassNotFoundException ex) {
>>>           throw new RuntimeException("Error while loading JDBC driver.", ex);
>>>         }
>>>         return conn;
>>>       }
>>>     },
>>>     "SELECT * FROM EMPLOYEES",
>>>     0L,
>>>     1000L,
>>>     10,
>>>     new Function<ResultSet, Row>() {
>>>       public Row call(ResultSet r) throws Exception {
>>>         return null; // have some actual logic here...
>>>       }
>>>     },
>>>     scala.reflect.ClassManifestFactory$.MODULE$.fromClass(Row.class));
>>>
>>> The other approach was mimicking the DbConnection class from this post:
>>> http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/.
>>> It got around any of the compilation issues but then I got the runtime
>>> error where Spark wouldn't recognize the db connection class as a
>>> scala.Function0.
>>>
>>>
>>>
>>> On Wed, Feb 18, 2015 at 12:37 PM, Cody Koeninger <c...@koeninger.org>
>>> wrote:
>>>
>>>> Take a look at
>>>>
>>>>
>>>> https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java
>>>>
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg <dgoldenberg...@gmail.com
>>>> > wrote:
>>>>
>>>>> I'm reading data from a database using JdbcRDD, in Java, and I have an
>>>>> implementation of Function0<Connection> whose instance I supply as the
>>>>> 'getConnection' parameter into the JdbcRDD constructor. Compiles fine.
>>>>>
>>>>> The definition of the class/function is as follows:
>>>>>
>>>>>   public class GetDbConnection extends AbstractFunction0<Connection>
>>>>> implements Serializable
>>>>>
>>>>> where scala.runtime.AbstractFunction0 extends scala.Function0.
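>>>>>
>>>>> Roughly, the class looks like this (a sketch; the JDBC_* constants are
>>>>> placeholders for the driver class, URL and credentials, and the imports
>>>>> are java.sql.*, java.io.Serializable and scala.runtime.AbstractFunction0):
>>>>>
>>>>>   public class GetDbConnection extends AbstractFunction0<Connection>
>>>>>       implements Serializable {
>>>>>     @Override
>>>>>     public Connection apply() {
>>>>>       try {
>>>>>         // Load the driver and open the connection; apply() cannot declare
>>>>>         // checked exceptions, so wrap them in a RuntimeException.
>>>>>         Class.forName(JDBC_DRIVER);
>>>>>         return DriverManager.getConnection(JDBC_URL, JDBC_USER, JDBC_PASSWORD);
>>>>>       } catch (ClassNotFoundException | SQLException ex) {
>>>>>         throw new RuntimeException("Error while opening a JDBC connection.", ex);
>>>>>       }
>>>>>     }
>>>>>   }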
>>>>>
>>>>> At runtime, I get an exception as below. Does anyone have an idea as to
>>>>> how to resolve this or work around it? Thanks.
>>>>>
>>>>> I'm running Spark 1.2.1 built for Hadoop 2.4.
>>>>>
>>>>>
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, localhost):
>>>>> java.lang.ClassCastException: cannot assign instance of com.kona.motivis.spark.proto.GetDbConnection
>>>>> to field org.apache.spark.rdd.JdbcRDD.org$apache$spark$rdd$JdbcRDD$$getConnection of type
>>>>> scala.Function0 in instance of org.apache.spark.rdd.JdbcRDD
>>>>>         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>>>>         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>> Driver stacktrace:
>>>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>>>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>>>>         at scala.Option.foreach(Option.scala:236)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
>>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-ClassCastException-with-scala-Function0-tp21707.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
