Honestly, moving to Scala and using case classes is the path of least resistance in the long term.
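For example, here is a minimal sketch of the case class route (assuming Spark 1.6-era APIs; the Customer fields and the comma-separated input format are hypothetical stand-ins for your CustomerRecord):

// A case class extends scala.Product, which is what Catalyst's
// converters expect, so no manual bean-to-Row mapping is needed.
case class Customer(id: Int, name: String, x: Double, y: Double)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)  // sc from your app
import sqlContext.implicits._                             // enables rdd.toDF()

val customers = sc.textFile(inputFileName)                // inputFileName as in your code
  .map(_.split(","))
  .map(f => Customer(f(0).toInt, f(1), f(2).toDouble, f(3).toDouble))
  .toDF()

If you do still need the custom point type, a sketch of the UDT itself in Scala appears after the original question at the end of this thread.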
Thanks,

Andy.

--

Andy Grove
Chief Architect
AgilData - Simple Streaming SQL that Scales
www.agildata.com


On Wed, Jan 20, 2016 at 10:19 AM, Raghu Ganti <raghuki...@gmail.com> wrote:

> Thanks for your reply, Andy.
>
> Yes, that is what I concluded based on the stack trace. The problem stems
> from the Java implementation of generics, but I thought it would go away
> when compiling against Java 1.8, which I understood to fix the generics
> issues.
>
> Any ideas?
>
> Also, are you saying that in order for my example to work, I would need
> to move to Scala and have the UDT implemented in Scala?
>
>
> On Wed, Jan 20, 2016 at 10:27 AM, Andy Grove <andy.gr...@agildata.com> wrote:
>
>> Catalyst is expecting a class that implements scala.Row or scala.Product
>> and is instead finding a Java class. I've run into this issue a number of
>> times. DataFrames don't work so well with Java. Here's a blog post with
>> more information on this:
>>
>> http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>>
>> Andy Grove
>> Chief Architect
>> AgilData - Simple Streaming SQL that Scales
>> www.agildata.com
>>
>>
>> On Wed, Jan 20, 2016 at 7:07 AM, raghukiran <raghuki...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I created a custom UserDefinedType in Java as follows:
>>>
>>> public static final UserDefinedType<JavaPoint> SQLPoint =
>>>         new UserDefinedType<JavaPoint>() {
>>>     // overriding serialize, deserialize, sqlType, userClass here
>>> };
>>>
>>> When creating a DataFrame, I follow the manual mapping approach. I have
>>> a constructor for JavaPoint, JavaPoint(double x, double y), and a
>>> customer record as follows:
>>>
>>> public class CustomerRecord {
>>>     private int id;
>>>     private String name;
>>>     private Object location;
>>>
>>>     // setters and getters follow here
>>> }
>>>
>>> Following the example in the Spark source, I create an RDD as follows:
>>>
>>> sc.textFile(inputFileName).map(new Function<String, CustomerRecord>() {
>>>     @Override
>>>     public CustomerRecord call(String line) {
>>>         // parse x and y from the line, then attach the serialized point
>>>         CustomerRecord rec = new CustomerRecord();
>>>         rec.setLocation(SQLPoint.serialize(new JavaPoint(x, y)));
>>>         return rec;
>>>     }
>>> });
>>>
>>> This results in a MatchError.
>>> The stack trace is as follows:
>>>
>>> scala.MatchError: [B@45aa3dd5 (of class [B)
>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1358)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1356)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>     at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
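To make the UDT route Raghu asks about above concrete as well, here is a minimal sketch against the Spark 1.6 developer API (UserDefinedType is a DeveloperApi and may change; Point and PointUDT are hypothetical names, and the array-of-doubles encoding follows the pattern of Spark's own ExamplePointUDT test class):

import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._

// The annotation tells Catalyst which UDT handles this user class.
@SQLUserDefinedType(udt = classOf[PointUDT])
case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {

  // Catalyst-side storage: a fixed-length array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // User object -> Catalyst internal representation.
  override def serialize(obj: Any): GenericArrayData = obj match {
    case p: Point => new GenericArrayData(Array[Any](p.x, p.y))
  }

  // Catalyst internal representation -> user object.
  override def deserialize(datum: Any): Point = datum match {
    case values: ArrayData => Point(values.getDouble(0), values.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}

With the annotation in place, Catalyst resolves the UDT from the user class itself, so there should be no need to call serialize by hand the way the Java snippet above does; that manual call appears to be what leaves a raw byte array (the [B in the MatchError) where Catalyst expects a structured value.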