Ah, OK! I am new to Scala - will take a look at Scala case classes. It would be awesome if you could provide some pointers.
Thanks,
Raghu

On Wed, Jan 20, 2016 at 12:25 PM, Andy Grove <andy.gr...@agildata.com> wrote:

> I'm talking about implementing CustomerRecord as a Scala case class,
> rather than as a Java class. Scala case classes implement the scala.Product
> trait, which Catalyst is looking for.
>
> Thanks,
>
> Andy.
>
> --
>
> Andy Grove
> Chief Architect
> AgilData - Simple Streaming SQL that Scales
> www.agildata.com
>
>
> On Wed, Jan 20, 2016 at 10:21 AM, Raghu Ganti <raghuki...@gmail.com> wrote:
>
>> Is it not internal to the Catalyst implementation? I should not have to
>> modify the Spark source to get things to work, should I? :-)
>>
>> On Wed, Jan 20, 2016 at 12:21 PM, Raghu Ganti <raghuki...@gmail.com> wrote:
>>
>>> Case classes where?
>>>
>>> On Wed, Jan 20, 2016 at 12:21 PM, Andy Grove <andy.gr...@agildata.com> wrote:
>>>
>>>> Honestly, moving to Scala and using case classes is the path of least
>>>> resistance in the long term.
>>>>
>>>> Thanks,
>>>>
>>>> Andy.
>>>>
>>>> --
>>>>
>>>> Andy Grove
>>>> Chief Architect
>>>> AgilData - Simple Streaming SQL that Scales
>>>> www.agildata.com
>>>>
>>>>
>>>> On Wed, Jan 20, 2016 at 10:19 AM, Raghu Ganti <raghuki...@gmail.com> wrote:
>>>>
>>>>> Thanks for your reply, Andy.
>>>>>
>>>>> Yes, that is what I concluded based on the stack trace. The problem
>>>>> stems from the Java implementation of generics, but I thought this
>>>>> would go away when compiling against Java 1.8, which addresses the
>>>>> issues with proper generics implementation.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Also, are you saying that in order for my example to work, I would
>>>>> need to move to Scala and have the UDT implemented in Scala?
>>>>>
>>>>> On Wed, Jan 20, 2016 at 10:27 AM, Andy Grove <andy.gr...@agildata.com> wrote:
>>>>>
>>>>>> Catalyst is expecting a class that implements scala.Row or
>>>>>> scala.Product and is instead finding a Java class. I've run into this
>>>>>> issue a number of times. DataFrame doesn't work so well with Java.
>>>>>> Here's a blog post with more information on this:
>>>>>>
>>>>>> http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Andy Grove
>>>>>> Chief Architect
>>>>>> AgilData - Simple Streaming SQL that Scales
>>>>>> www.agildata.com
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 7:07 AM, raghukiran <raghuki...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I created a custom UserDefinedType in Java as follows:
>>>>>>>
>>>>>>> UserDefinedType<JavaPoint> SQLPoint = new UserDefinedType<JavaPoint>() {
>>>>>>>     // overriding serialize, deserialize, sqlType, userClass here
>>>>>>> };
>>>>>>>
>>>>>>> When creating a DataFrame, I am following the manual mapping approach.
>>>>>>> I have a constructor for JavaPoint - JavaPoint(double x, double y) -
>>>>>>> and a CustomerRecord as follows:
>>>>>>>
>>>>>>> public class CustomerRecord {
>>>>>>>     private int id;
>>>>>>>     private String name;
>>>>>>>     private Object location;
>>>>>>>
>>>>>>>     // setters and getters follow here
>>>>>>> }
>>>>>>>
>>>>>>> Following the example in the Spark source, I create an RDD as follows:
>>>>>>>
>>>>>>> sc.textFile(inputFileName).map(new Function<String, CustomerRecord>() {
>>>>>>>     public CustomerRecord call(String line) {
>>>>>>>         // x and y are parsed from the input line (omitted here)
>>>>>>>         CustomerRecord rec = new CustomerRecord();
>>>>>>>         rec.setLocation(SQLPoint.serialize(new JavaPoint(x, y)));
>>>>>>>         return rec;
>>>>>>>     }
>>>>>>> });
>>>>>>>
>>>>>>> This results in a MatchError.
>>>>>>> The stack trace is as follows:
>>>>>>>
>>>>>>> scala.MatchError: [B@45aa3dd5 (of class [B)
>>>>>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
>>>>>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
>>>>>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
>>>>>>>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
>>>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>>>>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>>>>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1358)
>>>>>>>     at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1356)
>>>>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>>>>>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>>>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>>>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>>>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>>>>     at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>>>>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>>>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>>>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>>>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>>>>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>>>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>     at java.lang.Thread.run(Thread.java:745)
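For reference, a minimal sketch of the case-class approach Andy describes might look like the following. This is an assumption-laden illustration, not code from the thread: the Spark 1.x API (SQLContext, RDD-based toDF), the "id,name,x,y" input format, and the Point/CustomerRecord field names are all assumed, and the point is represented as a nested struct rather than a UDT.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Case classes implement scala.Product, which Catalyst can convert via reflection.
case class Point(x: Double, y: Double)
case class CustomerRecord(id: Int, name: String, location: Point)

object CaseClassExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CaseClassExample"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumed input format: one "id,name,x,y" record per line.
    val records = sc.textFile(args(0)).map { line =>
      val f = line.split(",")
      CustomerRecord(f(0).trim.toInt, f(1).trim,
        Point(f(2).trim.toDouble, f(3).trim.toDouble))
    }

    // toDF() works because CustomerRecord is a Product with a TypeTag,
    // so Catalyst derives the schema (location becomes a nested struct).
    val df = records.toDF()
    df.printSchema()
    df.show()

    sc.stop()
  }
}

Because CustomerRecord is a scala.Product here, schema inference goes through the Scala reflection path rather than the bean-based beansToRows path that raised the MatchError above.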