Hi,

Thanks for both answers. One final question: *isn't registerTempTable an extra step that the SQL queries require, and one that may decrease performance compared with the language-integrated method calls?* The thing is that I am planning to use them in the current version of the ML Pipeline transformer classes for feature extraction, and if I need to save the input (and maybe the output) SchemaRDD of the transform function in every transformer, this may not be very efficient.
Thanks

On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Hi,
>
> On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores <ces...@gmail.com> wrote:
>
>> I am new to the SchemaRDD class, and I am trying to decide in using SQL
>> queries or Language Integrated Queries (
>> https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>> ).
>>
>> Can someone tell me what is the main difference between the two
>> approaches, besides using different syntax? Are they interchangeable? Which
>> one has better performance?
>>
>
> One difference is that the language integrated queries are method calls on
> the SchemaRDD you want to work on, which requires you have access to the
> object at hand. The SQL queries are passed to a method of the SQLContext
> and you have to call registerTempTable() on the SchemaRDD you want to use
> beforehand, which can basically happen at an arbitrary location of your
> program. (I don't know if I could express what I wanted to say.) That may
> have an influence on how you design your program and how the different
> parts work together.
>
> Tobias
>

--
Cesar Flores