Is there somewhere I can read more about this? I can't find any reference. I actually want to flatten these structures rather than return them from the UDF.
Thanks,
Daniel

On Tue, Nov 25, 2014 at 8:44 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Maps should just be Scala maps; structs are rows inside of rows. If you
> want to return a struct from a UDF you can do that with a case class.
>
> On Tue, Nov 25, 2014 at 10:25 AM, Daniel Haviv <danielru...@gmail.com>
> wrote:
>
>> Thank you.
>>
>> How can I address more complex columns like maps and structs?
>>
>> Thanks again!
>> Daniel
>>
>> On Nov 25, 2014, at 19:43, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>> Probably the easiest/closest way to do this would be with a UDF,
>> something like:
>>
>> registerFunction("makeString", (s: Seq[String]) => s.mkString(","))
>> sql("SELECT *, makeString(c8) AS newC8 FROM jRequests")
>>
>> Note that this does not modify a column; it appends a new column instead.
>>
>> Another, more complicated way to do something like this would be to use the
>> applySchema function
>> <http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema>
>> .
>>
>> I'll note that, as part of the ML pipeline work, we have been considering
>> adding something like:
>>
>> def modifyColumn(columnName, function)
>>
>> Any comments anyone has on this interface would be appreciated!
>>
>> Michael
>>
>> On Tue, Nov 25, 2014 at 7:02 AM, Daniel Haviv <danielru...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I'm selecting columns from a JSON file, transforming some of them, and would
>>> like to store the result as a Parquet file, but I'm failing.
>>>
>>> This is what I'm doing:
>>>
>>> val jsonFiles=sqlContext.jsonFile("/requests.loading")
>>> jsonFiles.registerTempTable("jRequests")
>>>
>>> val clean_jRequests=sqlContext.sql("select c1, c2, c3 ... c55 from
>>> jRequests")
>>>
>>> and then I run a map:
>>> val
>>> jRequests_flat=clean_jRequests.map(line=>{((line(1),line(2),line(3),line(4),line(5),line(6),line(7),
>>> line(8).asInstanceOf[Iterable[String]].mkString(","),line(9)
>>> ,line(10) ,line(11) ,line(12) ,line(13) ,line(14) ,line(15) ,line(16)
>>> ,line(17) ,line(18) ,line(19) ,line(20) ,line(21) ,line(22) ,line(23)
>>> ,line(24) ,line(25) ,line(26) ,line(27) ,line(28) ,line(29) ,line(30)
>>> ,line(31) ,line(32) ,line(33) ,line(34) ,line(35) ,line(36) ,line(37)
>>> ,line(38) ,line(39) ,line(40) ,line(41) ,line(42) ,line(43) ,line(44)
>>> ,line(45) ,line(46) ,line(47) ,line(48) ,line(49) ,line(50)))})
>>>
>>> 1. Is there a smarter way to achieve that (modify only a certain column
>>> without touching the others, but keeping all of them)?
>>> 2. The last statement fails because the tuple has too many members:
>>> <console>:19: error: object Tuple50 is not a member of package scala
>>>
>>> Thanks for your help,
>>> Daniel
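
For reference, the Tuple50 error in the original question arises because the Scala standard library defines tuples only up to Tuple22, so a 50-field row cannot be built as a tuple literal. Below is a minimal sketch in plain Scala (no Spark dependency) of the "change one column, keep the rest" idea. It rests on assumptions: a row is modelled as a `Seq[Any]`, and `flattenValue` and `modifyColumn` are hypothetical helpers written for illustration (the latter only borrows its name from the interface proposed in the thread; it is not an existing Spark API).

```scala
// Plain Scala stand-in; a row is modelled as Seq[Any] (an assumption, not Spark's Row).

// Turn a map or collection value into a comma-separated string; leave scalars alone.
def flattenValue(value: Any): Any = value match {
  case m: Map[_, _]    => m.iterator.map { case (k, v) => s"$k=$v" }.mkString(",")
  case xs: Iterable[_] => xs.mkString(",")
  case other           => other
}

// Hypothetical helper: apply f to one position and keep every other field as-is.
def modifyColumn(row: Seq[Any], index: Int)(f: Any => Any): Seq[Any] =
  row.updated(index, f(row(index)))

// A small row with one array-like field (index 2) and one map field (index 3).
val row: Seq[Any] =
  Seq("a", 1, Seq("x", "y", "z"), Map("k1" -> "v1", "k2" -> "v2"), 4.0)

// Flatten only positions 2 and 3; positions 0, 1 and 4 pass through untouched.
val flat = modifyColumn(modifyColumn(row, 2)(flattenValue), 3)(flattenValue)
println(flat)
```

Because the row stays a sequence, the number of fields is not limited by tuple arity. In Spark itself the flattened values would still need a matching schema (for example via the applySchema route mentioned above) before being written to Parquet.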