Hi Ayan,

Thanks for your update. All I am trying to do is update an inner field in one
of the DataFrame's complex-type columns. The withColumn method adds a new
column or replaces an existing one, but in my case the column is nested.
Please see the example from my original mail, quoted below.

I don't need to add a new column. One way I can think of to solve this is to
create a new complex-type (StructType) column with the same structure as the
one in the DataFrame, updating the nested field while building it, and then
add the new struct column to the DataFrame and drop the old one.

Disadvantages:
1. This requires iterating through millions of rows, which will hurt
   performance.
2. If only one or a few fields need updating, creating a whole new column and
   adding it to the DataFrame may not be the right approach.

Any help will be greatly appreciated!

Thanks.

On Monday, July 18, 2016, ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> withColumn adds the column. If you want different name, please use
> .alias() function.
>
> On Mon, Jul 18, 2016 at 2:16 AM, java bigdata <hadoopst...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I am facing a major issue while transforming dataframe containing complex
>> datatype columns. I need to update the inner fields of complex datatype,
>> for eg: converting one inner field to UPPERCASE letters, and return the
>> same dataframe with new upper case values in it. Below is my issue
>> description. Kindly suggest/guide me a way forward.
>>
>> *My suggestion: *can we have a new version of
>> *dataframe.withColumn(<innerfieldreference>,
>> udf($innerfieldreference), <reference or colname indicator argument>)*,
>> so that when this method gets executed, i get same dataframe with
>> transformed values.
>>
>>
>> *Issue Description:*
>> Using dataframe.withColumn(<colname>,udf($colname)) for inner fields in
>> struct/complex datatype, results in a new dataframe with a new column
>> appended to it. "colname" in the above argument is given as fullname with
>> dot notation to access the struct/complex fields.
>>
>> For eg: hive table has columns: (id int, address struct<line1: struct<
>> buildname:string, stname:string>>, line2:string>)
>>
>> I need to update the inner field 'buildname'.
>> I can select the inner field through dataframe as:
>> df.select($"address.line1.buildname"), however when I use
>> df.withColumn("address.line1.buildname",
>> toUpperCaseUDF($"address.line1.buildname")), it is resulting in a new
>> dataframe with new column: "address.line1.buildname" appended, with
>> toUpperCaseUDF values from inner field buildname.
>>
>> How can I update the inner fields of the complex data types. Kindly
>> suggest.
>>
>> Thanks in anticipation.
>>
>> Best Regards,
>> Naveen Kumar.
>>
>
> --
> Best Regards,
> Ayan Guha
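
To make the "rebuild the struct and replace the old column" idea from the top
of this message concrete, here is a minimal sketch using only column
expressions. It is not a confirmed solution from this thread. It assumes that
line2 sits inside address (the schema as typed in the original mail has
unbalanced brackets, so this is a guess), and it swaps the toUpperCaseUDF for
the built-in upper function. The object and method names are made up for
illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, upper}

// Hypothetical helper; the names here are illustrative only.
object NestedFieldUpdate {

  // Assumed schema (a guess based on the original mail):
  //   id int,
  //   address struct<line1: struct<buildname: string, stname: string>,
  //                  line2: string>
  def upperCaseBuildname(df: DataFrame): DataFrame = {
    // Rebuild the innermost struct, upper-casing only buildname.
    val newLine1 = struct(
      upper(col("address.line1.buildname")).as("buildname"),
      col("address.line1.stname").as("stname")
    ).as("line1")

    // Rebuild the outer struct around it; line2 is copied unchanged.
    val newAddress = struct(newLine1, col("address.line2").as("line2"))

    // "address" is an existing top-level column, so withColumn replaces it
    // instead of appending a column named "address.line1.buildname".
    df.withColumn("address", newAddress)
  }
}
```

Because the new struct is an ordinary column expression, it is evaluated as
part of the normal projection rather than through an explicit loop over rows
in user code. One caveat: every sibling field has to be listed by hand, and a
row whose address is null would come back as a struct of null fields rather
than a null struct.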