For some reason my pasted screenshots were removed when I sent the email (at least that's how it appeared on my end). Repasting as text below.
The sequence you are referring to represents the list of column names to fill. I am asking about filling a column which is of array type with an empty list. Here is a quick example of what I am doing:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.collect_list

    case class IntPair(key: String, value: Int) // definition inferred from the usage below

    val spark = SparkSession.builder().master("local[*]").appName("test").getOrCreate()
    import spark.implicits._

    val list = List(IntPair(key = "a", value = 1), IntPair(key = "a", value = 2), IntPair(key = "b", value = 2))
    val df = spark.createDataset(list).toDF
    df.show

    val collectList = df.groupBy($"key").agg(collect_list("value") as "listylist")
    collectList.show
    collectList.printSchema()

    collectList.na.fill(Array(), Seq("listylist")) // does not compile

The output of show and printSchema for the collectList df:

    +---+---------+
    |key|listylist|
    +---+---------+
    |  b|      [2]|
    |  a|   [1, 2]|
    +---+---------+

    root
     |-- key: string (nullable = true)
     |-- listylist: array (nullable = true)
     |    |-- element: integer (containsNull = true)

So, the last line, which doesn't compile, is what I would want to do (after outer joining, of course; it's only necessary in that particular case, where a null could be populated in the field).

Thanks,
Sumona
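For reference, na.fill only accepts primitive replacement values (numbers and strings in Spark 2.0), so it cannot produce an empty array. A minimal sketch of one workaround, continuing from the collectList example above and assuming Spark 2.0.x APIs (emptyIntArray is a helper defined here, not a Spark built-in): coalesce with a zero-argument UDF that returns an empty Seq[Int].

    import org.apache.spark.sql.functions.{coalesce, udf}

    // Zero-argument UDF returning an empty Seq[Int]; Spark maps the
    // Seq[Int] return type to an array<int> column, matching listylist.
    val emptyIntArray = udf(() => Seq.empty[Int])

    // coalesce keeps the existing array when it is non-null and falls
    // back to the empty array otherwise.
    val filled = collectList.withColumn("listylist",
      coalesce($"listylist", emptyIntArray()))

The same expression can be applied right after the outer join, which is where the nulls get introduced in the first place.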
On Tue, Apr 11, 2017 at 9:50 AM Sumona Routh <sumos...@gmail.com> wrote:

> The sequence you are referring to represents the list of column names to
> fill. I am asking about filling a column which is of array type with an
> empty list.
>
> Here is a quick example of what I am doing:
>
> [screenshot did not come through]
>
> The output of show and printSchema for the collectList df:
>
> [screenshot did not come through]
>
> So, the last line, which doesn't compile, is what I would want to do
> (after outer joining, of course; it's only necessary in that particular
> case, where a null could be populated in the field).
>
> Thanks,
> Sumona

On Tue, Apr 11, 2017 at 2:02 AM Didac Gil <didacgil9...@gmail.com> wrote:

> It does support it, at least in 2.0.2, which I am running.
>
> Here is one example:
>
>     val parsedLines = stream_of_logs
>       .map(line => p.parseRecord_viaCSVParser(line))
>       .join(appsCateg, $"Application" === $"name", "left_outer")
>       .drop("id")
>       .na.fill(0, Seq("numeric_field1", "numeric_field2"))
>       .na.fill("", Seq("text_field1", "text_field2", "text_field3"))
>
> Notice that you have to differentiate the fields that are meant to be
> filled with an int from those that require a different value (an empty
> string, in my case).

On 11 Apr 2017, at 03:18, Sumona Routh <sumos...@gmail.com> wrote:

> Hi there,
> I have two dataframes that each have some columns which are of list type
> (array<int> generated by the collect_list function, actually).
>
> I need to outer join these two dfs; however, by the nature of an outer
> join I am sometimes left with null values. Normally I would use
> df.na.fill(...), however it appears the fill function doesn't support
> this data type.
>
> Can anyone recommend an alternative? I have also been playing around with
> coalesce in a sql expression, but I'm not having any luck there either.
>
> Obviously, I can do a null check on the fields downstream, but it is not
> in the spirit of Scala to pass around nulls, so I wanted to see if I was
> missing another approach first.
>
> Thanks,
> Sumona
>
> I am using Spark 2.0.2

Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain: +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia
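For completeness, a sketch of the outer-join case from the original question, with two hypothetical DataFrames (left, right, and their column names are made-up for illustration; again assuming Spark 2.0.x):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{coalesce, collect_list, udf}

    val spark = SparkSession.builder().master("local[*]").appName("join-demo").getOrCreate()
    import spark.implicits._

    // Two DataFrames with array<int> columns built by collect_list.
    val left = Seq(("a", 1), ("a", 2)).toDF("key", "value")
      .groupBy($"key").agg(collect_list("value") as "leftList")
    val right = Seq(("b", 3)).toDF("key", "value")
      .groupBy($"key").agg(collect_list("value") as "rightList")

    val emptyIntArray = udf(() => Seq.empty[Int])

    // The full outer join leaves leftList null for key "b" and
    // rightList null for key "a"; coalesce swaps those nulls for
    // empty arrays so nothing downstream has to null-check.
    val joined = left.join(right, Seq("key"), "outer")
      .withColumn("leftList", coalesce($"leftList", emptyIntArray()))
      .withColumn("rightList", coalesce($"rightList", emptyIntArray()))

    joined.show()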