I have a dataset (trimmed and simplified) with 2 columns as below.

Date        Subject
2015-01-14  "SEC Inquiry"
2014-02-12  "Happy birthday"
2014-02-13  "Re: Happy birthday"
2015-01-16  "Re: SEC Inquiry"
2015-01-18  "Fwd: Re: SEC Inquiry"

I have imported this into a Spark DataFrame. What I want is to group by the Subject field, but using a partial match to identify the discussion topic. For example, in the case above I would like to group all messages whose subject contains "SEC Inquiry", which should return the following grouped frame:

2015-01-14  "SEC Inquiry"
2015-01-16  "Re: SEC Inquiry"
2015-01-18  "Fwd: Re: SEC Inquiry"

Another use case for a similar problem is grouping by year. In the example above that means a partial match on the Date field, i.e. grouping by Date where the year matches "2014" or "2015".

Looking forward to a reply/solution.

- Suraj
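
To make the intended grouping concrete, here is a minimal plain-Python sketch of the idea (not Spark code): derive a key from each row, then group on that key. The helper names `topic`, `year`, and `group_by` are hypothetical, and the rule that a topic is the subject with leading "Re:"/"Fwd:" prefixes stripped is an assumption about what "partial match" should mean here. In Spark the same approach would presumably mean adding a derived column (for example with `regexp_replace` or `year` from `pyspark.sql.functions`) and grouping by that column.

```python
import re
from collections import defaultdict

# Rows from the example above: (date, subject)
rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

# Assumption: reply/forward markers appear only as leading
# "Re:", "Fwd:" or "Fw:" prefixes, possibly repeated.
PREFIX = re.compile(r"^(?:\s*(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def topic(subject):
    """Strip leading reply/forward prefixes to get the discussion topic."""
    return PREFIX.sub("", subject).strip()


def year(date):
    """Take the year from an ISO 'YYYY-MM-DD' date string."""
    return date[:4]


def group_by(rows, key):
    """Group rows into a dict keyed by key(row), keeping input order."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


by_topic = group_by(rows, lambda r: topic(r[1]))
by_year = group_by(rows, lambda r: year(r[0]))

for name, members in by_topic.items():
    print(name, [date for date, _ in members])
```

Grouping on a normalized key rather than a literal "contains" test means the topics do not have to be known in advance; a plain substring filter would work only when the topic string is already given.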