After grouping, you can use mkString on the rows of the data frame. The following example concatenates all columns of each row, using a space as the separator:
scala> call_cdf.map(row => row.mkString(" ")).show(false)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 AAAAAAAABAAAAAAA 1998-01-01 null null 2450997 NY Metro large 2325 1374075
8AM-12AM Keith Cunningham 4 Matters may hear as; profita New, cold plants can
put al Dante Cook 3 pri 4 ese 995 Park 3rd Dr. Suite 470 Five Points Ziebach
County SD 56098 United States -6.0 0.02
|
|2 AAAAAAAACAAAAAAA 1998-01-01 2000-12-31 null 2450876 Mid Atlantic large 4208
837392 8AM-4PM Stephen Clem 3 Classes devote largely other, standard ter Free
germans prove flatly industrial drugs. Low questions come to a equations.
British, conservative Christopher Perez 6 cally 3 pri 245 Johnson Circle Suite
200 Fairview Williamson County TN 35709 United States -5.0 0.03|
|3 AAAAAAAACAAAAAAA 2001-01-01 null null 2450876 Mid Atlantic small 3251 837392
8AM-4PM William Johnson 3 Classes devote largely other, standard ter Ridiculous
requirements must not implement about pure values. Substances know powers.
Political rel Derrick Burke 6 cally 3 pri 245 Johnson Circle Suite 200
Fairview Williamson County TN 35709 United States -5.0 0.03 |
|4 AAAAAAAAEAAAAAAA 1998-01-01 2000-01-01 null 2450872 North Midwest large 2596
708708 8AM-4PM Lamont Greene 3 Events must find anyway Great rates must ensure
famous, other banks. As main goals get home as a Marvin Dean 2 able 2 able 927
Oak Main ST Suite 150 Five Points Williamson County TN 36098 United States -5.0
0.03 |
|5 AAAAAAAAEAAAAAAA 2000-01-02 2001-12-31 null 2450872 North Midwest medium
2596 708708 8AM-12AM Lamont Greene 3 Events must find anyway So fresh supplies
keep meanwhile religious, labour years. Rapid, careful subject Matthew Williams
2 able 1 able 927 Oak Main ST Suite 150 Five Points Williamson County TN 36098
United States -5.0 0.0 |
|6 AAAAAAAAEAAAAAAA 2002-01-01 null null 2450872 North Midwest small 2596
708708 8AM-4PM Emilio Romano 6 As well novel sentences check through the plans.
Sophisticated cities fall for e William Johnson 5 anti 1 able 927 Oak Main ST
Suite 150 Five Points Williamson County TN 36098 United States -5.0 0.07
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|cc_call_center_sk|cc_call_center_id|cc_rec_start_date|cc_rec_end_date|cc_closed_date_sk|cc_open_date_sk|      cc_name|cc_class|cc_employees|cc_sq_ft|cc_hours|      cc_manager|cc_mkt_id|        cc_mkt_class|         cc_mkt_desc|cc_market_manager|cc_division|cc_division_name|cc_company|cc_company_name|cc_street_number|cc_street_name|cc_street_type|cc_suite_number|    cc_city|        cc_county|cc_state|cc_zip|   cc_country|cc_gmt_offset|cc_tax_percentage|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|                1| AAAAAAAABAAAAAAA|       1998-01-01|           null|             null|        2450997|     NY Metro|   large|        2325| 1374075|8AM-12AM|Keith Cunningham|        4|Matters may hear ...|New, cold plants ...|       Dante Cook|          3|             pri|         4|            ese|             995|      Park 3rd|           Dr.|      Suite 470|Five Points|   Ziebach County|      SD| 56098|United States|         -6.0|             0.02|
|                2| AAAAAAAACAAAAAAA|       1998-01-01|     2000-12-31|             null|        2450876| Mid Atlantic|   large|        4208|  837392| 8AM-4PM|    Stephen Clem|        3|Classes devote la...|Free germans prov...|Christopher Perez|          6|           cally|         3|            pri|             245|      Johnson |        Circle|      Suite 200|   Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                3| AAAAAAAACAAAAAAA|       2001-01-01|           null|             null|        2450876| Mid Atlantic|   small|        3251|  837392| 8AM-4PM| William Johnson|        3|Classes devote la...|Ridiculous requir...|    Derrick Burke|          6|           cally|         3|            pri|             245|      Johnson |        Circle|      Suite 200|   Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                4| AAAAAAAAEAAAAAAA|       1998-01-01|     2000-01-01|             null|        2450872|North Midwest|   large|        2596|  708708| 8AM-4PM|   Lamont Greene|        3|Events must find ...|Great rates must ...|      Marvin Dean|          2|            able|         2|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|             0.03|
|                5| AAAAAAAAEAAAAAAA|       2000-01-02|     2001-12-31|             null|        2450872|North Midwest|  medium|        2596|  708708|8AM-12AM|   Lamont Greene|        3|Events must find ...|So fresh supplies...| Matthew Williams|          2|            able|         1|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|              0.0|
|                6| AAAAAAAAEAAAAAAA|       2002-01-01|           null|             null|        2450872|North Midwest|   small|        2596|  708708| 8AM-4PM|   Emilio Romano|        6|As well novel sen...|Sophisticated cit...|  William Johnson|          5|            anti|         1|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|             0.07|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
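The plain-Scala sketch below makes the per-row behavior of `mkString` concrete without needing a Spark session. It rests on a few assumptions:

- Each row is modeled as a `Seq[Any]` of its column values. A Spark `Row` exposes the same values through `row.toSeq`, and `Row.mkString(sep)` joins them in the same way.
- The sample rows and the `concatColumns` helper are hypothetical.

This joins the columns of a single row. It does not combine values across the rows of a group.

```scala
// Sketch: each row is modeled as a Seq[Any] of column values (an assumption;
// a Spark Row exposes the same values via row.toSeq).
// Hypothetical subset of call_cdf columns:
// cc_call_center_sk, cc_call_center_id, cc_rec_end_date, cc_state
val rows: Seq[Seq[Any]] = Seq(
  Seq(1, "AAAAAAAABAAAAAAA", null, "SD"),
  Seq(2, "AAAAAAAACAAAAAAA", "2000-12-31", "TN")
)

// Hypothetical helper: join every column of one row with the separator.
// mkString renders a null value as the literal text "null",
// which is why "null" tokens appear in the concatenated output above.
def concatColumns(row: Seq[Any], sep: String = " "): String =
  row.mkString(sep)

val lines: Seq[String] = rows.map(r => concatColumns(r))
lines.foreach(println)
// prints:
// 1 AAAAAAAABAAAAAAA null SD
// 2 AAAAAAAACAAAAAAA 2000-12-31 TN
```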
From: Somasundaram Sekar [mailto:[email protected]]
Sent: Sunday, October 08, 2017 5:30 PM
To: [email protected]
Subject: Equivalent of Redshift ListAgg function in Spark (PySpark)
Hi,
I want to concatenate multiple columns into a single column after grouping the DataFrame. I want a functional equivalent of the Redshift ListAgg function:

pg_catalog.Listagg(column, '|') WITHIN GROUP (ORDER BY column) AS name
LISTAGG function: For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.
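The LISTAGG semantics described above (group, order within the group, then concatenate) can be sketched with plain Scala collections. This is an illustration of the semantics, not a Spark implementation. The `Rec` case class, the `listAgg` helper and the sample data are hypothetical.

In Spark itself, the usual building blocks are `collect_list` and `concat_ws` from `org.apache.spark.sql.functions`; the same names exist in `pyspark.sql.functions`. Note that `collect_list` alone does not guarantee element order, so the ORDER BY part needs extra care there.

```scala
// Hypothetical record: grouping key, ORDER BY column, and the value to aggregate.
final case class Rec(key: String, orderCol: Int, value: String)

// Sketch of:
//   SELECT key, LISTAGG(value, sep) WITHIN GROUP (ORDER BY orderCol)
//   FROM recs GROUP BY key
def listAgg(recs: Seq[Rec], sep: String): Map[String, String] =
  recs
    .groupBy(_.key) // one entry per group
    .map { case (k, group) =>
      // order the group's rows, then concatenate their values
      k -> group.sortBy(_.orderCol).map(_.value).mkString(sep)
    }

val sample = Seq(
  Rec("TN", 3, "c"),
  Rec("SD", 1, "x"),
  Rec("TN", 1, "a"),
  Rec("TN", 2, "b")
)

val aggregated = listAgg(sample, "|")
println(aggregated("TN")) // a|b|c
println(aggregated("SD")) // x
```

The `sortBy` inside each group is what mirrors `WITHIN GROUP (ORDER BY ...)`. Without it, the concatenation would simply follow the input order of the rows.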