Hello,

  Thank you for the response!

  I can think of two ways to get the largest city by country, but both seem
to be inefficient:

  (1) I could group by country, sort each group by population in descending
order, add the row number within each group, and then retain only cities
with a row number equal to 1. But it seems wasteful to sort everything when
I only want the largest city of each country.

  (2) I could group by country, get the maximum city population for each
country, join that with the original data frame, and then retain only
cities with population equal to the maximum population in the country. But
that also seems expensive because it requires a join.

  Am I missing something?
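
  A possible third option is a single aggregation per group that carries the
whole row along with the running maximum, so that neither a full sort nor a
join is needed. The sketch below shows the idea in plain Python rather than
Spark; the `City` type, the function name, and the sample rows are all made up
for illustration. In Spark, the analogous step would presumably be a group-by
on country that aggregates the maximum of a struct whose first field is the
population; whether that actually beats options (1) and (2) would need to be
measured.

```python
from typing import Dict, Iterable, NamedTuple


class City(NamedTuple):
    # Hypothetical row type mirroring the three columns in the question.
    name: str
    country: str
    population: int


def largest_city_per_country(cities: Iterable[City]) -> Dict[str, City]:
    """Keep, in one pass, the most populous city seen so far per country."""
    best: Dict[str, City] = {}
    for city in cities:
        current = best.get(city.country)
        # Strict '>' means the first city seen wins when populations tie.
        if current is None or city.population > current.population:
            best[city.country] = city
    return best


# Made-up sample data, not real population figures.
sample = [
    City("Alphaville", "Aland", 300),
    City("Betatown", "Aland", 500),
    City("Gammaburg", "Bland", 200),
    City("Deltaport", "Bland", 100),
]

result = largest_city_per_country(sample)
for country in sorted(result):
    print(country, result[country].name, result[country].population)
```

Each row is visited once and only one row per country is kept in memory,
which is the property that the sort-based and join-based options lack.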

  Thanks!

     Best, Oliver

On Mon, Dec 19, 2022 at 10:59 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> In Spark you can use window functions
> <https://sparkbyexamples.com/spark/spark-sql-window-functions/> to
> achieve this.
>
> HTH
>
>
> On Mon, 19 Dec 2022 at 15:28, Oliver Ruebenacker <
> oliv...@broadinstitute.org> wrote:
>
>>
>>      Hello,
>>
>>   How can I retain from each group only the row for which one value is
>> the maximum of the group? For example, imagine a DataFrame containing all
>> major cities in the world, with three columns: (1) city name, (2) country,
>> (3) population. How would I get a DataFrame that only contains the largest
>> city in each country? Thanks!
>>
>>      Best, Oliver
>>
>> --
>> Oliver Ruebenacker, Ph.D. (he)
>> Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>, 
>> Flannick
>> Lab <http://www.flannicklab.org/>, Broad Institute
>> <http://www.broadinstitute.org/>
>>
>

-- 
Oliver Ruebenacker, Ph.D. (he)
Senior Software Engineer, Knowledge Portal Network
<http://kp4cd.org/>, Flannick
Lab <http://www.flannicklab.org/>, Broad Institute
<http://www.broadinstitute.org/>
