Thank you, Ayan.

For example, I have the dataframe below. I want to use the column "group"
as the key to split this dataframe into three parts, and then apply k-means
to each part separately, to get a k-means result for each group.

+-----------+-----+------------+
|     userID|group|    features|
+-----------+-----+------------+
|12462563356|    1|  [5.0,43.0]|
|12462563701|    2|   [1.0,8.0]|
|12462563701|    1|  [2.0,12.0]|
|12462564356|    1|   [1.0,1.0]|
|12462565487|    3|   [2.0,3.0]|
|12462565698|    2|   [1.0,1.0]|
|12462565698|    1|   [1.0,1.0]|
|12462566081|    2|   [1.0,2.0]|
|12462566081|    1|  [1.0,15.0]|
|12462566225|    2|   [1.0,1.0]|
|12462566225|    1|  [9.0,85.0]|
|12462566526|    2|   [1.0,1.0]|
|12462566526|    1|  [3.0,79.0]|
|12462567006|    2| [11.0,15.0]|
|12462567006|    1| [10.0,15.0]|
|12462567006|    3| [10.0,15.0]|
|12462586595|    2|  [2.0,42.0]|
|12462586595|    3|  [2.0,16.0]|
|12462589343|    3|   [1.0,1.0]|
+-----------+-----+------------+
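
A minimal sketch of the intent described above, assuming plain Python rather
than Spark: the rows are bucketed by the "group" column and a small k-means
is run on each bucket independently. The helper names (kmeans,
kmeans_per_group), the choice of k=2, and the deterministic seeding are
assumptions made for illustration. This is not Spark MLlib's KMeans, and it
runs on a single machine.

```python
from collections import defaultdict
from math import dist

# Rows copied from the table above: (userID, group, features).
ROWS = [
    (12462563356, 1, (5.0, 43.0)),
    (12462563701, 2, (1.0, 8.0)),
    (12462563701, 1, (2.0, 12.0)),
    (12462564356, 1, (1.0, 1.0)),
    (12462565487, 3, (2.0, 3.0)),
    (12462565698, 2, (1.0, 1.0)),
    (12462565698, 1, (1.0, 1.0)),
    (12462566081, 2, (1.0, 2.0)),
    (12462566081, 1, (1.0, 15.0)),
    (12462566225, 2, (1.0, 1.0)),
    (12462566225, 1, (9.0, 85.0)),
    (12462566526, 2, (1.0, 1.0)),
    (12462566526, 1, (3.0, 79.0)),
    (12462567006, 2, (11.0, 15.0)),
    (12462567006, 1, (10.0, 15.0)),
    (12462567006, 3, (10.0, 15.0)),
    (12462586595, 2, (2.0, 42.0)),
    (12462586595, 3, (2.0, 16.0)),
    (12462589343, 3, (1.0, 1.0)),
]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k distinct points."""
    # Assumption: deterministic seeding keeps the sketch reproducible.
    centers = list(dict.fromkeys(points))[:k]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # a center with no members stays where it is.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def kmeans_per_group(rows, k=2):
    """Bucket feature vectors by group, then cluster each bucket on its own."""
    by_group = defaultdict(list)
    for _user_id, group, features in rows:
        by_group[group].append(features)
    return {group: kmeans(points, k) for group, points in sorted(by_group.items())}


results = kmeans_per_group(ROWS, k=2)
for group, centers in results.items():
    print(group, centers)
```

Here results maps each value of "group" to its own list of cluster centers,
so the three groups never influence each other's clustering.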



From:   ayan guha <guha.a...@gmail.com>
To:     Wen Pei Yu/China/IBM@IBMCN
Cc:     user <user@spark.apache.org>, Nirmal Fernando <nir...@wso2.com>
Date:   08/23/2016 05:13 PM
Subject:        Re: Apply ML to grouped dataframe



I would suggest you construct a toy problem and post it for a solution. At
the moment it's a little unclear what your intentions are.


Generally speaking, a group by on a data frame creates another data frame,
not multiple ones.


On 23 Aug 2016 16:35, "Wen Pei Yu" <yuw...@cn.ibm.com> wrote:
  Hi Nirmal

  Filter works fine if I want to handle one of the grouped dataframes. But I
  have multiple grouped dataframes, and I wish I could apply an ML algorithm
  to all of them in one job, rather than in a for loop.

  Wenpei.


  From: Nirmal Fernando <nir...@wso2.com>
  To: Wen Pei Yu/China/IBM@IBMCN
  Cc: User <user@spark.apache.org>
  Date: 08/23/2016 01:55 PM
  Subject: Re: Apply ML to grouped dataframe





  On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuw...@cn.ibm.com> wrote:
        We can group a dataframe by one column like

        df.groupBy(df.col("gender"))


  On top of this DF, use a filter to extract each group as a separate DF.
  Then you can apply ML on top of each DF.

  e.g.: xyzDF.filter(col("x").equalTo(x))

        It is like splitting a dataframe into multiple dataframes. Currently,
        we can only apply simple SQL functions such as agg and max to this
        GroupedData.

        What we want is to apply one ML algorithm to each group.

        Regards.


        From: Nirmal Fernando <nir...@wso2.com>
        To: Wen Pei Yu/China/IBM@IBMCN
        Cc: User <user@spark.apache.org>
        Date: 08/23/2016 01:14 PM
        Subject: Re: Apply ML to grouped dataframe



        Hi Wen,

        AFAIK Spark MLlib implements its machine learning algorithms on top
        of the Spark DataFrame API. What do you mean by a grouped dataframe?

        On Tue, Aug 23, 2016 at 10:42 AM, Wen Pei Yu <yuw...@cn.ibm.com>
        wrote:
                    Hi Nirmal

                    I didn't get your point.
                    Can you tell me more about how to apply MLlib to a
                    grouped dataframe?

                    Regards.
                    Wenpei.



                    From: Nirmal Fernando <nir...@wso2.com>
                    To: Wen Pei Yu/China/IBM@IBMCN
                    Cc: User <user@spark.apache.org>
                    Date: 08/23/2016 10:26 AM
                    Subject: Re: Apply ML to grouped dataframe




                    You can use Spark MLlib:
                    http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api


                    On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu <
                    yuw...@cn.ibm.com> wrote:
                                            Hi

                                            We have a dataframe and want to
                                            group it and apply an ML
                                            algorithm or a statistic (say,
                                            a t-test) to each group. Is
                                            there an efficient way to do
                                            this?

                                            Currently, we move to PySpark,
                                            use groupByKey, and apply a
                                            numpy function to each array.
                                            But this isn't an efficient
                                            way, right?

                                            Regards.
                                            Wenpei.






                    --

                    Thanks & regards,
                    Nirmal

                    Team Lead - WSO2 Machine Learner
                    Associate Technical Lead - Data Technologies Team, WSO2
                    Inc.
                    Mobile: +94715779733
                    Blog: http://nirmalfdo.blogspot.com/






















