Re: Spark ML : One hot Encoding for multiple columns

Nicholas Sharkey Sun, 13 Nov 2016 17:00:36 -0800

Amen


> On Nov 13, 2016, at 7:55 PM, janardhan shetty <janardhan...@gmail.com> wrote:
> 
> These JIRAs are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
> 
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
> 
>> On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar <ni...@cloudera.com> wrote:
>> 
>> The OneHotEncoder does not accept multiple columns.
>> 
>> You can use Michal's suggestion, where he uses a Pipeline to set the stages
>> and then executes them.
>> 
>> The other option is to write a function that performs one-hot encoding on a
>> single column and returns a DataFrame with the encoded column, and then call
>> it once for each of the remaining columns.
>> 
>> 
>> 
>> 
>>> On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhan...@gmail.com> 
>>> wrote:
>>> I had already tried it this way:
>>> 
>>> scala> val featureCols = Array("category","newone")
>>> featureCols: Array[String] = Array(category, newone)
>>> 
>>> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> <console>:29: error: type mismatch;
>>>  found   : Array[String]
>>>  required: String
>>>        val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> 
>>> 
>>>> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar <ni...@cloudera.com> 
>>>> wrote:
>>>> I don't think it does. The documentation at
>>>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder
>>>> shows that it still accepts one column at a time.
>>>> 
>>>>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty 
>>>>> <janardhan...@gmail.com> wrote:
>>>>> 2.0:
>>>>> 
>>>>> One-hot encoding currently accepts a single input column. Is there a way
>>>>> to include multiple columns?
>>>> 
>>> 
>> 
>
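
For reference, the Pipeline approach mentioned in the thread might look roughly like the sketch below. Michal's original code is not quoted here, so this is only a guess at the general shape: one StringIndexer and one OneHotEncoder stage per column, all collected into a single Pipeline. The object name MultiColumnOneHot, the helper buildStages, the "Index"/"Vec" output-column suffixes, and the sample rows are all assumptions made for illustration; only the column names "category" and "newone" come from the thread.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

object MultiColumnOneHot {

  // Hypothetical helper: builds an indexer stage and an encoder stage for
  // every column name. Each setInputCol call receives a single String,
  // which avoids the Array[String] type mismatch shown in the thread.
  def buildStages(cols: Seq[String]): Array[PipelineStage] =
    cols.flatMap { c =>
      val indexer = new StringIndexer()
        .setInputCol(c)
        .setOutputCol(s"${c}Index") // assumed naming convention
      val encoder = new OneHotEncoder()
        .setInputCol(s"${c}Index")
        .setOutputCol(s"${c}Vec")   // assumed naming convention
      Seq[PipelineStage](indexer, encoder)
    }.toArray

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multi-column-one-hot")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for df1 from the thread.
    val df1 = Seq(("a", "x"), ("b", "y"), ("a", "z"), ("c", "x"))
      .toDF("category", "newone")

    val featureCols = Seq("category", "newone")

    // Fit all stages in order, then apply them to the same DataFrame.
    val pipeline = new Pipeline().setStages(buildStages(featureCols))
    val encoded = pipeline.fit(df1).transform(df1)

    encoded.show(false)
    spark.stop()
  }
}
```

Because the Pipeline treats each stage as either an estimator or a transformer, the same stage list should work whether or not OneHotEncoder needs fitting in the Spark version at hand.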

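The other option from the thread, a function that encodes one column and is called once per column, could be sketched as follows. This assumes the Spark 2.0 API discussed above, where OneHotEncoder is a plain transformer with a transform method; in later releases where it must be fitted first, the encoder line would need a fit call. The names OneHotPerColumn, oneHot, and oneHotAll, and the "Index"/"Vec" suffixes, are hypothetical.

```scala
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.DataFrame

object OneHotPerColumn {

  // Hypothetical helper: index one string column, then one-hot encode the
  // resulting index column. Returns the input DataFrame plus two new columns.
  def oneHot(df: DataFrame, col: String): DataFrame = {
    val indexCol = s"${col}Index" // assumed naming convention
    val indexed = new StringIndexer()
      .setInputCol(col)
      .setOutputCol(indexCol)
      .fit(df)
      .transform(df)

    // Assumes Spark 2.0, where OneHotEncoder is a Transformer.
    new OneHotEncoder()
      .setInputCol(indexCol)
      .setOutputCol(s"${col}Vec") // assumed naming convention
      .transform(indexed)
  }

  // Apply oneHot to each column in turn, threading the DataFrame through.
  def oneHotAll(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)(oneHot)
}
```

With the thread's column names, a call such as OneHotPerColumn.oneHotAll(df1, Seq("category", "newone")) would return df1 with an index column and an encoded vector column added for each input column.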