[ https://issues.apache.org/jira/browse/SPARK-20619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung resolved SPARK-20619.
----------------------------------
      Resolution: Fixed
        Assignee: Wayne Zhang
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> StringIndexer supports multiple ways of label ordering
> ------------------------------------------------------
>
>                 Key: SPARK-20619
>                 URL: https://issues.apache.org/jira/browse/SPARK-20619
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>            Assignee: Wayne Zhang
>             Fix For: 2.3.0
>
>
> StringIndexer maps labels to numbers according to the descending order of
> label frequency. Other types of ordering (e.g., alphabetical) may be needed
> in feature ETL. For example, the ordering affects the results of one-hot
> encoding and RFormula. We propose supporting other ordering methods by
> adding a parameter, stringOrderType, with the following four options:
> - 'freq_desc': descending order by label frequency (most frequent label
>   assigned 0)
> - 'freq_asc': ascending order by label frequency (least frequent label
>   assigned 0)
> - 'alphabet_desc': descending alphabetical order
> - 'alphabet_asc': ascending alphabetical order
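To make the four proposed orderings concrete, here is a minimal plain-Python sketch of how each option would assign indices. It is not Spark's implementation: the function name `string_index` is hypothetical, the option strings are the ones proposed in the issue, and breaking frequency ties alphabetically is an assumption of this sketch (the issue does not specify tie-breaking).

```python
from collections import Counter
from typing import Dict, List


def string_index(labels: List[str], order_type: str = "freq_desc") -> Dict[str, int]:
    """Hypothetical helper: map each distinct label to an index.

    The first label after ordering is assigned index 0. Tie-breaking by
    label text for equal frequencies is an assumption of this sketch.
    """
    counts = Counter(labels)
    if order_type == "freq_desc":
        # Most frequent label first (gets 0); ties broken alphabetically.
        ordered = sorted(counts, key=lambda s: (-counts[s], s))
    elif order_type == "freq_asc":
        # Least frequent label first (gets 0); ties broken alphabetically.
        ordered = sorted(counts, key=lambda s: (counts[s], s))
    elif order_type == "alphabet_desc":
        # Reverse lexicographic order of the distinct labels.
        ordered = sorted(counts, reverse=True)
    elif order_type == "alphabet_asc":
        # Lexicographic order of the distinct labels.
        ordered = sorted(counts)
    else:
        raise ValueError(f"unsupported order type: {order_type!r}")
    return {label: i for i, label in enumerate(ordered)}


if __name__ == "__main__":
    data = ["b", "a", "c", "a", "a", "c"]  # counts: a=3, c=2, b=1
    print(string_index(data, "freq_desc"))      # {'a': 0, 'c': 1, 'b': 2}
    print(string_index(data, "freq_asc"))       # {'b': 0, 'c': 1, 'a': 2}
    print(string_index(data, "alphabet_desc"))  # {'c': 0, 'b': 1, 'a': 2}
    print(string_index(data, "alphabet_asc"))   # {'a': 0, 'b': 1, 'c': 2}
```

The same six labels receive different indices under each option. Any downstream step that interprets indices positionally, such as one-hot encoding, therefore sees a different layout depending on which ordering was chosen.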