Re: Encoding categorical variables

2016-10-19 Thread Frank McQuillan
Great, thanks for the additional information.
Frank
On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey wrote:
> IMO
>
> 1) Option to define resulting column names. Please see pdltools
> implementation - the ability to pass in a function is especially useful (
>

Re: Encoding categorical variables

2016-10-19 Thread Jarrod Vawdrey
IMO
1) Option to define resulting column names. Please see pdltools implementation - the ability to pass in a function is especially useful (http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
2) Option to dummy code only the top n most frequently occurring values in any column
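For readers following the thread, the two suggestions above might look roughly like the sketch below. This is plain Python for illustration only, not MADlib or PDLTools code; the names `dummy_code_top_n`, `default_name`, and `name_fn` are hypothetical and do not correspond to any existing API in either library.

```python
from collections import Counter


def default_name(column, value):
    # Hypothetical default naming scheme: "<column>_<value>".
    return f"{column}_{value}"


def dummy_code_top_n(rows, column, n, name_fn=default_name):
    """Dummy code only the n most frequent values of `column`.

    rows    -- list of dicts, one per record
    name_fn -- caller-supplied function (column, value) -> new column name
    """
    # Count how often each value occurs and keep the n most frequent.
    counts = Counter(row[column] for row in rows)
    top_values = [value for value, _ in counts.most_common(n)]

    encoded = []
    for row in rows:
        # Copy every field except the categorical column being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        # One 0/1 indicator per retained value; rarer values get all zeros.
        for value in top_values:
            new_row[name_fn(column, value)] = 1 if row[column] == value else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
    {"id": 4, "color": "green"},
    {"id": 5, "color": "blue"},
    {"id": 6, "color": "red"},
]

# Pass in a naming function, and encode only the two most frequent colors.
encoded = dummy_code_top_n(rows, "color", 2, name_fn=lambda col, val: f"is_{val}")
```

Here "green" is outside the top two values, so its row gets 0 in both indicator columns. Whether rare values should instead be grouped into a separate "other" column is a design choice the thread does not settle.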

New features in MADlib

2016-10-19 Thread Frank McQuillan
Which features would you like to see in a future version of Apache MADlib? Could be big or small stuff. Please let the community know what you think would be valuable to work on. (If you prefer to complete a short survey form about Apache MADlib, please let me know & I will send a Survey Monkey