Nandish Jayaram created MADLIB-1245:
---------------------------------------

             Summary: Randomize data after standardization
                 Key: MADLIB-1245
                 URL: https://issues.apache.org/jira/browse/MADLIB-1245
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Utilities
            Reporter: Nandish Jayaram


The functions `utils_ind_var_scales` and `utils_ind_var_scales_grouping` in 
`convex.utils_regularization` standardize the input data, which is then fed to 
the underlying gradient descent solver. In most cases, gradient descent works 
well when the data is presented in random order.
The current functions create a temp table containing the standardized version 
of the input data, but its rows are not randomly distributed. Can we distribute 
them randomly? This change might affect multiple modules, so every affected 
module must be tested thoroughly to confirm that the change is acceptable.
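
To make the request concrete, here is a minimal sketch of the intended behavior: standardize first, then shuffle the rows. It is plain Python, not MADlib code. The function name `standardize_and_shuffle`, the `(features, label)` row layout, and the `seed` parameter are all hypothetical, and the real functions operate on database tables rather than in-memory lists.

```python
import random
from statistics import fmean, pstdev


def standardize_and_shuffle(rows, seed=None):
    """Standardize each feature column, then return the rows in random order.

    rows: list of (features, label) pairs, where features is a list of floats.
    This is an illustrative stand-in, not the MADlib implementation.
    """
    n_features = len(rows[0][0])
    columns = [[features[j] for features, _ in rows] for j in range(n_features)]
    means = [fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    scaled = [
        ([(v - m) / s for v, m, s in zip(features, means, stds)], label)
        for features, label in rows
    ]
    # Shuffle only after scaling, so each label stays attached to its own row.
    random.Random(seed).shuffle(scaled)
    return scaled, means, stds


rows = [([1.0, 10.0], 0), ([2.0, 10.0], 1), ([3.0, 10.0], 0), ([4.0, 10.0], 1)]
scaled, means, stds = standardize_and_shuffle(rows, seed=42)
```

Shuffling changes only the row order, so the computed means and standard deviations are identical with or without it. What needs checking in each affected module is whether anything depends on the original row order.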



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
