[ https://issues.apache.org/jira/browse/MADLIB-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nandish Jayaram updated MADLIB-1225: ------------------------------------ Description: Install check seems to fail for random forest sporadically. The failure happens for the test which deals with variable importance in the install check. The error in the log when a failure happens is: {code} SELECT assert(cat_var_importance[1] > con_var_importance[1], 'class should be important!'), assert(cat_var_importance[1] > cat_var_importance[2], 'class should be important!') FROM train_output_group; psql:/tmp/madlib.WW_EyD/recursive_partitioning/test/random_forest.sql_in.tmp:158: ERROR: Failed assertion: class should be important! (seg0 slice1 93e250c8-8924-4a80-5c68-1464f40b0395:25432 pid=91044) {code} The last RF install-check query that was run before the error was: {code} SELECT forest_train( 'dt_golf', -- source table 'train_output', -- output model table 'id', -- id column 'class::TEXT', -- response 'class, windy, temperature', -- features NULL, -- exclude columns NULL, -- no grouping 10, -- num of trees 1, -- num of random features TRUE, -- importance 3, -- num_permutations 10, -- max depth 1, -- min split 1, -- min bucket 8, -- number of bins per continuous variable 'max_surrogates=0', FALSE ); SELECT * from train_output_summary; -[ RECORD 1 ]---------+-------------------------------- method | forest_train is_classification | t source_table | dt_golf model_table | train_output id_col_name | id dependent_varname | class::TEXT independent_varnames | class,windy,temperature cat_features | class,windy con_features | temperature grouping_cols | num_trees | 10 num_random_features | 1 max_tree_depth | 10 min_split | 1 min_bucket | 1 num_splits | 8 verbose | f importance | t num_permutations | 3 num_all_groups | 1 num_failed_groups | 0 total_rows_processed | 16 total_rows_skipped | 0 dependent_var_levels | "Don't Play","Play" dependent_var_type | text independent_var_types | text, boolean, double precision null_proxy | None SELECT * from train_output_group; -[ RECORD 1 ]------+--------------------------------------- gid | 1 success | t cat_n_levels | \{2,2} cat_levels_in_text | \{"Don't Play",Play,False,True} oob_error | 0.20000000000000000000 cat_var_importance | \{0.0244444444444445,0.025487012987013} con_var_importance | \{0} {code} was:Install check seems to fail for random forest sporadically. The failure happens for the test which deals with variable importance in the install check. > Sporadic install check failures in random forest > ------------------------------------------------ > > Key: MADLIB-1225 > URL: https://issues.apache.org/jira/browse/MADLIB-1225 > Project: Apache MADlib > Issue Type: Bug > Components: Module: Random Forest > Reporter: Nandish Jayaram > Priority: Major > Fix For: v1.14 > > > Install check seems to fail for random forest sporadically. The failure > happens for the test which deals with variable importance in the install > check. > The error in the log when a failure happens is: > {code} > SELECT > assert(cat_var_importance[1] > con_var_importance[1], 'class should be > important!'), > assert(cat_var_importance[1] > cat_var_importance[2], 'class should be > important!') > FROM train_output_group; > psql:/tmp/madlib.WW_EyD/recursive_partitioning/test/random_forest.sql_in.tmp:158: > ERROR: Failed assertion: class should be important! (seg0 slice1 > 93e250c8-8924-4a80-5c68-1464f40b0395:25432 pid=91044) > {code} > The last RF install-check query that was run before the error was: > {code} > SELECT forest_train( > 'dt_golf', -- source table > 'train_output', -- output model table > 'id', -- id column > 'class::TEXT', -- response > 'class, windy, temperature', -- features > NULL, -- exclude columns > NULL, -- no grouping > 10, -- num of trees > 1, -- num of random features > TRUE, -- importance > 3, -- num_permutations > 10, -- max depth > 1, -- min split > 1, -- min bucket > 8, -- number of bins per continuous variable > 'max_surrogates=0', > FALSE > ); > SELECT * from train_output_summary; > -[ RECORD 1 ]---------+-------------------------------- > method | forest_train > is_classification | t > source_table | dt_golf > model_table | train_output > id_col_name | id > dependent_varname | class::TEXT > independent_varnames | class,windy,temperature > cat_features | class,windy > con_features | temperature > grouping_cols | > num_trees | 10 > num_random_features | 1 > max_tree_depth | 10 > min_split | 1 > min_bucket | 1 > num_splits | 8 > verbose | f > importance | t > num_permutations | 3 > num_all_groups | 1 > num_failed_groups | 0 > total_rows_processed | 16 > total_rows_skipped | 0 > dependent_var_levels | "Don't Play","Play" > dependent_var_type | text > independent_var_types | text, boolean, double precision > null_proxy | None > SELECT * from train_output_group; > -[ RECORD 1 ]------+--------------------------------------- > gid | 1 > success | t > cat_n_levels | \{2,2} > cat_levels_in_text | \{"Don't Play",Play,False,True} > oob_error | 0.20000000000000000000 > cat_var_importance | \{0.0244444444444445,0.025487012987013} > con_var_importance | \{0} > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)