GoranSMilovanovic added a comment.

  Update `Thu 09 Apr 2020 10:19:24 PM UTC`:
  
  - started cross-validation of XGBoost (`gbtree` booster) on **stat1005** for a
    binary classification problem ("typical" vs. "extreme outlier" server
    response times);
  - using 9 data sets with a varying number of features (from fewer than 100 up
    to 2000);
  - splitting test from train data for each data set;
  - running the `xgboost` internal cross-validation controls;
  - cross-validating across: learning rate (`eta`, 4 levels), `subsample`
    (proportion of rows used to build each tree, 4 levels), and `max_depth`
    (maximum allowed tree depth, 4 levels);
  - number of iterations set to decrease monotonically as `eta` increases;
  - keeping `colsample_bytree` (proportion of features used to build each tree)
    fixed at 0.5;
  - setting `max_delta_step` to 1, which is documented to be useful for highly
    unbalanced designs in binary classification (as ours is);
  - model selection: ROC analysis -> AUC.
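  
  For illustration only, the setup listed above could be sketched in Python
  roughly as follows. The update gives the number of levels per parameter but
  not their values, so every level, the `ROUNDS_FOR_ETA` mapping, the
  `binary:logistic` objective, the fold count, and the helper names
  `build_grid` and `select_best` are assumptions, not the actual
  configuration used on stat1005.

```python
from itertools import product

# Hypothetical levels: the update states 4 levels each, but not the values.
ETA_LEVELS = [0.01, 0.05, 0.1, 0.3]
SUBSAMPLE_LEVELS = [0.25, 0.5, 0.75, 1.0]
MAX_DEPTH_LEVELS = [3, 6, 9, 12]

# Hypothetical mapping: fewer boosting rounds as eta grows.
ROUNDS_FOR_ETA = {0.01: 2000, 0.05: 1000, 0.1: 500, 0.3: 200}


def build_grid():
    """Return one (params, num_boost_round) pair per grid cell (4 x 4 x 4)."""
    grid = []
    for eta, subsample, max_depth in product(
        ETA_LEVELS, SUBSAMPLE_LEVELS, MAX_DEPTH_LEVELS
    ):
        params = {
            "booster": "gbtree",
            "objective": "binary:logistic",  # assumed objective
            "eval_metric": "auc",            # model selection by AUC
            "eta": eta,
            "subsample": subsample,
            "max_depth": max_depth,
            "colsample_bytree": 0.5,         # fixed, as in the update
            "max_delta_step": 1,             # for the unbalanced classes
            "nthread": 32,                   # matches the 32 cores reported
        }
        grid.append((params, ROUNDS_FOR_ETA[eta]))
    return grid


def select_best(dtrain, nfold=5, seed=0):
    """Run xgboost's built-in CV per grid cell; keep the highest mean test AUC.

    `dtrain` is an xgboost.DMatrix holding the training split only.
    Assumes pandas is installed, so that xgb.cv returns a DataFrame.
    """
    import xgboost as xgb  # lazy import: the grid can be inspected without it

    best = None
    for params, rounds in build_grid():
        cv = xgb.cv(
            params,
            dtrain,
            num_boost_round=rounds,
            nfold=nfold,
            stratified=True,  # keep the class ratio similar across folds
            seed=seed,
        )
        auc = float(cv["test-auc-mean"].iloc[-1])
        if best is None or auc > best[0]:
            best = (auc, params, rounds)
    return best
```

  Under these assumptions the grid has 64 cells per data set, and
  `select_best` would be called once for each of the 9 training splits, with
  the held-out test split reserved for the final ROC analysis.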
  
  Resource consumption: 32 cores, approx. 15 GB RAM.
  Estimated running time: 24 - 30 h.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308
