GoranSMilovanovic added a comment.

  Update `Tue 28 Apr 2020 02:17:33 AM UTC`
  
  Here goes the update report on SPARQL feature selection via XGBoost:
  
  F31783672: WDQS Endpoint Analytics_20200427_B.nb.html 
<https://phabricator.wikimedia.org/F31783672>
  
  - Model performance improved mainly through (a) better feature engineering 
(currently: not great, not terrible), and (b) controlling for a highly 
imbalanced design (i.e. queries with "typical" processing times heavily 
outnumber queries with "extreme" processing times in the sample) by switching 
from XGBoost control parameters (like `scale_pos_weight`) to a manually 
implemented downsampling strategy;
  
  - I have switched from defining "extremely long processing time" as an 
extreme outlier to defining it as a *mild outlier*: this poses a more difficult 
binary classification problem, but we still get significant improvements 
(compare the Hit rate with the False alarm rate):
  
  - model **accuracy** is around 85%;
  - **Hit rate** (or True Positive Rate) is around 72%, and
  - **False alarm rate** (or False Positive Rate) is about 13%.
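
  For illustration only, here is a minimal Python sketch of the kind of 
downsampling and evaluation described above. It is not the code used in the 
report. All function names are hypothetical, and the "mild outlier" cutoff is 
assumed here to be the upper Tukey fence (Q3 + 1.5 * IQR), which the comment 
above does not confirm.

```python
import random
from statistics import quantiles


def mild_outlier_threshold(times):
    """ASSUMPTION: 'mild outlier' means above the upper Tukey fence Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(times, n=4)
    return q3 + 1.5 * (q3 - q1)


def downsample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def classification_rates(y_true, y_pred):
    """Accuracy, hit rate (TPR) and false alarm rate (FPR) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
        "hit_rate": tp / (tp + fn) if (tp + fn) else float("nan"),
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
    }
```

  In a setup like this, downsampling would typically be applied to the 
training split only, with the rates computed on untouched held-out data, so 
that the reported Hit and False alarm rates reflect the real class balance.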
  
  The list of critical SPARQL features (plus what has been extracted as a 
feature from `event.wdqs_external_sparql_query`) is found in *Section 4. 
Selected features*.
  
  Good night.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

To: GoranSMilovanovic
Cc: MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, Simon_Villeneuve, dcausse, 
Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, WMDE-leszek, Aklapper, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331