I am running adaptive logistic regression on a data set of 250k training 
examples for click-through rate prediction (this sample contains 350 clicks). 
To start, I am trying each feature on its own to see how well it correlates 
with the clicks. I have two problems:

First, my results are not consistent. I run my program with the same input and 
configuration back to back, but the results it produces vary a lot. Sometimes 
my weights are around -3.3xxxx (which makes the most sense), sometimes around 
-1.xxxx, but mostly around 0.000xx.

Second, when I use one of my simple features with three categories and compare 
the regression results with the actual rates, the results sometimes do not 
correlate. The coefficients usually favor the wrong features. And even when 
the order is right, the estimates seem to be higher than the actual rates.
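
For reference, a minimal sketch of how actual rates per category could be put on the same scale as logistic regression coefficients before comparing them. It is in Python and is not the code from this post. The per-category counts and names are hypothetical; only the totals (350 clicks in 250k examples) come from the description above. It assumes the coefficients are on the log-odds scale, so they would need to go through the logistic function (together with any intercept) before being read as rates.

```python
import math

def log_odds(clicks, impressions):
    """Empirical log-odds of a click: ln(p / (1 - p))."""
    rate = clicks / impressions
    return math.log(rate / (1.0 - rate))

def sigmoid(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (clicks, examples) per category; they sum to 350 and 250k.
counts = {
    "cat_a": (200, 100_000),
    "cat_b": (120, 100_000),
    "cat_c": (30, 50_000),
}

empirical = {name: log_odds(c, n) for name, (c, n) in counts.items()}
overall = log_odds(350, 250_000)

for name, (c, n) in counts.items():
    print(f"{name}: rate={c / n:.5f} log-odds={empirical[name]:.3f}")
print(f"overall: rate={350 / 250_000:.5f} log-odds={overall:.3f}")
```

If the fitted weights rank the categories in the same order as these empirical log-odds, the model agrees with the raw rates even though the numbers themselves look nothing like percentages.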

I have tried:

1) changing the number of passes between 1 and 20 (as far as I have learned, 
with my data set size, 1 pass should theoretically be enough for adaptive 
logistic regression)

2) playing with the window size and interval (I'm not exactly sure how these 
are supposed to affect the results; larger window and interval sizes seemed to 
produce better results up to a certain point: window size 5000, interval 8000)

3) shuffling the data set before each pass, which didn't really change the 
results

4) downsampling the non-click samples, which made things even worse
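
Steps 3 and 4 can be sketched as follows. This is a minimal Python illustration, not the code from this post: the function names, the (features, label) tuple layout, the seeds, and the keep rate are all assumptions. It fixes the random seed so repeated runs see the same order, and it shows the usual intercept correction for a model trained on downsampled negatives: if non-clicks are kept with probability r, the learned log-odds are shifted up by ln(1/r), so ln(r) is added back before predictions are compared with actual rates.

```python
import math
import random

def shuffled(examples, seed):
    """Return a shuffled copy; the same seed always gives the same order."""
    rng = random.Random(seed)
    out = list(examples)
    rng.shuffle(out)
    return out

def downsample_negatives(examples, keep_rate, seed):
    """Keep every click (label 1) and each non-click with probability keep_rate."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rng.random() < keep_rate]

def corrected_intercept(intercept_on_sample, keep_rate):
    """Undo the log-odds shift introduced by dropping non-clicks."""
    return intercept_on_sample + math.log(keep_rate)

# Toy data with the same click rate as in the post (350 of 250k), scaled down.
data = [((i,), 1 if i < 35 else 0) for i in range(25_000)]
sample = downsample_negatives(shuffled(data, seed=42), keep_rate=0.1, seed=7)

# Intercept-only comparison: log-odds on the sample vs. on the full data.
pos = sum(y for _, y in sample)
neg = len(sample) - pos
sample_intercept = math.log(pos / neg)
true_intercept = math.log(35 / (25_000 - 35))
print(sample_intercept, corrected_intercept(sample_intercept, 0.1), true_intercept)
```

Without the correction, a model trained on the downsampled set would be expected to report rates that look too high, which is one possible reason the downsampled results appeared worse.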

My questions are:

Is it normal that I get inconsistent results even though I don't have any 
random part on my side of the code? 
Can this be happening because my data is too sparse?
What else can I try tweaking?
Can you think of anything I might be missing? 

Thank you,
Seda
