Did you test different regularization parameters and step sizes? Among the combinations that work, I don't see "A + D". Did you test that combination? Is there any linear dependency between A's columns and D's columns?

-Xiangrui
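To make the linear-dependency question concrete, here is a minimal sketch of one way to check it on a small dense sample. It is not MLlib code. The helper names (`matrix_rank`, `blocks_are_dependent`) and the toy matrices are made up for illustration. The idea is that the column spaces of two blocks overlap exactly when rank([A | D]) < rank(A) + rank(D). For 15000 + 6000 real columns you would want a proper linear algebra library and probably a subsample, so treat this only as a statement of the test.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a dense matrix (list of row lists), by Gaussian elimination
    with partial pivoting. Entries with magnitude <= tol count as zero."""
    m = [[float(v) for v in r] for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds nothing new
        m[rank], m[pivot] = m[pivot], m[rank]
        pv = m[rank][col]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / pv
            if factor != 0.0:
                for c in range(col, n_cols):
                    m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def hstack(a, b):
    """Place the columns of b to the right of the columns of a."""
    return [ra + rb for ra, rb in zip(a, b)]


def blocks_are_dependent(a, b, tol=1e-9):
    """True if some nonzero vector lies in both column spaces."""
    joint = matrix_rank(hstack(a, b), tol)
    return joint < matrix_rank(a, tol) + matrix_rank(b, tol)


# Toy data (hypothetical): 4 samples, block A has 2 features.
A = [[1, 0], [0, 1], [1, 1], [2, 1]]
D_dependent = [[1], [1], [2], [3]]    # equals A's column 1 + column 2
D_independent = [[1], [0], [0], [0]]  # not in the span of A's columns

print(blocks_are_dependent(A, D_dependent))    # True
print(blocks_are_dependent(A, D_independent))  # False
```

If the check comes back true for A and D on real data, the least-squares problem has no unique solution without regularization, which is one possible reason for unstable weights.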
On Tue, Oct 7, 2014 at 1:56 PM, Sameer Tilak <ssti...@live.com> wrote:
> BTW, one detail:
>
> When the number of iterations is 100, all weights are zero or below and the
> indices are only from set A.
>
> When the number of iterations is 150, I see 30+ non-zero weights (when sorted
> by weight) and the indices are distributed across all sets. However, the MSE
> is high (5.xxx) and the result does not match the domain knowledge.
>
> When the number of iterations is 400, I see 30+ non-zero weights (when sorted
> by weight) and the indices are distributed across all sets. However, the MSE
> is high (6.xxx) and the result does not match the domain knowledge.
>
> Any help will be highly appreciated.
>
>
> ________________________________
> From: ssti...@live.com
> To: user@spark.apache.org
> Subject: MLLib Linear regression
> Date: Tue, 7 Oct 2014 13:41:03 -0700
>
>
> Hi All,
> I have the following classes of features:
>
> class A: 15000 features
> class B: 170 features
> class C: 900 features
> class D: 6000 features
>
> I use linear regression (over sparse data). I get excellent results with low
> RMSE (~0.06) for the following combinations of classes:
> 1. A + B + C
> 2. B + C + D
> 3. A + B
> 4. A + C
> 5. B + D
> 6. C + D
> 7. D
>
> Unfortunately, when I use A + B + C + D (all the features) I get results
> that don't make any sense -- all weights are zero or below and the indices
> are only from set A. I also get a high MSE. I changed the number of
> iterations from 100 to 150, 250, and even 400. I still get an MSE of about
> 5 or 6. Are there any other parameters that I can play with? Any insight on
> what could be wrong? Is it somehow unable to scale up to 22K features? (I
> highly doubt that.)
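On the question of which other parameters to try: besides the iteration count, the step size and the regularization parameter are the usual candidates. The sketch below is plain Python, not MLlib. It uses full-batch gradient descent with an L2 penalty on a tiny made-up dataset, and the names `fit_gd`, `mse`, and `sweep` are hypothetical. It only illustrates that a step size that is too large can leave the error high no matter how many iterations are run, which is one possible explanation for the behaviour described above, not a diagnosis.

```python
def predict(w, x):
    """Dot product of weights and one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_gd(xs, ys, step, reg, iters):
    """Full-batch gradient descent on (1/2n)*sum(err^2) + (reg/2)*|w|^2."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [reg * wj for wj in w]  # gradient of the L2 penalty
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def mse(xs, ys, w):
    """Mean squared error of weights w on the given data."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sweep(xs, ys, steps, regs, iters):
    """Try every (step, reg) pair; return (step, reg, mse) sorted by mse."""
    results = []
    for step in steps:
        for reg in regs:
            w = fit_gd(xs, ys, step, reg, iters)
            results.append((step, reg, mse(xs, ys, w)))
    return sorted(results, key=lambda t: t[2])


# Toy data (hypothetical): targets generated exactly by weights [2, -1].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, -1.0, 1.0, 3.0]

results = sweep(xs, ys, steps=[0.1, 0.5, 1.5], regs=[0.0, 0.1], iters=100)
for step, reg, err in results:
    print("step=%.1f reg=%.1f mse=%.3g" % (step, reg, err))
```

On this toy problem the largest step size diverges while the smaller ones converge. On the real data, a similar grid over step size and regularization, evaluated on held-out examples rather than the training set, would show whether the A + B + C + D case is a tuning issue.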