Has anyone worked on non-linear/curved regression lines with Apache Spark? This seems like a trivial problem, but I have given up after nearly two weeks of experimenting.
The plot is below and the raw data is in the table at the end.
I just can't get Spark ML to give decent predictions with LinearRegression or any family in GeneralizedLinearRegression.

I need to predict 'sales per day' given SalesRank. As the chart shows, it's some kind of exponential function: the lower the rank, the exponentially higher the sales.
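
A quick way to sanity-check the "exponential" guess outside Spark is sketched below. It assumes plain Python with only the standard library. The helper name local_decay_rates is made up for this post, and the data is simply the table at the end typed in by hand. For each pair of adjacent rows it computes how fast ln(sales) drops per unit of rank.

```python
import math

# (SalesRank, sales per day) pairs copied from the table in this post.
DATA = [
    (1, 4358), (5, 4283), (10, 4193), (15, 4104), (20, 4017),
    (50, 3532), (100, 2851), (150, 2302), (200, 1858), (250, 1499),
    (500, 989), (1000, 553), (2000, 367), (3500, 221), (5000, 139),
    (6000, 126), (7500, 108), (9000, 92), (10000, 83), (50000, 12),
    (75000, 5),
]


def local_decay_rates(points):
    """Return (rank_lo, rank_hi, rate) for each adjacent pair of rows.

    rate is the drop in ln(sales) per unit of rank. A pure exponential
    decay would give the same rate for every pair.
    """
    rates = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        rate = (math.log(y0) - math.log(y1)) / (x1 - x0)
        rates.append((x0, x1, rate))
    return rates


if __name__ == "__main__":
    for lo, hi, rate in local_decay_rates(DATA):
        print(f"rank {lo:>6} -> {hi:>6}: {rate:.6f}")
```

If the rate comes out roughly constant over the whole range, a log transform of the label should make the relationship linear. If it drifts, a single exponential is probably not the whole story.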

Things I have tried:
Polynomial features, by adding the square of the feature
Changing the family in GeneralizedLinearRegression
Changing the regression parameters
Sacrificing a goat to the Apache gods

How do I go about solving this? Do I have to resort to neural networks?




SalesRank (feature)   Sales per day (label)
1                     4358
5                     4283
10                    4193
15                    4104
20                    4017
50                    3532
100                   2851
150                   2302
200                   1858
250                   1499
500                   989
1000                  553
2000                  367
3500                  221
5000                  139
6000                  126
7500                  108
9000                  92
10000                 83
50000                 12
75000                 5

        
