Has anyone worked on non-linear (curved) regression with Apache
Spark? This seems like a trivial problem, but I have given up after
experimenting for nearly two weeks.
The plot is below, and the raw data is in the table at the end.
I just can't get Spark ML to give decent predictions with
LinearRegression or any family in GeneralizedLinearRegression.
I need to predict 'sales per day' given SalesRank. As the chart shows,
it's some kind of exponential function: the lower the rank, the
exponentially higher the sales.
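To make the "some kind of exponential function" guess concrete, here is a minimal sketch in plain Python (no Spark) that fits a straight line to the data after a log transform. It compares two candidate shapes: log(sales) against rank (a true exponential) and log(sales) against log(rank) (a power law). The helper fit_line and the names exp_fit and pow_fit are made up for this illustration, and the numbers are copied from the table at the end of this post. It is only a rough check of the curve's shape, not a recommended model.

```python
import math

# (SalesRank, sales per day) pairs copied from the table in this post.
data = [
    (1, 4358), (5, 4283), (10, 4193), (15, 4104), (20, 4017),
    (50, 3532), (100, 2851), (150, 2302), (200, 1858), (250, 1499),
    (500, 989), (1000, 553), (2000, 367), (3500, 221), (5000, 139),
    (6000, 126), (7500, 108), (9000, 92), (10000, 83), (50000, 12),
    (75000, 5),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


ranks = [float(r) for r, _ in data]
log_ranks = [math.log(r) for r in ranks]
log_sales = [math.log(s) for _, s in data]

# Exponential candidate: log(sales) = a + b * rank
exp_fit = fit_line(ranks, log_sales)
# Power-law candidate: log(sales) = a + b * log(rank)
pow_fit = fit_line(log_ranks, log_sales)

print("log(sales) vs rank:      slope=%.6f  R^2=%.3f" % (exp_fit[1], exp_fit[2]))
print("log(sales) vs log(rank): slope=%.6f  R^2=%.3f" % (pow_fit[1], pow_fit[2]))
```

Whichever transform gives the straighter line (higher R^2 in log space) suggests how the feature and label could be transformed before handing them to a linear model; predictions are then mapped back with exp().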
Things I have tried:
Polynomial regression by adding the square of the feature
Changing the family for GeneralizedLinearRegression
Changing regression parameters
Sacrificing a goat to the Apache gods.
How do I go about solving this? Do I have to resort to neural networks?
Features (SalesRank)   Label (sales per day)
1 4358
5 4283
10 4193
15 4104
20 4017
50 3532
100 2851
150 2302
200 1858
250 1499
500 989
1000 553
2000 367
3500 221
5000 139
6000 126
7500 108
9000 92
10000 83
50000 12
75000 5