Hi all, I would appreciate comments/corrections&improvements from you guys as I am far from a specialist in this field! If some of you find this example useful, I can try to work a bit to improve it and push it as an atom module. In fact, I have a bunch of other fit functions (like gaussian, diode-like curve, ...) that follow the same approach than the sinewave example and I've long wanted to bundle them in a module where one can add a new fit function by providing the model, error, jacobian and estimate functions like in my example...
Antoine Le Lundi, Avril 06, 2020 08:43 CEST, Claus Futtrup <cfutt...@gmail.com> a écrit: > Hi Scilabers > > Good examples are worth a lot. Maybe this one could be part of the > Scilab documentation? > > Best regards, > Claus > > On 06.04.2020 08:17, Antoine Monmayrant wrote: > > Hello Heinz, > > > > See below the small example I built and I refer to whenever I need to do > > some data fitting with confidence intervals for the parameters of the model. > > It is far from perfect but it might help you untangle the Jacobian and > > covariance matrix thingy. > > Just two words of caution: (1) I am clearly less versed in applied maths > > than Stéphane or Samuel, so take this with a grain of salft; (2) I built > > this example ages ago, before Samuel improved datafit. > > > > Good luck > > > > Antoine > > > > ////////////////////////////////////////////////////////////////////////////////////// > > // REFERENCE: > > // > > // Least Squares Estimation > > // SARA A. VAN DE GEER > > // Volume 2, pp. 1041–1045 > > // in > > // Encyclopedia of Statistics in Behavioral Science > > // ISBN-13: 978-0-470-86080-9 > > // ISBN-10: 0-470-86080-4 > > // Editors Brian S. Everitt & David C. Howell > > // John Wiley & Sons, Ltd, Chichester, 2005 > > // > > > > //This is a short and definetly incomplete tutorial on data fitting in > > Scilab using leastsq. > > // > > // Basic assumption: this tutorial is for scientists that face this simple > > problem: > > // they have an experimental dataset x_epx,y_exp and a certain model > > y=Model(x,p) to fit thie dataset. > > // This tutorial will try to answer the folowing questions: > > // 1) how do you do that (simply) > > // 2) how do you do that (more reliably and more quickly) > > // a) let's go faster with a Jacobian > > // b) how good is your fit? How big is the "error bar" associated with > > each parameter of your Model? > > // c) Can we bullet proof it? > > > > //1) How do you do curve fitting in Scilab > > // > > // We need a) a model function b) a dataset c) some more work > > // 1)a) Let's start with the model function: a sinewave > > // here is the formula: y=A*sin(k*(x-x0))+y0; > > // here is the function prototypr [y]=SineModel(x,p) with > > p=[x0,k,A,y0]' > > function [y]=SineModel(x,p) > > //INPUTS: x 1D vector > > // p parameter vector of size [4,1] containing > > // x0 : sine origin > > // k : sine spatial frequency ie k=2%pi/Xp with Xp the period > > // A : sine amplitude > > // y0 : sine offset > > //OUTPUTS: y 1D vector of same size than x such that y=A*sin(k*(x-x0))+y0; > > x0=p(1); > > k=p(2); > > A=p(3); > > y0=p(4); > > y=y0+A*sin((x-x0)*k); > > endfunction > > > > // 1)b) Let's now have a fake dataset: a sine wave with some noise > > // We reuse the Model function we have just created to make this fake > > dataset > > > > x_exp=[-10:0.1:10]'; > > x0=1.55; > > k=1*%pi/3; > > A=4.3; > > y0=1.1 > > y_exp=SineModel(x_exp,[x0,k,A,y0])+(2*rand(x_exp)-1)*6/10; > > > > //let's check and see what it looks like: > > scf(); > > plot(x_exp,y_exp,'k.-'); > > xlabel('Exparimental X'); > > ylabel('Exparimental Y'); > > xtitle('Our fake experimental dataset to fit'); > > > > // 1)c) we are not done yet, we need some more work > > // First we need an error function that return for a given parameter set > > param the difference > > // between the experimental dataset and the model one: > > // Note that this function returns the difference at each point, not the > > square of the difference, > > // nor the sum over each point of the square of the differences > > function e = ErrorFitSine(param, x_exp, y_exp) > > e = SineModel(x_exp,param)-y_exp; > > endfunction > > // Now we need a starting point that is not too far away from the solution > > // Let's just fo it by hand for the moment we'll see later how to make it > > programmatically > > // Just go and have a look at the previous plot and "guess", here is mine: > > p0=[1,2*%pi/6,4,1]; > > > > //Ready to go: > > [f,popt, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); > > //popt contains the optimal parameter set that fits our dataset > > // > > scf(); > > plot(x_exp,y_exp,'k.'); > > plot(x_exp,SineModel(x_exp,popt),'r-'); > > xlabel('Experimental X') > > ylabel('Experimental Y and best fit'); > > xtitle([... > > "x0="+msprintf('%1.3f fit value: \t%1.3f',x0,popt(1));... > > "k ="+msprintf('%1.3f fit value: \t%1.3f',k,popt(2));... > > "A ="+msprintf('%1.3f fit value: \t%1.3f',A,popt(3));... > > "y0="+msprintf('%1.3f fit value: \t%1.3f',y0,popt(4))... > > ]); > > > > //Yep we are done popt is the optimal parameter set that fits our dataset > > x_exp,y_exp with our > > // model SineModel > > // How to assert the quality of our fit? We can use fopt and gropt for > > that. They should be both as small as possible. > > // Ideally, the gradient should be zeros for each parameter, otherwise it > > means we have not found an optimum. > > > > //2) How to go beyond that simple example? > > // Namely: > > // a) how to go faster? > > // b) how to estimate how good our fit is (aka get error bars on our > > parameters)? > > // c) how to make thinks more reliable with less human guessing and > > more bulletproofing? > > > > > > //2)a) and also 2)b) > > // We need the same extra function in order to speed things up and to > > estimate > > // how the "noise" on the experimental dataset translates in "noise" on > > each > > // individual parameter p(1), ...p($): the Jacobian matrix of our fit > > model. > > // Impressive name for a simple idea: providing leastsq with the partial > > // derivative of the model formula with respect to each parameter > > p(1)..p($). > > function g = ErrorJMatSine(p, x, y) > > // > > > > // g(1,:) = d(SinModel(x,p))/dx0 = d(SinModel(x,p))/d(p(1)) > > // g(2,:) = d(SinModel(x,p))/dk = d(SinModel(x,p))/d(p(2)) > > // g(3,:) = d(SinModel(x,p))/dA = d(SinModel(x,p))/d(p(3)) > > // g(4,:) = d(SinModel(x,p))/dy0 = d(SinModel(x,p))/d(p(4)) > > > > // y=A*sin(k*(x-x0))+y0; > > x0=p(1); > > k=p(2); > > A=p(3); > > y0=p(4); > > g = [... > > (-k)*SineModel(x,[x0-%pi/2/k,k,A,0]'),... > > (x-x0).*SineModel(x,[x0-%pi/2/k,k,A,0]'),... > > SineModel(x,[x0,k,1,0]'),... > > ones(x) ... > > ]; > > endfunction > > > > > > //Ready to go again, and faster this time: > > [f,popt_fast, gropt] = > > leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > > > > disp("This should be ~ [0,0,0] when both fits give exactly the same > > results") > > disp(string((popt-popt_fast)./(popt+popt_fast))) > > > > //Now we check that we are faster: > > // we do it several times to average over timing variations caused by > > // other processes running on our computer > > speedup=zeros(1,10) > > for i=1:100 > > tic(); > > [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); > > t1=toc(); > > tic(); > > [f,popt_fast, gropt] = > > leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > > t2=toc(); > > speedup(i)=t1/t2; > > end > > > > scf(); > > plot(speedup,'k.'); > > plot(mean(speedup)*ones(speedup),'r'); > > xlabel("Fit iteration"); > > ylabel("SpeedUp factor when using Jacobian Matrix"); > > xtitle("Here we have a "+msprintf("~%1.2f",mean(speedup))+... > > " speed improvement using the Jacobian Matrix"); > > > > //2)b) How can we estimate "error bars" for each individual parameters? > > // We are going to estimate how the "noise" on our dataset turns into > > // noise on each parameter by estimating the confidence interval at 95% > > > > g = ErrorJMatSine(popt_fast, x_exp, y_exp);//Jacobian matrix of the fit > > formula > > //estimate of the initial noise on the dataset to fit > > sigma2=1/(length(x_exp)-length(popt_fast))*f; > > > > //covariance matrix of fitting parameters > > pcovar=inv(g'*g)*sigma2; > > //confidence interval for each parameter > > ci=(1.96*sqrt(diag(pcovar)))'; > > > > //Let's present the results of the confidence interval calculation > > str=msprintf("%4.3f\n", popt_fast')+' +/- '+msprintf("%4.3f\n", ci'); > > str=[["x0 = ";"k = ";"A = ";"y0 = "]+str]; > > str=["Fit results:";"y=A*sin(k*(x-x0))+y0";"Param +/- conf. interval @ > > 95%";str] > > > > scf(); > > plot(x_exp,y_exp,'k.'); > > plot(x_exp,SineModel(x_exp,popt_fast),'r-'); > > xlabel('Experimental X') > > ylabel('Experimental Y and best fit'); > > xtitle('Fit with confidence intervals at 95%'); > > xstringb(-6,-3,str,12,6,'fill'); > > e=gce(); > > e.box="on"; > > e.fill_mode="on"; > > e.background=color("light gray"); > > > > //Now we have a fast fit function that can give some estimation on > > // how seriously we can believe the fitted parameters. > > // You can see that the precision with which we retrieve each parameter > > varies > > // k0 is more precisely determined than y0 (15x more precisely!) > > //You can play with the amount of noise to see how it affects the retrieved > > // parameters and confidence intervals > > > > > > > > //2)b) How can we bullet proof our fitting script by putting all this stuff > > // together is several functions that checks the arguments and modify > > // them if needed to avoid difficult to understand complaints from > > // leastsq? > > > > // test of a sinus fitting routine > > > > //// y=A*sin(k*(x-x0))+y0; > > //// function [y]=Sine(x,x0,k,A,y0) > > //// function [y]=Sine(x,p) with p=[x0,k,A,y0]' > > //function [y]=Sine(x,varargin) > > //// pause > > // select length(varargin) > > // case 1 then > > // //we use form [y]=Sine(x,p) where p=[x0,dx,A,y0]' > > // p=varargin(1); > > // case 4 then > > // //we use form [y]=Sine(x,x0,k,A,y0) > > // p=[varargin(1);varargin(2);varargin(3);varargin(4)]; > > // else > > // //call is not correct, we give up > > // y=[];return > > // end > > // x0=p(1); > > // k=p(2); > > // A=p(3); > > // y0=p(4); > > // y=y0+A*sin((x-x0)*k); > > //endfunction > > // > > function [p0,pinf,psup]=EstimateFitSine(x,y) > > // y=A*sin(k*(x-x0))+y0; > > y0=mean(y); > > A=(max(y)-min(y))/2; > > // pause > > sy=fftshift(fft(y-mean(y))); > > //sy=fft(y-mean(y)); > > msy=max(abs(sy));rg=find(abs(sy)==msy); > > delta=(rg(2)-rg(1))/length(y); > > k=delta*%pi/(mean(diff(x))); > > x0=mean(abs(atan(imag(sy(rg)),real(sy(rg)))/k))+%pi/2/k; > > p0=[x0,k,A,y0]; > > pinf=[min(x),0 ,-%inf ,-%inf]; > > psup=[max(x),%inf ,%inf ,%inf]; > > endfunction > > > > > > function e = ErrorFitSine(param, xi, yi) > > e = SineModel(xi,param)-yi; > > endfunction > > > > function g = dErrorFitSine(param, xi, yi) > > // y=A*sin(k*(x-x0))+y0; > > x0=param(1); > > k=param(2); > > A=param(3); > > y0=param(4); > > // pause > > g = [... > > (-k)*SineModel(xi,[x0-%pi/2/k,k,A,0]),... > > (xi-x0).*SineModel(xi,[x0-%pi/2/k,k,A,0]),... > > SineModel(xi,[x0,k,1,0]),... > > ones(xi) ... > > ]; > > endfunction > > > > > > > > function [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0,varargin) > > // FIT Experimental dataset y(x) with a sine model > > // [y]=A*sin(k*(x-x0))+y0; > > // > > //INPUTS: > > //x : experiemental x dataset > > //y : experimental y dataset > > //p0 : starting parameter set [x0,k,A,y0] > > //varargin : > > // pinf inferior limit for popt > > // psup superior limit for popt > > //OUTPUTS > > //f : value of the cost function for the best > > parameter set > > //popt : best parameter set > > //yopt : fit function evaluated on the x dataset > > //gropt : gradient of the cost function at x > > //ci ; confidence interval @ 95% on each parameter in > > popt > > //pcovar ; covariance matrix of popt > > > > // CHECK x and y are col not row > > if isrow(x) then > > x=x.'; > > end > > if isrow(y) then > > y=y.'; > > end > > > > if (length(varargin) < 2) then > > // No constraints on parameter set were provided > > [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,p0); > > // [f,popt_fast, gropt] = > > leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > > else > > // Constraints on parameter set were provided > > pinf=varargin(1); > > psup=varargin(2); > > [f,popt, gropt] = > > leastsq(list(ErrorFitSine,x,y),dErrorFitSine,"b",pinf,psup,p0); > > end > > //normalized best fit evaluated on normalized x > > yopt=ErrorFitSine(popt, x, zeros(x)); > > //rescale best fit param > > > > g = dErrorFitSine(popt, x, y);//Jacobian matrix of the fit formula > > //estimate of the noise on the signal to fit > > sigma2=1/(length(x)-length(popt))*f; > > > > //covariance matrix of fitting parameters > > pcovar=inv(g'*g)*sigma2; > > //confidence interval for each parameter > > ci=1.96*sqrt(diag(pcovar)); > > ci=ci.'; > > endfunction > > > > > > // Let's use our unified and bullet-proof routine > > > > x=[-10:0.1:10]; > > //y=Sine(x,[1,2*%pi/3,4,1])+(2*rand(x)-1)*0.1; > > y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2; > > //y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2/10; > > > > p0=EstimateFitSine(x,y); > > [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0); > > > > > > str="$"+["x_0";"k";"A";"y_0"]+"="+string(popt.')+"\pm"+string(ci.')+"$"; > > str=["$\text{Model: }[y]=A\sin(k(x-x_0))+y_0$";str]; > > > > scf(); > > plot(x,y,'k.'); > > plot(x,SineModel(x,p0),'g'); > > plot(x,SineModel(x,popt),'r'); > > xlabel("$x$"); > > ylabel("$y$"); > > xtitle(str) > > legend(["Data";"Guess";"Fit"]) > > > > ////////////////////////////////////////////////////////////////////////////////////// > > > > Le Samedi, Avril 04, 2020 15:13 CEST, Heinz Nabielek <heinznabie...@me.com> > > a écrit: > > > >> Scilab friends: the power of Scilab is amazing and I have used it recently > >> for non-linear least-squares fitting, below example from Scilab help > >> function for "datafit". On occasions, I have also used "leastsq". > >> > >> Question: how do I derive the 1sigma standard error in the three > >> parameters p(1), p(2), and p(3)? And, if it is not too complicated, > >> covariances? > >> > >> I know this is written in dozens of textbooks, but I am always getting > >> lost. > >> Please provide a simple recipe written in Scilab. > >> Best greetings > >> Heinz > >> > >> > >> > >> // -- 04/04/2020 14:57:30 -- // > >> //generate the data > >> function y=FF(x, p) > >> y=p(1)*(x-p(2))+p(3)*x.*x > >> endfunction > >> X=[]; > >> Y=[]; > >> pg=[34;12;14] //parameter used to generate data > >> for x=0:.1:3 > >> Y=[Y,FF(x,pg)+100*(rand()-.5)]; > >> X=[X,x]; > >> end > >> Z=[Y;X]; > >> //The criterion function > >> function e=G(p, z), > >> y=z(1),x=z(2); > >> e=y-FF(x,p), > >> endfunction > >> //Solve the problem > >> p0=[3;5;10] > >> [p,err]=datafit(G,Z,p0); > >> scf(0);clf() > >> plot2d(X,FF(X,pg),5) //the curve without noise > >> plot2d(X,Y,-1) // the noisy data > >> plot2d(X,FF(X,p),12) //the solution > >> xgrid();legend("the curve without noise"," the noisy data", "THE FINAL > >> SOLUTION.....",4); > >> title("solution set 39.868419 10.312053 11.482521","fontsize",4); > > _______________________________________________ > > users mailing list > > users@lists.scilab.org > > http://lists.scilab.org/mailman/listinfo/users > > > > -- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus > > _______________________________________________ > users mailing list > users@lists.scilab.org > http://lists.scilab.org/mailman/listinfo/users > _______________________________________________ users mailing list users@lists.scilab.org http://lists.scilab.org/mailman/listinfo/users