Hi all,

Within the development team we recently discussed improving cov() in terms of speed and memory requirements, and the possibility of merging cov() and covar(), which are two distinct macros. Since we did not manage to reach a consensus, we thought it would be a good occasion to ask the opinion of the members of this list who have recognized academic/research knowledge in probability and statistics. Here are some elements to start the discussion. Let us start with the covar() macro and what it actually computes:

* covar()

Let us start with a definition of covariance in general:

https://fr.wikipedia.org/wiki/Covariance#D%C3%A9finition_de_la_covariance

and with an example there:

https://en.wikipedia.org/wiki/Covariance#Example

Both links above consider scalar/real random variables, and the second one considers discrete random variables. In the example the covariance is computed from the possible values and their joint probability distribution. You can easily check in the source of covar() (type "edit covar") that, after normalizing the matrix of joint probabilities (named "frequencies" in the source), the macro computes the same value, which is confirmed by the result of the following statements:

--> x=[1 2];y=[1 2 3];fre = [1/4 1/4 0;0 1/4 1/4];covar(x,y,fre)
 ans  =

   0.25
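
For reference, here is a minimal hand computation of the same value (a sketch, not the actual source of covar()) using the classical formula Cov(X,Y) = E[XY] - E[X]E[Y]; it assumes, as covar() does, that the frequency matrix is normalized before use:

x=[1 2]; y=[1 2 3]; fre=[1/4 1/4 0;0 1/4 1/4];
p = fre/sum(fre);      // normalize the joint probabilities, as covar() does
px = sum(p,'c');       // marginal distribution of X (one entry per row of p)
py = sum(p,'r');       // marginal distribution of Y (one entry per column of p)
Ex = x*px; Ey = py*y'; // E[X] and E[Y]
Exy = x*p*y';          // E[XY]
Exy - Ex*Ey            // gives 0.25, the same value as covar(x,y,fre)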

Please note that the output of covar() is always a scalar. Now let us consider cov():

* cov()

Here is a definition of the covariance matrix:

https://en.wikipedia.org/wiki/Covariance_matrix

Here we consider vectors of random variables (not scalar random variables), and in this case the covariance is a matrix. When there is no a priori knowledge about these variables (typically, when the joint distribution is not known), the best you can do, when you have samples of this random vector, is to compute an estimate of the covariance matrix; see e.g. the following page:

https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices

You can verify in the actual code of cov() that this macro computes the same estimate (the sums are vectorized).
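
For illustration, here is a sketch of that textbook unbiased estimator applied to a matrix of samples (one sample per row); this is not the actual code of cov(), and it assumes cov() uses the usual 1/(n-1) normalization, which the bias correction used further below relies on:

x = rand(5,3);                    // 5 samples of a 3-dimensional random vector
n = size(x,1);
xc = x - ones(n,1)*mean(x,'r');   // center each column (subtract the column means)
C = xc'*xc/(n-1);                 // unbiased sample covariance matrix
max(abs(C - cov(x)))              // should be close to zero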

We can summarize these facts this way:

* covar(x,y,fre) computes the scalar covariance of two discrete random variables knowing their possible values x(:) and y(:) and their joint probability distribution.

* When x is a matrix, cov(x) computes an estimate of the covariance matrix of a vector X of size(x,2) random variables, using the size(x,1) samples of this vector (each x(i,:) is a sample). If x and y are vectors of the same size, cov(x,y) is computed as cov([x(:) y(:)]) (see the quick check below).
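
A quick check of the last point, with arbitrary random data (a zero matrix is the expected result if the two calls are indeed equivalent):

x = rand(10,1); y = rand(10,1);
cov(x,y) - cov([x(:) y(:)])   // expected to be a 2x2 matrix of zeros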

To me, the main difference is that covar(x,y,fre) does not compute an _estimator_ but an _exact value_. Of course, the vectors x and y can contain the unique values taken by two random variables in a set of samples (x,y), with "fre" being the empirical frequency of the pairs (x_i,y_j); in this case covar() will compute an estimate. For example, consider the two random variables X and Y, where X takes values {1,2} with equal probability, and Y=X+U where U takes values {0,1} with equal probability. We can use covar() to compute the exact covariance of X and Y, but if we only have samples, as in the script below, and we want to estimate the covariance with the same macro, then the unique pairs have to be found and their occurrences counted in order to estimate the frequencies:

N=1000;
x=ceil(rand(N,1)*2);                          // N samples of X, uniform on {1,2}
y=x+floor(rand(N,1)*2);                       // Y = X + U, with U uniform on {0,1}

[pairs,k]=unique(gsort([x y],'lr','i'),'r');  // unique (x,y) pairs of the lexicographically sorted sample
f=diff([k;N+1])/N;                            // empirical frequency of each pair (position differences give the counts)

freq=sparse(pairs,f)                          // empirical joint frequency matrix: freq(i,j) estimates P(X=i,Y=j)
N/(N-1)*covar(1:2,1:3,freq)                   // covar on the empirical frequencies, with the bias correction
cov(x,y)

If you have a look at the results,

--> freq
 freq  =

   0.2526   0.2489   0.
   0.       0.2453   0.2532

--> N/(N-1)*covar(1:2,1:3,freq)
 ans  =

   0.249769

--> cov(x,y)
 ans  =

   0.2500182   0.249769
   0.249769    0.4995447

you can see that:

1. we have considered the same random variables as in the example https://en.wikipedia.org/wiki/Covariance#Example

2. covar's output (up to the normalization correcting the bias) gives the off-diagonal term of cov(x,y)

So, yes, the off-diagonal term of cov(x,y) and covar(x,y,fre) (up to the determination of the unique pairs, the computation of "fre" and the bias correction) have the same value, but is that a reason to merge the two functions? I think the answer is NO.

If you agree or disagree, feel free to continue the discussion in this thread.

S.

--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

