Yet another example of an unexpected behaviour:

>>> a=np.ma.array([], mask=0)
>>> b=np.ma.array([])
>>> np.ma.allequal(a,b)
True
>>> a.mean()
masked
>>> b.mean()
nan
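The session above can be written as a small script that makes the difference between the two arrays explicit. This is a minimal sketch, assuming a recent NumPy: `a` carries a real boolean mask of shape `(0,)`, while `b` keeps the scalar `np.ma.nomask` placeholder, which is why their means take different code paths. It also shows one possible way, assuming the consuming code only needs an all-False mask wherever `nomask` is used, to always obtain a mask with the data's shape via `np.ma.getmaskarray`; the helper name `data_and_full_mask` is hypothetical.

```python
import warnings

import numpy as np

a = np.ma.array([], mask=0)   # explicit mask: expanded to a boolean array
b = np.ma.array([])           # no mask given: stays np.ma.nomask

# The two arrays compare equal...
print(np.ma.allequal(a, b))                # True

# ...but their masks are represented differently.
print(np.ma.getmask(a).shape)              # (0,)
print(np.ma.getmask(b) is np.ma.nomask)    # True

with warnings.catch_warnings():
    # The mean of an empty plain array emits RuntimeWarnings; silence them here.
    warnings.simplefilter("ignore", RuntimeWarning)
    print(a.mean() is np.ma.masked)        # True: masked-array code path
    print(np.isnan(b.mean()))              # True: falls back to ndarray.mean


def data_and_full_mask(x):
    """Hypothetical helper: return (data, mask) with identical shapes.

    np.ma.getmaskarray expands np.ma.nomask into an all-False array
    of the data's shape, so code that walks data and mask together
    (for example a compiled extension) sees consistent shapes.
    """
    x = np.ma.asarray(x)
    return np.ma.getdata(x), np.ma.getmaskarray(x)


data, mask = data_and_full_mask(np.ma.array([1.0, 2.0, 3.0]))
print(data.shape, mask.shape)              # (3,) (3,)
```

Using `getmaskarray` at the boundary keeps the cheap `nomask` default inside NumPy while giving downstream code a predictable shape.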
But

>>> a
masked_array(data = [], mask = [], fill_value = 1e+20)
>>> b
masked_array(data = [], mask = False, fill_value = 1e+20)

After some googling I found this on Stack Overflow, http://stackoverflow.com/questions/13354295/python-numpy-masked-array-initialization (it is not clearly explained on the numpy doc page http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html#the-maskedarray-class):

>>> d=np.ma.array([], mask=np.ma.nomask)
>>> d
masked_array(data = [], mask = False, fill_value = 1e+20)

I suspect the reason is that mask defaults to np.ma.nomask, and that the rationale for that decision was performance. It follows that a masked array with the default nomask attribute behaves like a regular array (hence the nan), with a placeholder for the mask to be set later if needed.

That tripped me up recently: I had Cython code which relied on the data and mask parts having equal shapes.

George

On 12/30/2014 11:17 PM, numpy-discussion-requ...@scipy.org wrote:
> Message: 1
> Date: Tue, 30 Dec 2014 16:04:36 -0500
> From: Benjamin Root <ben.r...@ou.edu>
> Subject: Re: [Numpy-discussion] Clarifications in numpy.ma module
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID:
> <CANNq6Fk4XTMcXeb64C9FWWnjWsVVK=Ri7CsGLsbE2wr=z-r...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, Dec 30, 2014 at 3:29 PM, Alexander Belopolsky <ndar...@mac.com>
> wrote:
>
>> On Tue, Dec 30, 2014 at 2:49 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>>
>>> Where does it say that operations on masked arrays should not produce
>>> NaNs?
>>
>> Masked arrays were invented with the specific goal to avoid carrying NaNs
>> in computations. Back in the days, NaNs were not available on some
>> platforms and had significant performance issues on others. These days NaN
>> support for floating point types is nearly universal, but numpy types are
>> not limited by floating point.
>>
>>
>
> From the numpy.ma docstring:
> "Arrays sometimes contain invalid or missing data.
> When doing operations on such arrays, we wish to suppress invalid values,
> which is the purpose masked arrays fulfill (an example of typical use is
> given below)."
>
> A few lines down:
> "Here, we construct a masked array that suppress all ``NaN`` values. We
> may now proceed to calculate the mean of the other values"
>
> Note the repeated usage of the term "suppress" in the context of the input
> arrays. The phrase "We may now proceed to calculate the mean of the other
> values" implies that the mean of a masked array is taken to be the mean of
> everything but the masked values. If there are no values remaining, then I
> expect it to give me the equivalent of np.mean([]).
>
>
>
>>> Having np.mean([]) return the same thing as np.ma.mean([]) makes
>> complete sense.
>>
>> Does the following make sense as well?
>>
>> >>> import numpy
>> >>> numpy.ma.masked_values([0, 0], 0).mean()
>> masked
>> >>> numpy.ma.masked_values([0], 0).mean()
>> masked
>> >>> numpy.ma.masked_values([], 0).mean()
>> * Two warnings *
>> masked_array(data = nan,
>> mask = False,
>> fill_value = 0.0)
>>
>>
>
> No, I would consider the first two to be bugs. And actually, returning a
> masked array in the third one is also incorrect in this case. The result
> should be a scalar. This is now veering to the same issues discussed in the
> np.nanmean([]) vs. np.nanmean([np.nan]) discussion.
>
> Cheers!
> Ben Root
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
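To make Ben's expectation concrete, here is a minimal sketch contrasting current `numpy.ma` behaviour with a hypothetical `mean_of_unmasked` helper (not part of numpy.ma) that averages only the unmasked values and, when none remain, returns whatever `np.mean([])` returns (nan plus a RuntimeWarning). The first lines follow the docstring usage Ben quotes: suppress the NaN with `np.ma.masked_invalid`, then take the mean of the rest. The return type of the empty `masked_values(...).mean()` call has varied across NumPy versions (Alexander's session shows a masked_array wrapping nan), so the sketch only relies on the value being nan.

```python
import warnings

import numpy as np

# Docstring-style use: suppress the NaN, then average the remaining values.
x = np.ma.masked_invalid([1.0, np.nan, 3.0])
print(x.mean())                        # 2.0


def mean_of_unmasked(arr):
    """Hypothetical helper, not numpy.ma API.

    Averages the unmasked values only; with nothing left it behaves
    like np.mean([]) (nan and a RuntimeWarning) instead of returning
    np.ma.masked.
    """
    return np.mean(np.ma.asarray(arr).compressed())


with warnings.catch_warnings():
    # Empty means warn; silence them for this demonstration.
    warnings.simplefilter("ignore", RuntimeWarning)
    for values in ([0, 0], [0], []):
        m = np.ma.masked_values(values, 0)
        # numpy.ma: masked for the first two, nan for the empty input
        # helper:   nan in all three cases
        print(values, m.mean(), mean_of_unmasked(m))
```

The inconsistency Alexander points out comes from the empty input: `masked_values` shrinks an all-False mask back to `nomask`, so `mean()` takes the plain-ndarray path instead of the masked one.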