Yet another example of unexpected behaviour:

 >>>[], mask=0)
 >>> a.mean()
 >>> b.mean()


masked_array(data = [],
              mask = [],
        fill_value = 1e+20)
 >>> b
masked_array(data = [],
              mask = False,
        fill_value = 1e+20)

After some googling I found on Stack Overflow
  (this is not clearly explained in the numpy docs)

 >>> d
masked_array(data = [],
              mask = False,
        fill_value = 1e+20)

I suspect the reason is that mask defaults to nomask, and the
rationale for that decision was performance. It follows that a masked
array with the default nomask attribute behaves like a regular array
(hence the nan), with a placeholder for a mask to be set later, if
needed. That tripped me up recently: I had Cython code which relied on
the shapes of the data and mask parts being equal.
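
To illustrate the point, here is a minimal sketch (not my actual Cython
code; the arrays a and b are made up for the example). It assumes a
reasonably recent numpy and uses only np.ma.getmask and
np.ma.getmaskarray. As far as I can tell, getmaskarray is the safe way
to get a mask that always has the same shape as the data:

```python
import numpy as np

# No mask given: the mask is the scalar placeholder np.ma.nomask.
a = np.ma.array([1.0, 2.0, 3.0])
# Explicit per-element mask: a boolean array shaped like the data.
b = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, False])

a_has_nomask = np.ma.getmask(a) is np.ma.nomask
b_mask_shape = np.ma.getmask(b).shape

# getmaskarray expands nomask into a full all-False boolean array,
# so the result always matches the shape of the data.
a_full_mask = np.ma.getmaskarray(a)

print(a_has_nomask)
print(b_mask_shape)
print(a_full_mask.shape == a.shape)
```

Code that needs data and mask buffers of equal shape can call
np.ma.getmaskarray(x) instead of reading x.mask directly.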


On 12/30/2014 11:17 PM, wrote:
> Date: Tue, 30 Dec 2014 16:04:36 -0500
> From: Benjamin Root <>
> Subject: Re: [Numpy-discussion] Clarifications in module
> To: Discussion of Numerical Python <>
> On Tue, Dec 30, 2014 at 3:29 PM, Alexander Belopolsky <>
> wrote:
>> On Tue, Dec 30, 2014 at 2:49 PM, Benjamin Root <> wrote:
>>> Where does it say that operations on masked arrays should not produce
>>> NaNs?
>> Masked arrays were invented with the specific goal to avoid carrying NaNs
>> in computations.  Back in the days, NaNs were not available on some
>> platforms and had significant performance issues on others.  These days NaN
>> support for floating point types is nearly universal, but numpy types are
>> not limited by floating point.
> From the docstring:
> "Arrays sometimes contain invalid or missing data.  When doing operations
> on such arrays, we wish to suppress invalid values, which is the purpose
> masked arrays fulfill (an example of typical use is given below)."
> A few lines down:
> "Here, we construct a masked array that suppress all ``NaN`` values.  We
> may now proceed to calculate the mean of the other values"
> Note the repeated usage of the term "suppress" in the context of the input
> arrays. The phrase "We may now proceed to calculate the mean of the other
> values" implies that the mean of a masked array is taken to be the mean of
> everything but the masked values. If there are no values remaining, then I
> expect it to give me the equivalent of np.mean([]).
>>> Having np.mean([]) return the same thing as[]) makes
>> complete sense.
>> Does the following make sense as well?
>>>>> import numpy
>>>>>[0, 0], 0).mean()
>> masked
>>>>>[0], 0).mean()
>> masked
>>>>>[], 0).mean()
>> * Two warnings *
>> masked_array(data = nan,
>>               mask = False,
>>         fill_value = 0.0)
> No, I would consider the first two to be bugs. And actually, returning a
> masked array in the third one is also incorrect in this case. The result
> should be a scalar. This is now veering to the same issues discussed in the
> np.nanmean([]) vs. np.nanmean([np.nan]) discussion.
> Cheers!
> Ben Root
