Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote: The use of C long affects all the C and Pyrex source code in the mtrand module, not just mtrand.pyx. All of it is fubar on Win64. randomkit.c handles C long correctly, I think. There are different code paths for 32- and 64-bit C long, and buffer sizes are

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 09:21, Sturla Molden wrote: randomkit.c handles C long correctly, I think. There are different code paths for 32- and 64-bit C long, and buffer sizes are size_t. distributions.c takes C longs as parameters, e.g. for the binomial distribution. mtrand.pyx correctly handles this, but it

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 06:32, Sturla Molden wrote: On 24.01.2012 06:00, Sturla Molden wrote: Both i and length could overflow here. It should overflow on allocation of more than 2 GB. There are also a lot of C longs in the internal state (lines 55-105), as well as in the other functions. The use of C

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:37, Sturla Molden stu...@molden.no wrote: On 24.01.2012 09:21, Sturla Molden wrote: randomkit.c handles C long correctly, I think. There are different code paths for 32- and 64-bit C long, and buffer sizes are size_t. distributions.c takes C longs as parameters, e.g. for

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 08:47, Sturla Molden stu...@molden.no wrote: The coding is also inconsistent; compare, for example: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 I'm sorry,

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:16, Robert Kern wrote: I'm sorry, what are you demonstrating there? Both npy_intp and C long are used for sizes and indexing. Sturla

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robert Kern
On Tue, Jan 24, 2012 at 09:19, Sturla Molden stu...@molden.no wrote: On 24.01.2012 10:16, Robert Kern wrote: I'm sorry, what are you demonstrating there? Both npy_intp and C long are used for sizes and indexing. Ah, yes. I think Travis added the multiiter code to cont1_array(), which does

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Sturla Molden
On 24.01.2012 10:15, Robert Kern wrote: There are two different uses of long that you need to distinguish. One is for sizes, and one is for parameters and values. The sizes should definitely be upgraded to npy_intp. The latter shouldn't; these should remain as the default integer type of

Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Pierre Haessig
On 22/01/2012 11:28, Nadav Horesh wrote:

    >>> special.erf(26.5)
    1.0
    >>> special.erf(26.6)
    Traceback (most recent call last):
      File "<pyshell#7>", line 1, in <module>
        special.erf(26.6)
    FloatingPointError: underflow encountered in erf
    >>> special.erf(26.7)
    1.0

I can confirm this same behaviour
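
A minimal sketch of the report and a possible workaround, assuming the error surfaces through numpy's floating-point error state (scipy.special.erf is a ufunc, so np.errstate should apply):

    import numpy as np
    from scipy import special

    print(special.erf(26.5))      # 1.0
    # erf(26.6) reportedly raises FloatingPointError: the underflow happens
    # internally even though the returned value would simply be 1.0.
    with np.errstate(under='ignore'):
        print(special.erf(26.6))  # 1.0, with the underflow signal suppressed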

[Numpy-discussion] einsum evaluation order

2012-01-24 Thread Søren Gammelmark
Dear all, I was just looking into numpy.einsum and encountered an issue which might be worth pointing out in the documentation. Let us say you wish to evaluate something like this (repeated indices are summed): D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, sigma] *
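
The rest of the expression is truncated above, but the evaluation-order issue can be sketched with a simpler chain contraction (hypothetical shapes): evaluating a three-operand einsum in one call can cost far more than contracting pairwise.

    import numpy as np

    n = 200
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    C = np.random.rand(n, n)

    # One-shot: numpy is free to evaluate this as a naive quadruple loop,
    # costing O(n^4) operations.
    D1 = np.einsum('ik,kl,lj->ij', A, B, C)

    # Pairwise: two O(n^3) contractions with an explicit n-by-n intermediate.
    D2 = np.einsum('il,lj->ij', np.einsum('ik,kl->il', A, B), C)

    assert np.allclose(D1, D2)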

[Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Kathleen M Tacina
I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest
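
The probe is easy to reproduce (the comments give the expected dtypes; the report is that the last call returned object on the development build):

    import numpy as np

    print(np.min_scalar_type(2**63 - 1))  # int64
    print(np.min_scalar_type(2**63))      # uint64
    print(np.min_scalar_type(2**64 - 1))  # uint64 expected, object reported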

Re: [Numpy-discussion] Strange error raised by scipy.special.erf

2012-01-24 Thread Nadav Horesh
I filed a ticket (#1590). Thank you for the verification. Nadav.

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 09:15:01AM +0000, Robert Kern wrote: On Tue, Jan 24, 2012 at 08:37, Sturla Molden stu...@molden.no wrote: On 24.01.2012 09:21, Sturla Molden wrote: randomkit.c handles C long correctly, I think. There are different code paths for 32- and 64-bit C long, and buffer sizes

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: On 23.01.2012 22:08, Christoph Gohlke wrote: Maybe this explains the win-amd64 behavior: There are a couple of places in mtrand where array indices and sizes are C long instead of npy_intp, for example in the randint

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Robin
On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley warde...@iro.umontreal.ca wrote: On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: On 23.01.2012 22:08, Christoph Gohlke wrote: Maybe this explains the win-amd64 behavior: There are a couple of places in mtrand where array

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1. Alright, so that rules out platform-specific effects. I'll try and hunt the bug down when I have some time, if someone more familiar with the indexing code doesn't beat me

[Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then:

    In [3]: data.mean()
    Out[3]: 3067.024383998

    In [4]: data.max()
    Out[4]: 3052.4343

    In [5]: data.shape
    Out[5]: (1000,

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Bruce Southey
On 01/24/2012 12:33 PM, K.-Michael Aye wrote: I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.024383998 In [4]: data.max()

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped calculation

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote: I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.024383998 In [4]:

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Val Kalatsky
Just what Bruce said. You can run the following to confirm: np.mean(data - data.mean()). If for some reason you do not want to convert to float64, you can add the result of the previous line to the bad mean: bad_mean = data.mean(); good_mean = bad_mean + np.mean(data - bad_mean). Val On Tue, Jan
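
A short sketch of that correction trick on hypothetical float32 data with a large offset; the residuals data - bad_mean are small, so summing them in float32 is accurate:

    import numpy as np

    data = (3000 + np.random.rand(1000, 1000)).astype(np.float32)
    bad_mean = data.mean()                           # float32 accumulation drifts
    good_mean = bad_mean + np.mean(data - bad_mean)  # re-center, then correct
    print(bad_mean, good_mean)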

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Zachary Pincus
You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bit float precision and, if you can, you need to increase the precision of the accumulator in np.mean() or change the input dtype: a.mean(dtype=np.float32) # default and lacks precision
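
For example, with the 4000*ones array from later in this thread (a sketch; the exact float32 result is platform-dependent):

    import numpy as np

    a = 4000 * np.ones((1024, 1024), dtype=np.float32)
    print(a.mean())                  # float32 accumulator; error builds up
    print(a.mean(dtype=np.float64))  # 4000.0, double-precision accumulator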

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread K . -Michael Aye
Thank you Bruce and all, I knew I was doing something wrong (I should have read the mean method doc more closely). I am of course glad that it's so easily understandable. But: if the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert

[Numpy-discussion] Course Python for Scientists and Engineers in Chicago

2012-01-24 Thread Mike Müller
There will be a comprehensive Python course for scientists and engineers in Chicago at the end of February / beginning of March 2012. It consists of a 3-day intro and a 2-day advanced section.

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread David Warde-Farley
On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: Yes - I get exactly the same numbers in 64-bit Windows with 1.6.1. Alright, so that rules out platform-specific effects. I'll try and hunt the bug down when I have

Re: [Numpy-discussion] advanced indexing bug with huge arrays?

2012-01-24 Thread Samuel John
On 23.01.2012, at 11:23, David Warde-Farley wrote:

    a = numpy.array(numpy.random.randint(256,size=(500,972)),dtype='uint8')
    b = numpy.random.randint(500,size=(4993210,))
    c = a[b]

    In [14]: c[100:].sum()
    Out[14]: 0

Same here. Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM,
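
A hypothetical self-check for this reproduction (note that c alone is roughly 4.9 GB, which is the point: the result's byte count exceeds 2**32, where 32-bit C long arithmetic would wrap):

    import numpy as np

    a = np.random.randint(1, 256, size=(500, 972)).astype(np.uint8)  # all positive
    b = np.random.randint(0, 500, size=4993210)
    c = a[b]                              # fancy indexing, ~4.9 GB result
    assert np.array_equal(c[0], a[b[0]])  # each row of c must be a row of a
    print(c[100:].sum())                  # far from 0 if indexing is correct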

Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Samuel John
I get the same results as you, Kathy. *surprised* (On OS X (Lion), 64-bit, numpy 2.0.0.dev-55472ca, Python 2.7.2.) On 24.01.2012, at 16:29, Kathleen M Tacina wrote: I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers

Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-24 Thread Samuel John
I know you wrote that you want TEXT files, but nevertheless I'd like to point to http://code.google.com/p/h5py/ . There are viewers for hdf5, and it is stable and widely used. Samuel On 24.01.2012, at 00:26, Emmanuel Mayssat wrote: After having saved data, I need to know/remember the data

Re: [Numpy-discussion] installing matplotlib in MacOs 10.6.8.

2012-01-24 Thread Samuel John
Sorry for the late answer, but at least for the record: If you are using Eclipse, I assume you have also installed the Eclipse plugin [pydev](http://pydev.org/). I use it myself; it's good. Then you have to go to Preferences - PyDev - Python Interpreter and select the python version you want

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi, Oddly, numpy 1.6 seems to behave in a more consistent manner:

    In []: sys.version
    Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
    In []: np.version.version
    Out[]: '1.6.0'
    In []: d = np.load('data.npy')
    In []: d.dtype
    Out[]: dtype('float32')
    In []: d.mean()
    Out[]:

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Kathleen M Tacina
I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version:

    In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
    In [23]: a.mean()
    Out[23]: 4034.16357421875
    In [24]: np.version.full_version
    Out[24]: '2.0.0.dev-55472ca'

But, a Windows XP

Re: [Numpy-discussion] einsum evaluation order

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 6:32 AM, Søren Gammelmark gammelm...@gmail.com wrote: Dear all, I was just looking into numpy.einsum and encountered an issue which might be worth pointing out in the documentation. Let us say you wish to evaluate something like this (repeated indices are summed)

[Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
I need some help understanding how to loop through many arrays to calculate the 95th percentile. I can easily do this by using numpy.concatenate to make one big array and then finding the 95th percentile using numpy.percentile, but this causes a memory error when I want to run this on hundreds of

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread eat
Hi, On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]:

Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-24 Thread Mark Wiebe
2012/1/21 Ondřej Čertík ondrej.cer...@gmail.com: [snip] Let me know if you figure out something. I think the mask thing is quite slow, but the problem is that it needs to be there to catch overflows (and it is there in Fortran as well; see the where statement, which does the same thing).
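
For context, a minimal vectorized Mandelbrot in NumPy (a sketch along the lines of the standard example; names are illustrative). The mask is what keeps already-diverged points clamped so z*z + c cannot overflow:

    import numpy as np

    def mandelbrot(h, w, maxit=20):
        y, x = np.ogrid[-1.4:1.4:h*1j, -2.0:0.8:w*1j]
        c = x + y*1j
        z = c.copy()
        divtime = maxit + np.zeros(c.shape, dtype=int)
        for i in range(maxit):
            z = z*z + c
            diverged = np.abs(z) > 2               # the mask in question
            divtime[diverged & (divtime == maxit)] = i
            z[diverged] = 2                        # clamp to avoid overflow
        return divtime

    print(mandelbrot(400, 400).mean())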

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Marc Shivers
This is probably not the best way to do it, but I think it would work: you could take two passes through your data, first calculating and storing the median for each file and the number of elements in each file. From those data, you can get a lower bound on the 95th percentile of the combined

Re: [Numpy-discussion] Unexpected behavior with np.min_scalar_type

2012-01-24 Thread Mark Wiebe
On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1)

Re: [Numpy-discussion] Fix for ticket #1973

2012-01-24 Thread Mark Wiebe
On Mon, Jan 16, 2012 at 8:14 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey bsout...@gmail.com wrote: On 01/14/2012 04:31 PM, Charles R Harris

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
Thanks for your responses. Because of the size of the dataset, I will still end up with the memory error if I calculate the median for each file; additionally, the files are not all the same size. I believe this memory problem will still arise with the cumulative distribution calculation and not

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Olivier Delalleau
Note that if you are ok with an approximate solution, and you can assume your data is somewhat shuffled, a simple online algorithm that uses no memory consists in:
- choosing a small step size delta
- initializing your percentile p to a more or less random value (a meaningful guess is better
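
A rough sketch of that update rule (hypothetical names; delta and the initial p are the tuning knobs). The expected step is delta*(q - P(x < p)), which is zero exactly when p sits at the q-th quantile:

    import numpy as np

    def online_percentile(xs, q=0.95, delta=0.01, p=0.0):
        for x in xs:
            if x < p:
                p -= delta * (1.0 - q)   # too many points below: drift down
            else:
                p += delta * q           # point above the estimate: drift up
        return p

    # Feed files one at a time; no concatenation, O(1) memory.
    p = 0.0
    for _ in range(100):
        chunk = np.random.randn(10000)   # stand-in for one file's data
        p = online_percentile(chunk, p=p)
    print(p)   # should approach ~1.645, the 95th percentile of N(0, 1)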

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Tue, Jan 24, 2012 at 7:21 PM, eat e.antero.ta...@gmail.com wrote: Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a =

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread Charles R Harris
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]:

Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote: I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version: In