Re: [Numpy-discussion] 2D binning

2010-06-02 Thread Stephen Simmons
On 1/06/2010 10:51 PM, Wes McKinney wrote: > > This is a pretty good example of the "group-by" problem that will > hopefully work its way into a future edition of NumPy. Wes (or anyone else), please can you elaborate on any plans for groupby? I've made my own modification to numpy.bincount f

Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-14 Thread Stephen Simmons
the output needs to be; and (ii) allows you to control the size of the output array, as you may want it bigger than the number of bins would suggest. I look forward to the draft NEP! Best regards Stephen Simmons On 13/04/2010 10:34 PM, Robert Kern wrote: > On Sat, Apr 10, 2010 at 17

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Stephen Simmons
cs.mit.edu/projects/cstore/vldb.pdf). Stephen Francesc Alted wrote: > On Friday 30 October 2009 14:18:05 Stephen Simmons wrote: > >> - Pytables (HDF using chunked storage for recarrays with LZO >> compression and shuffle filter) >> - can't extract individual field

[Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Stephen Simmons
Hi, Is anyone working on alternative storage options for numpy arrays, and specifically recarrays? My main application involves processing series of large recarrays (say 1000 recarrays, each with 5M rows having 50 fields). Existing options meet some but not all of my requirements. Requirement

Re: [Numpy-discussion] Home for pyhdf5io?

2009-05-24 Thread Stephen Simmons
David Warde-Farley wrote: > On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote: >> Actually my vision with pyhdf5io is to have hdf5 to replace numpy's >> own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it) >> should be the standard (binary) way to store data in scipy/numpy. A >>

Re: [Numpy-discussion] How to convert a list into a structured array?

2009-05-05 Thread Stephen Simmons
Wei Su wrote: Hi, Francesc: Thanks a lot for offering me help. My code is really simple as of now. from pyodbc import * from rpy import * cnxn = connect('DRIVER={SQL Server};SERVER=srdata01\\sql2k5;DATAB

[Numpy-discussion] Example code for Numpy C preprocessor 'repeat' directive?

2009-03-04 Thread Stephen Simmons
Hi, Please can someone suggest resources for learning how to use the 'repeat' macros in numpy C code to avoid repeating sections of type-specific code for each data type? Ideally there would be two types of resources: (i) a description of how the repeat macros are meant to be used/compiled; an

[Numpy-discussion] Easy way to vectorize a loop?

2009-03-01 Thread Stephen Simmons
Hi, Can anyone help me out with a simple way to vectorize this loop? # idx and vals are arrays with indexes and values used to update array data # data = numpy.ndarray(shape=(100,100,100,100), dtype='f4') flattened = data.ravel() for i in range(len(vals)): flattened[idx[i]]+=vals[i] Many th
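One possible way to vectorize the loop above, as a minimal sketch rather than the answer given in the thread: np.add.at performs an unbuffered in-place add, so repeated entries in idx accumulate just as they do in the loop. It was added to NumPy in 1.8, after this message was posted. The small array shape and the sample idx/vals values are assumptions made for illustration.

```python
import numpy as np

# Small stand-in for the (100, 100, 100, 100) array in the post (assumed size).
data = np.zeros((4, 4, 4, 4), dtype='f4')
flattened = data.ravel()  # a view here, because data is C-contiguous

# Hypothetical sample inputs; note the repeated index 5.
idx = np.array([0, 5, 5, 255])
vals = np.array([1.0, 2.0, 3.0, 4.0], dtype='f4')

# Unbuffered in-place add: repeated indexes accumulate, unlike
# flattened[idx] += vals, which keeps only one update per index.
np.add.at(flattened, idx, vals)
```

Because flattened is a view, the updates show up in data as well. When idx is large, adding np.bincount(idx, weights=vals, minlength=flattened.size) to flattened may be worth timing as an alternative.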

Re: [Numpy-discussion] ANN: HDF5 for Python 1.1

2009-02-09 Thread Stephen Simmons
Hi Andrew, Do you have any plans to support LZO compression in h5py? I have lots of LZO-compressed datasets created with PyTables. There's a real barrier to using both h5py and PyTables if the fast decompressor options are just LZF on h5py and LZO on PyTables. Many thanks Stephen Andrew Colle

Re: [Numpy-discussion] ANN: HDF5 for Python 1.0

2008-12-02 Thread Stephen Simmons
Do you have any plans to add lzo compression support, in addition to gzip? This is a feature I used a lot in PyTables. Andrew Collette wrote: > = > Announcing HDF5 for Python (h5py) 1.0 > = > > What is h5py? > - >

[Numpy-discussion] Anyone written an SQL-like interface to numpy/PyTables?

2007-06-18 Thread Stephen Simmons
Hi, Has anyone written a parser for SQL-like queries against PyTables HDF tables or numpy recarrays? I'm asking because I have written code for grouping then summing rows of source data, where the groups are defined by functions of the source data, or looking up a related field in a separate l

[Numpy-discussion] 3-10x speedup in bincount()

2007-03-13 Thread Stephen Simmons
ting strings to integers * * Author: Stephen Simmons, [EMAIL PROTECTED] * Date: 11 March 2007 * * This module contains C code for functions I am using to accelerate * SQL-like aggregate functions for a column-oriented database based on numpy. * * subtotal's bincount is typically 3-10 times

[Numpy-discussion] Feedback pls on proposed changes to bincount()

2007-03-11 Thread Stephen Simmons
Hi, I'd like to propose some minor modifications to the function bincount(arr, weights=None), so would like some feedback from other users of bincount() before I write this up as a proper patch. Background: bincount() has two forms: - bincount(x) returns an integer array ians of length max(x)+
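For readers unfamiliar with the two forms mentioned above, here is a small illustration using numpy.bincount as it exists in current NumPy. It shows existing behaviour only, not the proposed changes; the sample arrays x and w are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sample data: bin numbers and one weight per entry.
x = np.array([0, 1, 1, 3])
w = np.array([0.5, 1.0, 1.5, 2.0])

# Form 1: count how many times each non-negative integer occurs in x.
counts = np.bincount(x)

# Form 2: sum the weights that fall into each bin.
totals = np.bincount(x, weights=w)

print(counts)  # integer counts per bin
print(totals)  # float64 sums per bin
```

Bin 2 never occurs in x, so it appears as a zero in both results.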

Re: [Numpy-discussion] array.sum() slower than expected along some array axes?

2007-02-03 Thread Stephen Simmons
Charles R Harris wrote: > > > On 2/3/07, *Stephen Simmons* <[EMAIL PROTECTED] > <mailto:[EMAIL PROTECTED]>> wrote: > > Hi, > > Does anyone know why there is an order of magnitude difference > in the speed of numpy's array.sum() function depe

[Numpy-discussion] array.sum() slower than expected along some array axes?

2007-02-03 Thread Stephen Simmons
Hi, Does anyone know why there is an order of magnitude difference in the speed of numpy's array.sum() function depending on the axis of the matrix summed? To see this, import numpy and create a big array with two rows: >>> import numpy >>> a = numpy.ones([2,100], 'f4') Then using
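A rough way to reproduce this kind of comparison, offered as a sketch and not as the timing code from the original message: the array width of 1,000,000 columns and the repeat count are assumptions, and the ratio between the two timings will depend on the NumPy version and the hardware.

```python
import timeit

import numpy as np

# Two rows and many columns; the width is an assumed value for illustration.
a = np.ones((2, 1_000_000), dtype='f4')

repeats = 10
t_axis0 = timeit.timeit(lambda: a.sum(axis=0), number=repeats)
t_axis1 = timeit.timeit(lambda: a.sum(axis=1), number=repeats)

print(f"sum(axis=0): {t_axis0 / repeats:.6f} s per call")
print(f"sum(axis=1): {t_axis1 / repeats:.6f} s per call")
```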

Re: [Numpy-discussion] Advice please on efficient subtotal function

2006-12-29 Thread Stephen Simmons
heers, and thanks for any further suggestions, Stephen Francesc Altet <[EMAIL PROTECTED]> wrote: > On Friday 29 December 2006 10:05, Stephen Simmons wrote: > > Hi, > > > > I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid. > > This

[Numpy-discussion] Advice please on efficient subtotal function

2006-12-29 Thread Stephen Simmons
Hi, I'm looking for efficient ways to subtotal a 1-d array onto a 2-D grid. This is more easily explained in code than words, thus: for n in xrange(len(data)): totals[ i[n], j[n] ] += data[n] data comes from a series of PyTables files with ~200m rows. Each row has ~20 cols, and I use the fir
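One way the loop above could be expressed without a Python-level loop, as a sketch under stated assumptions and not necessarily the approach the thread settled on: flatten each (i, j) pair into a single bin number, then let numpy.bincount sum the weights per bin. The helper name subtotal_2d and the sample inputs are hypothetical, and np.ravel_multi_index and the minlength argument both postdate this 2006 message.

```python
import numpy as np

def subtotal_2d(i, j, data, shape):
    """Sum data[n] into cell (i[n], j[n]) of a 2-D grid of the given shape.

    Hypothetical helper, not part of NumPy.
    """
    # Map each (row, col) pair to its index in the flattened grid.
    flat = np.ravel_multi_index((i, j), shape)
    # Weighted bincount adds up every value that lands in the same cell.
    sums = np.bincount(flat, weights=data, minlength=shape[0] * shape[1])
    return sums.reshape(shape)

# Hypothetical sample inputs; cell (1, 2) receives two values.
i = np.array([0, 1, 1, 0])
j = np.array([0, 2, 2, 1])
data = np.array([1.0, 2.0, 3.0, 4.0])

totals = subtotal_2d(i, j, data, (2, 3))
print(totals)
```

The result is always float64 because bincount accumulates weights in double precision, which may matter if the source data is stored in a narrower type.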