Re: [Numpy-discussion] Distance Matrix speed

2006-06-20 Thread Alan G Isaac
I think the distance matrix version below is about as good as it gets with these basic strategies.

fwiw,
Alan Isaac

def dist(A,B):
    rowsA, rowsB = A.shape[0], B.shape[0]
    distanceAB = empty( [rowsA,rowsB] , dtype=float)
    if rowsA <= rowsB:
        temp = empty_like(B)
        for i in r
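The snippet above is cut off, so what follows is not Alan's code. It is only a minimal sketch of one way such a function could be completed, assuming the idea is to loop over the shorter dimension and reuse one preallocated scratch buffer through the ufunc `out` argument. The name `dist_buffered` and the test data are made up for illustration.

```python
import numpy as np

def dist_buffered(A, B):
    """Euclidean distances between rows of A (rowsA, K) and rows of B (rowsB, K).

    Hypothetical completion: loops over the shorter dimension and reuses
    a single scratch buffer instead of allocating a new array per iteration.
    """
    rowsA, rowsB = A.shape[0], B.shape[0]
    out = np.empty((rowsA, rowsB), dtype=float)
    if rowsA <= rowsB:
        temp = np.empty_like(B, dtype=float)       # scratch, shape (rowsB, K)
        for i in range(rowsA):
            np.subtract(A[i], B, out=temp)         # differences, written in place
            np.multiply(temp, temp, out=temp)      # squared differences
            np.sqrt(temp.sum(axis=1), out=out[i, :])
    else:
        temp = np.empty_like(A, dtype=float)       # scratch, shape (rowsA, K)
        for j in range(rowsB):
            np.subtract(A, B[j], out=temp)
            np.multiply(temp, temp, out=temp)
            np.sqrt(temp.sum(axis=1), out=out[:, j])
    return out

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((1000, 3))
D = dist_buffered(A, B)
print(D.shape)
```

Whether the buffer reuse pays off depends on the array sizes, so it is worth timing against the plain loop before adopting it.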

Re: [Numpy-discussion] Distance Matrix speed

2006-06-19 Thread Tim Hochberg
Tim Hochberg wrote:
> Sebastian Beca wrote:
>> I just ran Alan's script and I don't get consistent results for 100
>> repetitions. I boosted it to 1000, and ran it several times. The
>> faster one varied a lot, but both came within about ±1.5% of each other.
>>
>> When it comes to scaling, for my pro

Re: [Numpy-discussion] Distance Matrix speed

2006-06-19 Thread Tim Hochberg
Sebastian Beca wrote:
> I just ran Alan's script and I don't get consistent results for 100
> repetitions. I boosted it to 1000, and ran it several times. The
> faster one varied a lot, but both came within about ±1.5% of each other.
>
> When it comes to scaling, for my problem (fuzzy clustering), N is the

Re: [Numpy-discussion] Distance Matrix speed

2006-06-19 Thread Sebastian Beca
I just ran Alan's script and I don't get consistent results for 100 repetitions. I boosted it to 1000, and ran it several times. The faster one varied a lot, but both came within about ±1.5% of each other.

When it comes to scaling, for my problem (fuzzy clustering), N is the size of the dataset, which sh
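Run-to-run noise like this is common in micro-benchmarks. Alan's script is not shown here, so the following is only a sketch of one way to get steadier numbers: repeat the measurement several times with `timeit.repeat` and keep the minimum. The two functions being timed, the array sizes, and the helper `best_time` are stand-ins chosen for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 2))
B = rng.random((1000, 2))

def dist_rows(A, B):
    # One vectorized pass per row of A.
    d = np.empty((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        xy = A[i] - B
        d[i, :] = np.sqrt((xy ** 2).sum(axis=1))
    return d

def dist_broadcast(A, B):
    # No Python loop; builds a (4, 1000, 2) temporary.
    diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def best_time(func, number=200, repeat=5):
    # The minimum over several repeats is less sensitive to background
    # load than a single run or the mean.
    times = timeit.repeat(lambda: func(A, B), number=number, repeat=repeat)
    return min(times) / number

for f in (dist_rows, dist_broadcast):
    print(f.__name__, best_time(f))
```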

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Alan G Isaac
On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:
> Alan G Isaac wrote:
>> On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>>> def dist():
>>>     d = zeros([N, C], dtype=float)
>>>     if N < C:
>>>         for i in range(N):
>>>             xy = A[i] - B
>>>             d[i,:] = sqrt(sum(xy**2, axis=1))
>>>         return d
>>>     else

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Tim Hochberg
Alan G Isaac wrote:
> On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>> def dist():
>>     d = zeros([N, C], dtype=float)
>>     if N < C:
>>         for i in range(N):
>>             xy = A[i] - B
>>             d[i,:] = sqrt(sum(xy**2, axis=1))
>>         return d
>>     else:
>>         for j in range(C):
>>             xy = A - B[j]
>>             d[:,j] = sqrt(sum(xy**2, axi

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Alan G Isaac
On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
> def dist():
>     d = zeros([N, C], dtype=float)
>     if N < C:
>         for i in range(N):
>             xy = A[i] - B
>             d[i,:] = sqrt(sum(xy**2, axis=1))
>         return d
>     else:
>         for j in range(C):
>             xy = A - B[j]
>             d[:,j] = sqrt(sum(xy**2, axis=1))
>         return d

But that

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Sebastian Beca
I checked the Matlab version's code and it does the same as discussed here. The only thing to check is to make sure you loop around the shorter dimension of the output array. Speed-wise, the Matlab code still runs about twice as fast for large sets of data (timed by hand and compared)
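The "loop around the shorter dimension" rule can be sketched as below. This is a self-contained variant of the `dist()` quoted elsewhere in the thread, with the arrays passed as arguments instead of read from globals. The shapes `(N, K)` and `(C, K)` and the sample sizes are assumptions for illustration.

```python
import numpy as np

def dist(A, B):
    """Distances between rows of A (N, K) and rows of B (C, K); result is (N, C)."""
    N, C = A.shape[0], B.shape[0]
    d = np.empty((N, C), dtype=float)
    if N < C:
        # Fewer rows in A: N Python iterations, each vectorized over C rows.
        for i in range(N):
            xy = A[i] - B                      # shape (C, K)
            d[i, :] = np.sqrt((xy ** 2).sum(axis=1))
    else:
        # Fewer rows in B: C Python iterations, each vectorized over N rows.
        for j in range(C):
            xy = A - B[j]                      # shape (N, K)
            d[:, j] = np.sqrt((xy ** 2).sum(axis=1))
    return d

rng = np.random.default_rng(0)
A = rng.random((1000, 3))   # large dataset
B = rng.random((4, 3))      # a handful of centers
d = dist(A, B)
print(d.shape)
```

The Python-level loop then runs only `min(N, C)` times, and the per-iteration work stays inside NumPy.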

Re: [Numpy-discussion] Distance Matrix speed

2006-06-17 Thread Robert Kern
Alex Cannon wrote:
> How about this?
>
> def d5():
>     return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
>         2.*dot(A, transpose(B))

You might lose some precision with that approach, so the OP should compare results and timings to look at the tradeoffs.

-- Rob
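A rough way to look at that tradeoff is to compute the distances both ways and compare. The sketch below assumes random test data and uses made-up names (`dist_direct`, `dist_expanded`). Note that `d5` as quoted returns squared distances, so a square root is added here, along with a clamp for the small negative values that rounding can produce.

```python
import numpy as np

def dist_direct(A, B):
    # Reference: subtract first, then square and sum.
    diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def dist_expanded(A, B):
    # Uses |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so the bulk of the work is one dot().
    sq = np.add.outer((A * A).sum(axis=1), (B * B).sum(axis=1)) - 2.0 * np.dot(A, B.T)
    # Cancellation can leave tiny negative values for near-identical points.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((1000, 3))
max_abs_err = np.abs(dist_direct(A, B) - dist_expanded(A, B)).max()
print(max_abs_err)
```

The cancellation error grows when the points are far from the origin relative to their mutual distances, so the comparison should be run on data that resembles the real problem.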

Re: [Numpy-discussion] Distance Matrix speed

2006-06-17 Thread Alex Cannon
How about this?

def d5():
    return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
        2.*dot(A, transpose(B))

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Johannes Loehnert
Hi,

> def d4():
>     d = zeros([4, 1000], dtype=float)
>     for i in range(4):
>         xy = A[i] - B
>         d[i] = sqrt( sum(xy**2, axis=1) )
>     return d
>
> Maybe there's another alternative to d4?
> Thanks again,

I think this is the fastest you can get. Maybe it would be nicer to use

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Sebastian Beca
Please replace:

C = 4
N = 1000
> d = zeros([C, N], dtype=float)

BK.

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Sebastian Beca
Thanks! Avoiding the inner loop is MUCH faster (~20-300 times faster than the original). Nevertheless, I don't think I can use hypot, as it only works for two dimensions. The general problem I have is:

A = random( [C, K] )
B = random( [N, K] )

C ~ 1-10
N ~ Large (thousands, millions.. i.e. my dataset)
K ~
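To illustrate the limitation: `hypot` takes exactly two arrays, so it covers K = 2 only, while squaring and summing over the last axis works for any K. This is a minimal sketch with made-up data. The variable names and K = 5 are arbitrary choices for illustration, not values from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# K = 2: hypot(dx, dy) is sqrt(dx**2 + dy**2), one call per pair of columns.
a = rng.random(2)
B2 = rng.random((1000, 2))
diff = a - B2
d_hypot = np.hypot(diff[:, 0], diff[:, 1])
d_general = np.sqrt((diff ** 2).sum(axis=1))

# Arbitrary K: the sum over the last axis does not care how many columns there are.
a5 = rng.random(5)
B5 = rng.random((1000, 5))
d_k5 = np.sqrt(((a5 - B5) ** 2).sum(axis=1))

print(np.allclose(d_hypot, d_general), d_k5.shape)
```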

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Tim Hochberg
Christopher Barker wrote:
> Bruce Southey wrote:
>> Please run the exact same code in Matlab that you are running in
>> NumPy. Many of Matlab's functions are very highly optimized, so they are
>> provided as binary functions. I think that you are running into this,
>> so you are not doing the correc

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Christopher Barker
Bruce Southey wrote:
> Please run the exact same code in Matlab that you are running in
> NumPy. Many of Matlab's functions are very highly optimized, so they are
> provided as binary functions. I think that you are running into this,
> so you are not doing the correct comparison

He is doing the cor

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Bruce Southey
Hi,

Please run the exact same code in Matlab that you are running in NumPy. Many of Matlab's functions are very highly optimized, so they are provided as binary functions. I think that you are running into this, so you are not doing the correct comparison.

So the ways around it are to write an extensi

Re: [Numpy-discussion] distance matrix speed

2006-06-16 Thread Tim Hochberg
Sebastian Beca wrote:
> Hi,
> I'm working with NumPy/SciPy on some algorithms and I've run into some
> important speed differences wrt Matlab 7. I've narrowed the main speed
> problem down to the operation of finding the Euclidean distance
> between two matrices that share one dimension (dist in M

[Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Sebastian Beca
Hi,

I'm working with NumPy/SciPy on some algorithms and I've run into some important speed differences wrt Matlab 7. I've narrowed the main speed problem down to the operation of finding the Euclidean distance between two matrices that share one dimension (dist in Matlab):

Python:
def dtest()

Re: [Numpy-discussion] distance matrix speed

2006-06-16 Thread David Douard
Hi,

On Fri, Jun 16, 2006 at 08:28:18AM +0200, Johannes Loehnert wrote:
> Hi,
>
> def dtest():
>     A = random( [4,2])
>     B = random( [1000,2])
>
>     # drawback: memory usage temporarily doubled
>     # solution see below
>     d = A[:, newaxis, :] - B[newaxis, :, :]

Unless I'm wrong, one

Re: [Numpy-discussion] distance matrix speed

2006-06-15 Thread Johannes Loehnert
Hi,

def dtest():
    A = random( [4,2])
    B = random( [1000,2])

    # drawback: memory usage temporarily doubled
    # solution see below
    d = A[:, newaxis, :] - B[newaxis, :, :]

    # written as 3 expressions for more clarity
    d = sqrt((d**2).sum(axis=2))
    return d

def dtest_lowmem(

Re: [Numpy-discussion] distance matrix speed

2006-06-15 Thread Michael Sorich
Hi Sebastian,

I am not sure if there is a function already defined in numpy, but something like this may be what you are after:

def distance(a1, a2):
    return sqrt(sum((a1[:,newaxis,:] - a2[newaxis,:,:])**2, axis=2))

The general idea is to avoid loops if you want the code to execute fast. I hop
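For reference, here is a self-contained version of the same broadcasting idea that runs against current NumPy, with explicit imports and made-up test data. The shapes in the comments are the assumption: `a1` is `(C, K)` and `a2` is `(N, K)`.

```python
import numpy as np

def distance(a1, a2):
    """Euclidean distance between every row of a1 (C, K) and every row of a2 (N, K)."""
    # (C, 1, K) - (1, N, K) broadcasts to a (C, N, K) array of differences.
    diff = a1[:, np.newaxis, :] - a2[np.newaxis, :, :]
    # Summing the squares over the last axis leaves a (C, N) distance matrix.
    return np.sqrt((diff ** 2).sum(axis=2))

rng = np.random.default_rng(0)
A = rng.random((4, 2))
B = rng.random((1000, 2))
D = distance(A, B)
print(D.shape)
```

The price of removing the loop is the `(C, N, K)` temporary, which can dominate memory use when N is in the millions.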

[Numpy-discussion] distance matrix speed

2006-06-15 Thread Sebastian Beca
Hi,

I'm working with NumPy/SciPy on some algorithms and I've run into some important speed differences wrt Matlab 7. I've narrowed the main speed problem down to the operation of finding the Euclidean distance between two matrices that share one dimension (dist in Matlab):

Python:
def dtest()