Re: [GENERAL] optimizing a cpu-heavy query

2011-05-19 Thread Tom Lane
Hitoshi Harada writes:
> 2011/4/27 Tom Lane:
>> Joel Reymont writes:
>>> That's a 30x speedup, from 12 minutes down to 38s. Thanks Tom!
>> Huh, I would've bet on a lot more actually.  The nodeFunctionscan and
>> nodeAgg code must not be as inefficient as it looks on the surface ...
> Did you m

Re: [GENERAL] optimizing a cpu-heavy query

2011-05-19 Thread Hitoshi Harada
2011/4/27 Tom Lane:
> Joel Reymont writes:
>> On Apr 26, 2011, at 5:00 PM, Tom Lane wrote:
>>> For another couple orders of magnitude, convert the sub-function to C code.
>>> (I don't think you need
>>> a whole data type, just a function that does the scalar product.)
>
>> That's a 30x speedup,

Re: [GENERAL] optimizing a cpu-heavy query

2011-04-27 Thread Tom Lane
Joel Reymont writes:
> On Apr 26, 2011, at 5:00 PM, Tom Lane wrote:
>> For another couple orders of magnitude, convert the sub-function to C code.
>> (I don't think you need
>> a whole data type, just a function that does the scalar product.)
> That's a 30x speedup, from 12 minutes down to 38s.

Re: [GENERAL] optimizing a cpu-heavy query

2011-04-27 Thread Joel Reymont
Tom,

On Apr 26, 2011, at 5:00 PM, Tom Lane wrote:
> For another couple orders of magnitude, convert the sub-function to C code.
> (I don't think you need
> a whole data type, just a function that does the scalar product.)

That's a 30x speedup, from 12 minutes down to 38s. Thanks Tom!

Re: [GENERAL] optimizing a cpu-heavy query

2011-04-26 Thread Tom Lane
Joel Reymont writes:
> I'm trying to optimize the following query that performs KL Divergence [1].
> As you can see the distance function operates on vectors of 150 floats.
> CREATE OR REPLACE FUNCTION docs_within_distance(vec topics, threshold float)
> RETURNS TABLE(id doc_id, distance float)

[GENERAL] optimizing a cpu-heavy query

2011-04-26 Thread Joel Reymont
Folks, I'm trying to optimize the following query that performs KL Divergence [1]. As you can see the distance function operates on vectors of 150 floats. The query takes 12 minutes to run on an idle (apart from pgsql) EC2 m1 large instance with 2 million documents in the docs table. The CPU i