Hi, thank you everybody for your help. The classy solution posted by nha works great!
Regards Jasmin 2009/7/3 nha <lyondi...@free.fr> > Hello, > > Le 2/07/09 23:21, Greg Stark a écrit : > > On Thu, Jul 2, 2009 at 3:48 PM, Jasmin >> Dizdarevic<jasmin.dizdare...@gmail.com> wrote: >> >>> customer ; seg >>> 111 ; L1 >>> 111 ; L2 >>> 111 ; L1 >>> 222 ; L3 >>> 222 ; L3 >>> 222 ; L2 >>> >>> the result should look like this: >>> >>> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1 >>> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1 >>> >> >> Are you just looking for the most frequent seg for each customer? >> >> select distinct on (customer) customer,seg >> from (select customer, seg, count(*) as n from tab group by seg) >> order by customer, n desc >> >> That doesn't give the ratios though. >> >> > Some errors would occur with the above query from within the subquery > because of a projection on customer column that is not a grouped column > (neither an uniquely identified column by seg--seg is assumed not to be a > primary key here according to the given examples of value). By the way, the > ordered column n is not a member of the projected columns. > > Nevertheless, in the same direction as Greg Stark, the following query > would approach the target result: > > SELECT T5A.customer, T5A.seg > FROM ( > SELECT T2.customer, T2.seg, COUNT(*) AS nb > FROM cst T2 > GROUP BY T2.customer, T2.seg > ) T5A INNER JOIN ( > SELECT T4.customer, MAX(T4.nb) AS maxNb > FROM ( > SELECT T2B.customer, T2B.seg, COUNT(*) AS nb > FROM cst T2B > GROUP BY T2B.customer, T2B.seg > ) T4 > GROUP BY T4.customer > ) T5B > ON T5A.customer = T5B.customer > AND T5A.nb = T5B.maxNb > > where 'cst' is the reference table including 'customer' and 'seg' columns. > This query considers a join between two tables: > - the first table, aliased T5A, counts the multiplicity for each couple of > 'customer' and 'seg'; > - the secund table, aliased T5B, reveals the highest multiplicity for each > 'customer' with regard to each 'seg' attached to customer. > Then the join only retains the couples of 'customer' and 'seg' whom > multiplicity equals the higher for the 'customer' of the current couple. > > Each multiplicity may be easily added to the resulting records by spanning > the projected columns with T5A.nb column. A sort of ratio may also be added; > one aggregated column may first be added to table T4 as SUM(T4.nb) -- that > is, sum of multiplicity for each customer; then the resulting join may > compute the ratio between the highest multiplicity and the sum of > multiplicity for each customer, in addition to the seg (or list of seg) for > which the multiplicity is the highest. > > Here is the modified query: > > SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb > FROM ( > SELECT T2.customer, T2.seg, COUNT(*) AS nb > FROM cst T2 > GROUP BY T2.customer, T2.seg > ) T5A INNER JOIN ( > SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNb > FROM ( > SELECT T2B.customer, T2B.seg, COUNT(*) AS nb > FROM cst T2B > GROUP BY T2B.customer, T2B.seg > ) T4 > GROUP BY T4.customer > ) T5B > ON T5A.customer = T5B.customer > AND T5A.nb = T5B.maxNb > > As is, this query may result to multiple couples of customer and seg with > the same customer value when many seg relie as many times as many others for > the current customer. This "edge effect" may be avoided in many ways > depending on the original purpose. > > Regards. > > -- > nha / Lyon / France. >