Data aggregation

vedranp Thu, 06 Mar 2008 08:31:18 -0800

Hi,

I have a case where I should aggregate data from the CSV file, which
contains data in this way:


DATE    TIME    COUNTRY ZIP     CITY    VALUE1  VALUE2  VALUE3
21.2.2008       00:00   A       1000    CITY1   1       2       3
21.2.2008       00:00   A       1000    CITY2   4       5       6
21.2.2008       00:00   A       1000    CITY3   7       8       9
21.2.2008       00:00   A       1000    CITY4   1       2       3
21.2.2008       00:15   A       1000    CITY1   4       5       6
21.2.2008       00:15   A       1000    CITY2   7       8       9
21.2.2008       00:15   A       1000    CITY3   1       2       3
21.2.2008       00:15   A       1000    CITY4   4       5       6
21.2.2008       00:00   A       2000    CITY10  7       8       9
21.2.2008       00:00   A       2000    CITY20  1       2       3
21.2.2008       00:00   A       2000    CITY30  4       5       6
21.2.2008       00:00   A       2000    CITY40  1       2       3
21.2.2008       00:15   A       2000    CITY10  7       8       9
21.2.2008       00:15   A       2000    CITY20  1       2       3
21.2.2008       00:15   A       2000    CITY30  4       5       6
21.2.2008       00:15   A       2000    CITY40  1       2       3

I need to aggregate data from file1, so the result would be a CSV file
(file2) in this format:

DATE    COUNTRY ZIP     CITY    SumOfVALUE1     SumOfVALUE2     SumOfVALUE3     
formula1
21.2.2008       A       1000    CITY1   5       7       9       12
21.2.2008       A       1000    CITY2   11      13      15      24
21.2.2008       A       1000    CITY3   8       10      12      18
21.2.2008       A       1000    CITY4   5       7       9       12
21.2.2008       A       2000    CITY10  14      16      18      30
21.2.2008       A       2000    CITY20  2       4       6       6
21.2.2008       A       2000    CITY30  8       10      12      18
21.2.2008       A       2000    CITY40  2       4       6       6

So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some
calculation) the values and do some calculation from summed fields
(e.g.: formula1 = SumOfVALUE1+SumOfVALUE2). I am able to do this by
first loading file1 in SQL, perform a query there, which returns the
file2 results and then load it back in the SQL in the different table.

I would like to avoid the step of taking data out from database in
order to process it. I would like to process the file1  in Python and
load the result (file2) in SQL.

>From some little experience with Perl, I think this is managable with
double hash tables (1: basic hash with key/value = CITY/pointer-to-
other-hash, 2: hash table with values for CITY1), so I assume that
there would be also a way in Python, maybe with dictionaries? Any
ideas?

Regards,
Vedran.
-- 
http://mail.python.org/mailman/listinfo/python-list

Data aggregation

Reply via email to