On 03/03/2013 09:24 PM, DoanVietTrungAtGmail wrote:
Dear tutors

I am checking out csv as a possible data structure for my records. In each
record, some fields are an integer and some are a list of integers of
variable length. I use csv.DictWriter to write data. When reading out using
csv.DictReader, each row is read as a string, per the csv module's standard
behaviour. To get these columns as lists of integers, I can think of only a
multi-step process: first, remove the brackets enclosing the string;
second, split the string into a list containing substrings; third, convert
each substring into an integer. This process seems inelegant. Is there a
better way to get integers and lists of integers from a csv file?

Or, is a csv file simply not the best data structure given the above
requirement?

Your terminology is very confusing. A csv is not a data structure; it's a method of serializing lists of strings, or in this case dicts of strings. If a particular dict value isn't a string, it'll get converted to one implicitly. csv has no native way to store a variable-length list inside a single field, so this is close to the best you're going to do.
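To make the implicit conversion concrete, here is a minimal sketch. It is written for Python 3 and uses an in-memory buffer instead of a file, which is my assumption for the sake of a self-contained example; the original code targets Python 2.7 and a real file. It writes one of the records from the question and reads it back:

```python
import csv
import io

header = ['id', 'type', 'level', 'ListInRecord']
record = {'id': 1, 'type': 1, 'level': 1, 'ListInRecord': [2, 9]}

# In-memory stand-in for a file (an assumption made for this sketch).
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, header)
writer.writeheader()
writer.writerow(record)   # non-string values are converted with str()

# Read the single data row back.
buf.seek(0)
row = next(csv.DictReader(buf))

print(repr(row['id']))            # a string, not an int
print(repr(row['ListInRecord']))  # a string that looks like a list
```

Every value comes back as a string, including the list, so turning those strings back into ints and lists is left to the reading side.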

 Apart from csv, I considered using a dict or list, or using an
object to represent each row.

Objects don't exist in a file, so they don't persist between multiple runs of the program. Likewise dict and list. So no idea what you really meant.

 I am being attracted to csv because csv means
serialisation is unnecessary, I just need to close and open the file to
stop and continue later (it's a simulation experiment).

Closing and opening don't do anything to persist data, but we can guess you must have meant to imply reading and writing as well. And you've nicely finessed the serialization in the write step, but as you discovered, you'll have to handle the deserialization to get back to ints and lists.

 Also, I am guessing
but haven't checked, csv is more space efficient.

More space efficient than what?

 Each row contains a few
integers plus a few lists containing hundreds of integers, and there will
be up to hundreds of millions of rows.

CODE: My Python 2.7 code is below. It doesn't have the third step
(substring -> int).

import csv

record1 = {'id':1, 'type':1, 'level':1, 'ListInRecord':[2, 9]}
record2 = {'id':2, 'type':1, 'level':1, 'ListInRecord':[1, 9]}
record3 = {'id':3, 'type':2, 'level':1, 'ListInRecord':[2]}
record9 = {'id':9, 'type':3, 'level':0, 'ListInRecord':[]}
rows = [record1, record2, record3, record9]
header = ['id', 'type', 'level', 'ListInRecord']

with open('testCSV.csv', 'wb') as f:
     fCSV = csv.DictWriter(f, header)
     fCSV.writeheader()
     fCSV.writerows(rows)

with open('testCSV.csv', 'r') as f:
     fCSV = csv.DictReader(f)
     for row in fCSV:

I'd add the deserialization here. For each item in row, if the value begins with [ and ends with ], make it into a list; if it begins with a digit or a minus sign, make it into an int. Then for the lists, convert each element to an int. You can use Don Jennings' suggestion to save a lot of effort here.

This should reconstruct the original records precisely. But it'll take some testing to be sure.
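A rough sketch of that deserialization, assuming every list element is a plain integer. The helper names parse_field and parse_row are made up for illustration and are not part of the csv module:

```python
def parse_field(text):
    """Hypothetical helper: convert one csv string back to an int or a list of ints."""
    text = text.strip()
    if text.startswith('[') and text.endswith(']'):
        inner = text[1:-1].strip()
        if not inner:
            return []                      # '[]' becomes an empty list
        return [int(part) for part in inner.split(',')]
    if text and (text[0].isdigit() or text[0] == '-'):
        return int(text)
    return text                            # anything else stays a string


def parse_row(row):
    """Hypothetical helper: apply parse_field to every value in a DictReader row."""
    return {key: parse_field(value) for key, value in row.items()}


print(parse_row({'id': '9', 'type': '3', 'level': '0', 'ListInRecord': '[]'}))
```

Inside the reading loop you would call parse_row(row) and work with the result. Note that int() tolerates the space left after each comma, and that a malformed value such as a lone '-' will raise ValueError rather than pass silently.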

         # I want this to be a list of integers, NOT a list of strings
         print 'ID=', row['id'], 'ListInRecord=', row['ListInRecord'][1:-1].split(', ')

OUTPUT:

ID= 1 ListInRecord= ['2', '9']
ID= 2 ListInRecord= ['1', '9']
ID= 3 ListInRecord= ['2']
ID= 9 ListInRecord= ['']
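The last output line shows a trap in the slice-and-split approach: an empty list comes back as [''] rather than []. One standard-library alternative is ast.literal_eval, which parses the text as a Python literal. The sketch below assumes the field always holds the str() of a list of ints; parse_list_field is a hypothetical name, not an existing function:

```python
import ast


def parse_list_field(text):
    """Hypothetical helper: turn a string such as '[2, 9]' into a list of ints."""
    value = ast.literal_eval(text)
    if not isinstance(value, list) or not all(isinstance(v, int) for v in value):
        raise ValueError('expected a list of integers, got %r' % (text,))
    return value


# Slicing off the brackets and splitting leaves one empty string behind.
naive_empty = '[]'[1:-1].split(', ')

# Parsing the literal gives a genuinely empty list.
parsed_empty = parse_list_field('[]')

print(naive_empty)
print(parsed_empty)
print(parse_list_field('[2, 9]'))
```

literal_eval only accepts literals (numbers, strings, lists, and so on), so it does not execute arbitrary code the way eval would.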



--
DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
