On 03/03/2013 09:24 PM, DoanVietTrungAtGmail wrote:
> Dear tutors
>
> I am checking out csv as a possible data structure for my records. In each
> record, some fields are an integer and some are a list of integers of
> variable length. I use csv.DictWriter to write data. When reading out using
> csv.DictReader, each field is read as a string, per the csv module's standard
> behaviour. To get these columns as lists of integers, I can think of only a
> multi-step process: first, remove the brackets enclosing the string;
> second, split the string into a list containing substrings; third, convert
> each substring into an integer. This process seems inelegant. Is there a
> better way to get integers and lists of integers from a csv file?
> Or, is a csv file simply not the best data structure given the above
> requirement?
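The three steps described above fit in a short helper. This is only a sketch. It is written for Python 3 even though the thread uses 2.7, and `parse_int_list` is a hypothetical name. The one real trap is the empty list. Once its brackets are removed, '[]' becomes an empty string, and splitting that gives [''], which is exactly what the OUTPUT below shows for id 9. So the empty list needs its own case:

```python
def parse_int_list(field):
    """Hypothetical helper: turn a csv field such as '[2, 9]' back into [2, 9]."""
    inner = field.strip()[1:-1]   # step 1: drop the enclosing brackets
    if not inner.strip():         # '[]' leaves '', and ''.split(',') would give ['']
        return []
    # steps 2 and 3: split on commas, then int() each piece
    # (int() ignores the surrounding spaces, e.g. int(' 9') == 9)
    return [int(piece) for piece in inner.split(',')]

print(parse_int_list('[2, 9]'))   # [2, 9]
print(parse_int_list('[]'))       # []
```

The helper splits on ',' rather than ', ' and lets int() strip the spaces, which makes it a little more forgiving of hand-edited files.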
Your terminology is very confusing. A csv file is not a data structure. It's
a way of serializing lists of strings, or in this case dicts of strings. If a
particular dict value isn't a string, the writer converts it to one
implicitly with str(). csv has no way to represent a variable-length list
inside a single field, so this is close to the best you're going to do.
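You can see the implicit conversion by sending one record through an in-memory buffer and reading it back. This sketch uses Python 3, so it differs slightly from the 2.7 code below. io.StringIO stands in for the file. With a real file in Python 3, you would open it with newline=''.

```python
import csv
import io

buf = io.StringIO()
writer = csv.DictWriter(buf, ['id', 'ListInRecord'])
writer.writeheader()
writer.writerow({'id': 1, 'ListInRecord': [2, 9]})  # the int and the list both go through str()

buf.seek(0)
row = next(csv.DictReader(buf))
print(buf.getvalue())  # the list was written as the quoted text "[2, 9]"
print(row)             # every value comes back as a str
```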
> Apart from csv, I considered using a dict or list, or using an
> object to represent each row.
Objects don't live in a file, so they don't persist between runs of the
program. The same goes for dicts and lists. So I have no idea what you
really meant.
> I am attracted to csv because csv means
> serialisation is unnecessary; I just need to close and open the file to
> stop and continue later (it's a simulation experiment).
Closing and opening the file don't do anything to persist data, but I'd
guess you meant to imply reading and writing as well. You've nicely
finessed the serialization in the write step, but as you discovered,
you'll have to handle the deserialization yourself to get back to ints
and lists.
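One way to do that deserialization without hand-writing the parsing is ast.literal_eval from the standard library. It turns strings like '9' and '[2, 9]' back into Python values. The sketch below assumes that every field holds a Python literal, which is true for the id/type/level/ListInRecord records in this thread. A free-text field would make literal_eval raise an exception. Unlike eval(), literal_eval only accepts literals, so it won't execute arbitrary code found in the file. `deserialize` is a hypothetical name.

```python
import ast

def deserialize(row):
    """Hypothetical helper: rebuild ints and lists from a csv.DictReader row.

    Assumes every value is a Python literal such as '9' or '[2, 9]'.
    """
    return {key: ast.literal_eval(value) for key, value in row.items()}

row = {'id': '9', 'type': '3', 'level': '0', 'ListInRecord': '[]'}
print(deserialize(row))
```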
> Also, I am guessing, but haven't checked, that csv is more space
> efficient.
More space efficient than what?
> Each row contains a few
> integers plus a few lists containing hundreds of integers, and there will
> be up to hundreds of millions of rows.
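This rough sketch puts a number on "space efficient". csv stores every integer as decimal text plus separators, so the cost depends on how many digits the numbers have. The sketch compares that text with the same numbers packed into a binary array. The sample values are invented. It assumes the values fit in a C int (array type code 'i', usually 4 bytes). tobytes() is the Python 3 name; 2.7 calls it tostring().

```python
from array import array

# Made-up sample: one list field holding 300 six-digit integers
values = list(range(100000, 100300))

as_text = str(values)                     # the text csv.DictWriter would store (before quoting)
as_binary = array('i', values).tobytes()  # the same numbers packed as C ints

print(len(as_text), len(as_binary))
```

Here the text form is 2,400 characters, against 1,200 bytes if ints are 4 bytes. For small numbers the balance flips: a one-digit value plus ", " takes only 3 characters. The ratio is therefore worth measuring on real data before settling on a format for hundreds of millions of rows.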
> CODE: My Python 2.7 code is below. It doesn't have the third step
> (substring -> int).
> import csv
>
> record1 = {'id': 1, 'type': 1, 'level': 1, 'ListInRecord': [2, 9]}
> record2 = {'id': 2, 'type': 1, 'level': 1, 'ListInRecord': [1, 9]}
> record3 = {'id': 3, 'type': 2, 'level': 1, 'ListInRecord': [2]}
> record9 = {'id': 9, 'type': 3, 'level': 0, 'ListInRecord': []}
> rows = [record1, record2, record3, record9]
> header = ['id', 'type', 'level', 'ListInRecord']
>
> with open('testCSV.csv', 'wb') as f:
>     fCSV = csv.DictWriter(f, header)
>     fCSV.writeheader()
>     fCSV.writerows(rows)
>
> with open('testCSV.csv', 'rb') as f:  # binary mode, as the Python 2 csv docs ask
>     fCSV = csv.DictReader(f)
>     for row in fCSV:
I'd add the deserialization here. For each item in row, if the value
begins with [ and ends with ], make it into a list; if it begins with a
digit or a minus sign, make it into an int. Then convert each element of
the lists to an int. You can use Don Jennings' suggestion to save a lot
of effort here.

This should reconstruct the original record dicts precisely, but it'll
take some testing to be sure.
>         # I want this to be a list of integers, NOT list of strings
>         print 'ID=', row['id'], 'ListInRecord=', row['ListInRecord'][1:-1].split(', ')
> OUTPUT:
> ID= 1 ListInRecord= ['2', '9']
> ID= 2 ListInRecord= ['1', '9']
> ID= 3 ListInRecord= ['2']
> ID= 9 ListInRecord= ['']
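The test below writes two of the example records, reads them back through the per-field rule described inside the loop above, and checks that they come back unchanged. Under that rule, bracketed values become lists of ints and values starting with a digit or minus sign become ints. This is a Python 3 sketch in which io.StringIO stands in for testCSV.csv. `convert` is a hypothetical helper name, not part of the csv module.

```python
import csv
import io

def convert(value):
    """Hypothetical helper: apply the per-field rule to one csv string."""
    if value.startswith('[') and value.endswith(']'):
        inner = value[1:-1]
        # '[]' leaves nothing inside; splitting '' would give [''] instead of []
        return [int(item) for item in inner.split(',')] if inner.strip() else []
    if value[:1].isdigit() or value[:1] == '-':
        return int(value)
    return value  # neither rule applies: leave it as a string

header = ['id', 'type', 'level', 'ListInRecord']
records = [
    {'id': 1, 'type': 1, 'level': 1, 'ListInRecord': [2, 9]},
    {'id': 9, 'type': 3, 'level': 0, 'ListInRecord': []},
]

buf = io.StringIO()  # stands in for testCSV.csv
writer = csv.DictWriter(buf, header)
writer.writeheader()
writer.writerows(records)

buf.seek(0)
restored = [{key: convert(value) for key, value in row.items()}
            for row in csv.DictReader(buf)]
print(restored == records)  # True when the round trip is lossless
```

Fields that match neither rule are left as strings, so a free-text column would pass through untouched.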
--
DaveA