Anshu Kumar wrote:
> Hello All,
>
> Thanks so much for your response.
>
> Here is my actual scenario. I have a csv file and it would already be
> present. I need to read and remove some rows based on some logic. I have
> written earlier two separate file opens which I think was nice and clean.
>
> actual code:
>
> with open(file_path, 'rb') as fr:
>     for row in csv.DictReader(fr):
>         #Skip for those segments which are part of overridden_ids
>         if row['id'] not in overriden_ids:
Oops typo; so probably not your actual code :(

>             segments[row['id']] = {
>                 'id': row['id'],
>                 'attrib': json.loads(row['attrib']),
>                 'stl': json.loads(row['stl']),
>                 'meta': json.loads(row['meta']),
>             }
>
> #rewriting files with deduplicated segments
> with open(file_path, 'wb') as fw:
>     writer = csv.UnicodeWriter(fw)
>     writer.writerow(["id", "attrib", "stl", "meta"])
>     for seg in segments.itervalues():
>         writer.writerow([seg['id'], json.dumps(seg["attrib"]),
>                          json.dumps(seg["stl"]), json.dumps(seg["meta"])])
>
> I have got review comments to improve this block by having just a single
> file open and minimum memory usage.

Are the duplicate ids stored in overridden_ids, or are they implicitly
removed by overwriting them in segments[row["id"]] = ...? If the latter,
does it matter whether the last or the first row with a given id is kept?
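If the rows to drop are all known up front (that is, the dict is only a
filter and not an implicit deduplication), one way to address the "single
pass, minimum memory" comment is to stream rows straight into a temporary
file and swap it over the original at the end. A rough sketch, assuming
Python 3 and a hypothetical drop_overridden() wrapper; it reads the source
once and never holds more than one row in memory:

import csv
import os
import tempfile

def drop_overridden(file_path, overridden_ids):
    # Illustrative helper name, not from the original post.
    """Rewrite the csv in place, keeping only rows whose id is not in
    overridden_ids.  Rows are streamed one at a time."""
    dir_name = os.path.dirname(os.path.abspath(file_path))
    # Write survivors to a temporary file in the same directory, then
    # atomically replace the original so a crash never leaves half a file.
    with open(file_path, newline='') as fr, \
         tempfile.NamedTemporaryFile('w', newline='', dir=dir_name,
                                     delete=False) as fw:
        reader = csv.DictReader(fr)
        writer = csv.DictWriter(fw, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Skip segments that are part of overridden_ids
            if row['id'] not in overridden_ids:
                writer.writerow(row)
    os.replace(fw.name, file_path)

Two things fall out of this: the json.loads()/json.dumps() round trip
disappears because the fields are only copied, and Python 3's csv module
writes unicode itself, so no UnicodeWriter is needed. If the deduplication
really does happen by overwriting segments[row['id']], you cannot avoid
remembering which ids you have already written, so some memory use is
unavoidable there.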