Hi Group,

I have a file which is 2.5 GB; it looks like this:

TRIM54 NM_187841.1 GO:0004984
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0003674
TRIM54 NM_187841.1 GO:0004985
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0001653
TRIM54 NM_187841.1 GO:0004984
There are many duplicate lines, and I want to get rid of the duplicates, so I parse the file to keep only the unique entries:

    f1 = open('mfile', 'r')
    da = f1.read().split('\n')
    dat = da[:-1]                 # drop the empty string after the final newline
    f2 = open('res', 'w')
    dset = set(dat)               # built-in set; Set would need "from sets import Set"
    for i in dset:
        f2.write(i)
        f2.write('\n')
    f2.close()

Problem: Python says it cannot handle such a large file. Any ideas? Please help me.

cheers
srini

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor