Hi Group,

I have a file which is 2.5 GB; it looks like this:

TRIM54 NM_187841.1 GO:0004984
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0003674
TRIM54 NM_187841.1 GO:0004985
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0001653
TRIM54 NM_187841.1 GO:0004984
There are many duplicate lines, and I want to get rid of the duplicates, so I parse the file to keep only the unique entries:

    f1 = open('mfile', 'r')
    da = f1.read().split('\n')
    dat = da[:-1]                 # drop the empty string after the final newline
    f2 = open('res', 'w')
    dset = set(dat)               # built-in set; Set would need "from sets import Set"
    for i in dset:
        f2.write(i)
        f2.write('\n')
    f2.close()

Problem: Python says it cannot handle such a large file. Any ideas? Please help me.

cheers
srini

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor