Re: How to remove subset from a file efficiently?

2006-01-14 Thread Bengt Richter
On 13 Jan 2006 23:17:05 -0800, [EMAIL PROTECTED] wrote: > >fynali wrote: >> $ cat cleanup_ray.py >> #!/usr/bin/python >> import itertools >> >> b = set(file('/home/sajid/python/wip/stc/2/CBR333')) >> >> file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,fi

Re: How to remove subset from a file efficiently?

2006-01-14 Thread Raymond Hettinger
> > b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
> >
> > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
> >
> > --
> > $ time ./cleanup_ray.py
> >
> > real    0m5.451s
> > user    0m4.496

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python #import psyco #psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in open('/home/saj

Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote:
> Sorry, pls read that ~15 secs.

That is more or less about it. set() is faster than dict(), about 2x on my machine, and I assume a portion of your time goes to set/dict creation, as it is a pretty large data set.
-- http://mail.python.org/mailman/listinfo/python-list
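To check the set-versus-dict claim on your own machine, a rough timing sketch along these lines may help. It assumes a modern Python 3 interpreter (not the 2.x versions used in this thread), and `N`, `lines`, `build_set` and `build_dict` are made-up names with synthetic data standing in for the barred-numbers file. The ratio will vary with the interpreter version and the data, so no result is claimed here.

```python
import timeit

# Hypothetical stand-in for the barred-numbers file: N distinct lines.
N = 100_000
lines = ["%d\n" % (96650000000 + i) for i in range(N)]

def build_set():
    # Build the lookup container as a set.
    return set(lines)

def build_dict():
    # Build the lookup container as a dict whose values are all None.
    return dict.fromkeys(lines)

# Time five constructions of each container.
t_set = timeit.timeit(build_set, number=5)
t_dict = timeit.timeit(build_dict, number=5)
print("set:  %.4f s" % t_set)
print("dict: %.4f s" % t_dict)
```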

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
Sorry, pls read that ~15 secs.

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python import psyco psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in open('/home/sajid

Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote: > $ cat cleanup_use_psyco_and_list_compr.py > #!/usr/bin/python > > import psyco > psyco.full() > > postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') > outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', > 'w') > > barred = {} > >

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python import psyco psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in open('/home/sajid

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup.py #!/usr/bin/python postpaid_file = open('/home/oracle/stc/test/PSP333') outfile = open('/home/oracle/stc/test/PSP-CBR.dat', 'w') barred = {} for number in open('/home/oracle/stc/test/CBR333'): barred[number] = None # just add it as a key
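For anyone following along, here is a small self-contained sketch of the dict-as-set idea the script above starts with: barred numbers become dict keys, and membership is tested with `in`. It is written for a modern Python 3 interpreter, and the sample lists `barred_lines` and `postpaid_lines` are invented stand-ins for the CBR and PSP files, not data from the thread.

```python
# Invented sample data standing in for the two files (one number per line).
barred_lines = ["222\n", "333\n"]
postpaid_lines = ["111\n", "222\n", "444\n"]

barred = {}
for number in barred_lines:
    barred[number] = None  # only the key matters; the value is a placeholder

# Keep each postpaid number whose line is not a key in the dict,
# preserving the original order of the postpaid list.
kept = [number for number in postpaid_lines if number not in barred]
print(kept)
```

Note that the comparison is on whole lines, newline included, so a final line without a trailing newline would not match its barred counterpart.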

Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote: > [bonono] > > Have you tried the explicit loop variant with psyco ? > > Sure I wouldn't mind trying; can you suggest some code snippets along > the lines of which I should try...? > > [fynali] > > Needless to say, I'm utterly new to python and my programming > > skills &

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Fredrik Lundh
"fynali" wrote: > Is a rewrite possible of Raymond's or Fredrik's suggestions above which > will still give me the time saving made? Python 2.2 don't have a readymade set type (new in 2.3), and it doesn't support generator expressions (the thing that caused the syntax error). however, using a di

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
[bonono] > Have you tried the explicit loop variant with psyco ? Sure I wouldn't mind trying; can you suggest some code snippets along the lines of which I should try...? [fynali] > Needless to say, I'm utterly new to python and my programming > skills & know-how are rudimentary. (-:

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
--
$ ./cleanup.py
Traceback (most recent call last):
  File "./cleanup.py", line 3, in ?
    import itertools
ImportError: No module named itertools
--
$ time ./cleanup.py
  File "./cleanup.py", line 8
    outfile.writelines(number for number in postpaid_fil

Re: How to remove subset from a file efficiently?

2006-01-13 Thread bonono
fynali wrote: > $ cat cleanup_ray.py > #!/usr/bin/python > import itertools > > b = set(file('/home/sajid/python/wip/stc/2/CBR333')) > > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333'))) > > -- > $

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ cat cleanup_ray.py #!/usr/bin/python import itertools b = set(file('/home/sajid/python/wip/stc/2/CBR333')) file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333'))) -- $ time ./cleanup_ray.py
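A note for readers on Python 3: `itertools.ifilterfalse` was renamed `itertools.filterfalse`, and `file()` is gone in favour of `open()`. A minimal sketch of the same one-liner idea follows, using in-memory `io.StringIO` objects with invented contents in place of the real CBR and PSP files, so the names `numbers` and `out` are illustrative only.

```python
import io
from itertools import filterfalse

# Invented in-memory stand-ins for the barred file and the full list.
barred = set(io.StringIO("222\n333\n"))   # iterating a file-like yields lines
numbers = io.StringIO("111\n222\n444\n")
out = io.StringIO()

# Write every line for which barred.__contains__(line) is false.
out.writelines(filterfalse(barred.__contains__, numbers))
result = out.getvalue()
print(result)
```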

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ time fgrep -x -v -f CBR333 PSP333 > PSP-CBR.dat.fgrep
real    0m31.551s
user    0m16.841s
sys     0m0.912s
--
$ time ./cleanup.py
real    0m6.080s
user    0m4.836s
sys     0m0.408s
--
$ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python
387242

Re: How to remove subset from a file efficiently?

2006-01-13 Thread AJL
On 12 Jan 2006 22:29:22 -0800 "Raymond Hettinger" <[EMAIL PROTECTED]> wrote: > AJL wrote: > > How fast does this run? > > > > a = set(file('PSP320.dat')) > > b = set(file('CBR319.dat')) > > file('PSP-CBR.dat', 'w').writelines(a.difference(b)) > > Turning PSP into a set takes extra time, c

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Steve Holden
Fredrik Lundh wrote: > Steve Holden wrote: > > >>>looks like premature non-optimization to me... >>> >> >>It might be quicker to establish a dict whose keys are the barred >>numbers and use that, rather than a list, to determine whether the input >>numbers should make it through. > > > what do

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
The code is down to 5 lines!

#!/usr/bin/python
barred = set(open('/home/sajid/python/wip/CBR319.dat'))
postpaid_file = open('/home/sajid/python/wip/PSP320.dat')
outfile = open('/home/sajid/python/wip/PSP-CBR.dat', 'w')
outfile.writelines(number for number in postpaid

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Christopher Weimann
On 01/12/2006-09:04AM, fynali wrote:
>
>   - PSP320.dat (quite a large list of mobile numbers),
>   - CBR319.dat (a subset of the above, a list of barred numbers)
>

fgrep -x -v -f CBR319.dat PSP320.dat > PSP-CBR.dat

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Raymond Hettinger
AJL wrote:
> How fast does this run?
>
> a = set(file('PSP320.dat'))
> b = set(file('CBR319.dat'))
> file('PSP-CBR.dat', 'w').writelines(a.difference(b))

Turning PSP into a set takes extra time, consumes unnecessary memory, eliminates duplicates (possibly a bad thing), and loses the origin
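The duplicate-elimination point is easy to see on a toy example. The sketch below is for a modern Python 3 interpreter; `psp` and `cbr` are invented lists standing in for the lines of the two files. It compares the set-difference approach with filtering the full list against a set of barred lines.

```python
# Invented sample lines; note the duplicate "1\n" in the full list.
psp = ["3\n", "1\n", "2\n", "1\n"]
cbr = ["2\n"]

# Set difference: the duplicate is collapsed, and a set has no defined order.
via_set = set(psp).difference(cbr)

# Filtering: only the barred list becomes a set; the full list is streamed,
# so duplicates and the original order survive.
barred = set(cbr)
via_filter = [number for number in psp if number not in barred]

print(len(via_set))   # 2 distinct lines remain
print(via_filter)     # ['3\n', '1\n', '1\n']
```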

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Mike Meyer
"fynali" <[EMAIL PROTECTED]> writes: > Hi all, > > I have two files: Others have pointed out the Python solution - use a set instead of a list for membership testing. I want to point out a better Unix solution ('cause I probably wouldn't have written a Python program to do this): > Objective: to

Re: How to remove subset from a file efficiently?

2006-01-12 Thread AJL
On 12 Jan 2006 09:04:21 -0800 "fynali" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have two files:
>
>   - PSP320.dat (quite a large list of mobile numbers),
>   - CBR319.dat (a subset of the above, a list of barred numbers)
> ...
> Objective: to remove the numbers present in barred-lis

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Fredrik Lundh
Steve Holden wrote: > > looks like premature non-optimization to me... > > > It might be quicker to establish a dict whose keys are the barred > numbers and use that, rather than a list, to determine whether the input > numbers should make it through. what do you think > barred = set(ope

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Steve Holden
Fredrik Lundh wrote: > "fynali" wrote: > > >>>Objective: to remove the numbers present in barred-list from the >>>PSPfile. >>> >>>$ ls -lh PSP320.dat CBR319.dat >>>... 56M Dec 28 19:41 PSP320.dat >>>... 8.6M Dec 28 19:40 CBR319.dat >>> >>> $ wc -l PSP320.dat CBR

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Raymond Hettinger
[fynali]
> I have two files:
>
>   - PSP320.dat (quite a large list of mobile numbers),
>   - CBR319.dat (a subset of the above, a list of barred numbers)

# print all non-barred mobile phone numbers
barred = set(open('CBR319.dat'))
for num in open('PSP320.dat'):
    if num not in b
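As a runnable illustration of the same approach (load the barred list into a set, then stream the big file past it), here is a small sketch for a modern Python 3 interpreter. The helper name `remove_barred` is made up for this example, and `io.StringIO` objects with sample numbers stand in for PSP320.dat and CBR319.dat.

```python
import io

def remove_barred(all_lines, barred_lines):
    """Yield each line of all_lines that is not in barred_lines, in order."""
    barred = set(barred_lines)   # one pass over the (smaller) barred list
    for line in all_lines:       # the large list is streamed, never stored
        if line not in barred:
            yield line

# In-memory stand-ins for the two files, one number per line.
psp = io.StringIO("96653696338\n96653766996\n96654609431\n")
cbr = io.StringIO("96653766996\n")

kept = list(remove_barred(psp, cbr))
print(kept)
```

With real files the same generator could be passed straight to `outfile.writelines(...)`, so only the barred set is held in memory.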

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Fredrik Lundh
"fynali" wrote: > > Objective: to remove the numbers present in barred-list from the > > PSPfile. > > > > $ ls -lh PSP320.dat CBR319.dat > > ... 56M Dec 28 19:41 PSP320.dat > > ... 8.6M Dec 28 19:40 CBR319.dat > > > >$ wc -l PSP320.dat CBR319.dat > > 4

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Tim Williams (gmail)
On 12/01/06, Tim Williams (gmail) <[EMAIL PROTECTED]> wrote:
On 12 Jan 2006 09:04:21 -0800, fynali <[EMAIL PROTECTED]> wrote:

Hi all,

I have two files:

  - PSP320.dat (quite a large list of mobile numbers),
  - CBR319.dat (a subset of the above, a list of barred numbers)

# head PSP320

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Tim Williams (gmail)
On 12 Jan 2006 09:04:21 -0800, fynali <[EMAIL PROTECTED]> wrote:

Hi all,

I have two files:

  - PSP320.dat (quite a large list of mobile numbers),
  - CBR319.dat (a subset of the above, a list of barred numbers)

# head PSP320.dat CBR319.dat
==> PSP320.dat <==
96653696338

How to remove subset from a file efficiently?

2006-01-12 Thread fynali
Hi all,

I have two files:

  - PSP320.dat (quite a large list of mobile numbers),
  - CBR319.dat (a subset of the above, a list of barred numbers)

# head PSP320.dat CBR319.dat
==> PSP320.dat <==
96653696338
96653766996
96654609431
96654722608
966547