On 13 Jan 2006 23:17:05 -0800, [EMAIL PROTECTED] wrote:
>
>fynali wrote:
>> $ cat cleanup_ray.py
>> #!/usr/bin/python
>> import itertools
>>
>> b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
>>
>> file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,fi
> > b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
> >
> > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
> >
> > --
> > $ time ./cleanup_ray.py
> >
> > real 0m5.451s
> > user 0m4.496
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python
#import psyco
#psyco.full()
postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco',
'w')
barred = {}
for number in open('/home/saj
fynali wrote:
> Sorry, pls read that ~15 secs.
That is more or less about it. set() is faster than dict(), by about 2x
on my machine, and I assume a portion of your time goes into set/dict
creation, since it is a pretty large data set.
--
http://mail.python.org/mailman/listinfo/python-list
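One way to check the set-versus-dict construction cost on your own machine is a small timeit run. This is only a sketch: it is written for a current Python 3 interpreter rather than the Python 2 used in the thread, the 200,000 generated strings are a made-up stand-in for the real CBR file, and the helper names (build_set, build_dict, best_time) are hypothetical. The ratio you see will depend on interpreter version and data.

```python
import timeit

# Hypothetical stand-in for the barred-number file: 200,000 newline-terminated strings.
lines = ["9665%07d\n" % i for i in range(200_000)]


def build_set(items):
    """Build a set directly from the lines."""
    return set(items)


def build_dict(items):
    """Build a dict keyed by the lines, as the explicit-loop script does."""
    barred = {}
    for item in items:
        barred[item] = None  # only the key matters
    return barred


def best_time(func, repeat=3):
    """Return the best wall-clock time in seconds over a few single runs."""
    return min(timeit.repeat(lambda: func(lines), number=1, repeat=repeat))


if __name__ == "__main__":
    print("set :", best_time(build_set))
    print("dict:", best_time(build_dict))
```

No timings are quoted here on purpose; run it against data of realistic size before drawing conclusions.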
Sorry, pls read that ~15 secs.
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python
import psyco
psyco.full()
postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco',
'w')
barred = {}
for number in open('/home/sajid
fynali wrote:
> $ cat cleanup_use_psyco_and_list_compr.py
> #!/usr/bin/python
>
> import psyco
> psyco.full()
>
> postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
> outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco',
> 'w')
>
> barred = {}
>
>
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python
import psyco
psyco.full()
postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco',
'w')
barred = {}
for number in open('/home/sajid
$ cat cleanup.py
#!/usr/bin/python
postpaid_file = open('/home/oracle/stc/test/PSP333')
outfile = open('/home/oracle/stc/test/PSP-CBR.dat', 'w')
barred = {}
for number in open('/home/oracle/stc/test/CBR333'):
    barred[number] = None # just add it as a key
fynali wrote:
> [bonono]
> > Have you tried the explicit loop variant with psyco ?
>
> Sure I wouldn't mind trying; can you suggest some code snippets along
> the lines of which I should try...?
>
> [fynali]
> > Needless to say, I'm utterly new to python and my programming
> > skills &
"fynali" wrote:
> Is a rewrite possible of Raymond's or Fredrik's suggestions above which
> will still give me the time saving made?
Python 2.2 doesn't have a ready-made set type (new in 2.3), and it doesn't
support generator expressions (the thing that caused the syntax error).
However, using a di
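For illustration, here is a minimal sketch of membership testing with a plain dict and explicit loops, avoiding both set() and generator expressions. It is not the code from the message above, which is cut off here. It is written to run on a current Python 3 interpreter, the function name filter_barred is hypothetical, and the in-memory io.StringIO objects with made-up numbers stand in for the real PSP and CBR files.

```python
import io


def filter_barred(postpaid, barred_lines, out):
    """Write every line of postpaid that is not a key in the barred dict."""
    barred = {}
    for number in barred_lines:
        barred[number] = None  # only the key matters; the value is a placeholder
    for number in postpaid:
        if number not in barred:  # dict key lookup, not a list scan
            out.write(number)


# Made-up sample data standing in for the PSP and CBR files.
postpaid = io.StringIO("111\n222\n333\n444\n")
barred_lines = io.StringIO("222\n444\n")
out = io.StringIO()
filter_barred(postpaid, barred_lines, out)
print(out.getvalue(), end="")
```

Input order and duplicates in the postpaid data are preserved, because the postpaid lines are only iterated, never stored.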
[bonono]
> Have you tried the explicit loop variant with psyco ?
Sure I wouldn't mind trying; can you suggest some code snippets along
the lines of which I should try...?
[fynali]
> Needless to say, I'm utterly new to python and my programming
> skills & know-how are rudimentary.
(-:
--
$ ./cleanup.py
Traceback (most recent call last):
  File "./cleanup.py", line 3, in ?
    import itertools
ImportError: No module named itertools
--
$ time ./cleanup.py
File "./cleanup.py", line 8
outfile.writelines(number for number in postpaid_fil
fynali wrote:
> $ cat cleanup_ray.py
> #!/usr/bin/python
> import itertools
>
> b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
>
> file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
>
> --
> $
$ cat cleanup_ray.py
#!/usr/bin/python
import itertools
b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
--
$ time ./cleanup_ray.py
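The script above relies on Python 2 names: file() and itertools.ifilterfalse. On Python 3 the corresponding names are open() and itertools.filterfalse. The following is a sketch of the same idea under that assumption, not a replacement measured against the timings in this thread. The function name remove_barred and the tiny temporary files are made up for the demonstration.

```python
import itertools
import os
import tempfile


def remove_barred(psp_path, cbr_path, out_path):
    """Copy psp_path to out_path, skipping lines that also appear in cbr_path."""
    with open(cbr_path) as cbr:
        barred = set(cbr)  # lines keep their trailing newline, so whole lines are compared
    with open(psp_path) as psp, open(out_path, "w") as out:
        # filterfalse yields only the lines for which barred.__contains__ is false
        out.writelines(itertools.filterfalse(barred.__contains__, psp))


# Tiny made-up files standing in for PSP333 and CBR333.
with tempfile.TemporaryDirectory() as tmp:
    psp_path = os.path.join(tmp, "psp.txt")
    cbr_path = os.path.join(tmp, "cbr.txt")
    out_path = os.path.join(tmp, "out.txt")
    with open(psp_path, "w") as f:
        f.write("111\n222\n333\n")
    with open(cbr_path, "w") as f:
        f.write("222\n")
    remove_barred(psp_path, cbr_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result, end="")
```

The with blocks close the files explicitly, which the one-liner leaves to the interpreter.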
$ time fgrep -x -v -f CBR333 PSP333 > PSP-CBR.dat.fgrep
real 0m31.551s
user 0m16.841s
sys 0m0.912s
--
$ time ./cleanup.py
real 0m6.080s
user 0m4.836s
sys 0m0.408s
--
$ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python
387242
On 12 Jan 2006 22:29:22 -0800
"Raymond Hettinger" <[EMAIL PROTECTED]> wrote:
> AJL wrote:
> > How fast does this run?
> >
> > a = set(file('PSP320.dat'))
> > b = set(file('CBR319.dat'))
> > file('PSP-CBR.dat', 'w').writelines(a.difference(b))
>
> Turning PSP into a set takes extra time, c
Fredrik Lundh wrote:
> Steve Holden wrote:
>
>
>>>looks like premature non-optimization to me...
>>>
>>
>>It might be quicker to establish a dict whose keys are the barred
>>numbers and use that, rather than a list, to determine whether the input
>>numbers should make it through.
>
>
> what do
The code is down to 5 lines!
#!/usr/bin/python
barred = set(open('/home/sajid/python/wip/CBR319.dat'))
postpaid_file = open('/home/sajid/python/wip/PSP320.dat')
outfile = open('/home/sajid/python/wip/PSP-CBR.dat', 'w')
outfile.writelines(number for number in postpaid
On 01/12/2006-09:04AM, fynali wrote:
>
> - PSP320.dat (quite a large list of mobile numbers),
> - CBR319.dat (a subset of the above, a list of barred numbers)
>
fgrep -x -v -f CBR319.dat PSP320.dat > PSP-CBR.dat
AJL wrote:
> How fast does this run?
>
> a = set(file('PSP320.dat'))
> b = set(file('CBR319.dat'))
> file('PSP-CBR.dat', 'w').writelines(a.difference(b))
Turning PSP into a set takes extra time, consumes unnecessary memory,
eliminates duplicates (possibly a bad thing), and loses the origin
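A small sketch of the difference between the two approaches, using made-up numbers rather than the real files (Python 3 syntax). Converting the large list to a set collapses duplicates, and a set carries no ordering guarantee, whereas filtering the lines against a set of barred numbers keeps every surviving line in its input position.

```python
# Made-up data: "111" appears twice, and the list is deliberately unsorted.
postpaid = ["333\n", "111\n", "222\n", "111\n"]
barred = {"222\n"}

# Approach from the quoted message: set difference.
via_difference = set(postpaid).difference(barred)

# Alternative: only the barred numbers are a set; the postpaid lines are filtered.
via_filter = [number for number in postpaid if number not in barred]

print(sorted(via_difference))  # duplicates collapsed; sorted only to make the print stable
print(via_filter)              # duplicates and input order preserved
```

Whether losing duplicates matters depends on the data; the sketch only shows that the two results can differ.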
"fynali" <[EMAIL PROTECTED]> writes:
> Hi all,
>
> I have two files:
Others have pointed out the Python solution - use a set instead of a
list for membership testing. I want to point out a better Unix
solution ('cause I probably wouldn't have written a Python program to
do this):
> Objective: to
On 12 Jan 2006 09:04:21 -0800
"fynali" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have two files:
>
> - PSP320.dat (quite a large list of mobile numbers),
> - CBR319.dat (a subset of the above, a list of barred numbers)
>
...
> Objective: to remove the numbers present in barred-lis
Steve Holden wrote:
> > looks like premature non-optimization to me...
> >
> It might be quicker to establish a dict whose keys are the barred
> numbers and use that, rather than a list, to determine whether the input
> numbers should make it through.
what do you think
> barred = set(ope
Fredrik Lundh wrote:
> "fynali" wrote:
>
>
>>>Objective: to remove the numbers present in barred-list from the
>>>PSPfile.
>>>
>>>$ ls -lh PSP320.dat CBR319.dat
>>>... 56M Dec 28 19:41 PSP320.dat
>>>... 8.6M Dec 28 19:40 CBR319.dat
>>>
>>> $ wc -l PSP320.dat CBR
[fynali]
> I have two files:
>
> - PSP320.dat (quite a large list of mobile numbers),
> - CBR319.dat (a subset of the above, a list of barred numbers)
# print all non-barred mobile phone numbers
barred = set(open('CBR319.dat'))
for num in open('PSP320.dat'):
    if num not in b
"fynali" wrote:
> > Objective: to remove the numbers present in barred-list from the
> > PSPfile.
> >
> > $ ls -lh PSP320.dat CBR319.dat
> > ... 56M Dec 28 19:41 PSP320.dat
> > ... 8.6M Dec 28 19:40 CBR319.dat
> >
> > $ wc -l PSP320.dat CBR319.dat
> > 4
On 12/01/06, Tim Williams (gmail) <[EMAIL PROTECTED]> wrote:
On 12 Jan 2006 09:04:21 -0800, fynali <[EMAIL PROTECTED]> wrote:
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320
On 12 Jan 2006 09:04:21 -0800, fynali <[EMAIL PROTECTED]> wrote:
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320.dat CBR319.dat
==> PSP320.dat <==
96653696338
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320.dat CBR319.dat
==> PSP320.dat <==
96653696338
96653766996
96654609431
96654722608
966547
30 matches