Re: Efficient grep using Python?

2004-12-17 Thread TZOTZIOY
On Fri, 17 Dec 2004 14:22:34 +, rumours say that [EMAIL PROTECTED] might have written: sf: >sf wrote: >> The point is that when you have 100,000s of records, this grep becomes >> really slow? > >There are performance bugs with current versions of grep >and multibyte characters that are only g

Re: Efficient grep using Python?

2004-12-17 Thread P
sf wrote: > The point is that when you have 100,000s of records, this grep becomes really slow? There are performance bugs with current versions of grep and multibyte characters that are only getting addressed now. To work around these, do `export LANG=C` first. In my experience grep is not scalable s
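
The `export LANG=C` workaround in the post above can also be applied from Python when shelling out to grep. This is only a sketch under stated assumptions: the function name `grep_difference` is hypothetical, `-v` and `-f` come from the `grep -vf B A` command quoted elsewhere in the thread, `-F` is the fixed-string flag another poster mentions, and `-x` (whole-line match) is an addition of mine so that a line of B does not knock out lines of A that merely contain it.

```python
import os
import subprocess


def grep_difference(a_path, b_path):
    """Return the lines of a_path that equal no line of b_path, using grep."""
    # LANG=C is the workaround suggested in the post; note that LC_ALL,
    # if set in the environment, takes precedence over LANG.
    env = dict(os.environ, LANG="C")
    result = subprocess.run(
        ["grep", "-F", "-x", "-v", "-f", b_path, a_path],
        env=env,
        capture_output=True,
        text=True,
    )
    # grep exits with 1 when it selects no lines; 2 or more signals an error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout.splitlines(keepends=True)
```

Whether this is faster than a pure-Python lookup depends on the grep version and the data, which is exactly what the thread is arguing about.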

Re: Efficient grep using Python? [OT]

2004-12-17 Thread TZOTZIOY
On Fri, 17 Dec 2004 12:21:08 +, rumours say that [EMAIL PROTECTED] might have written: [snip some damn lie aka "benchmark"] [me] >> (Yes, I cheated by adding the F (for no regular expressions) flag :) > >Also you only have 1000 entries in B! >Try it again with all entries in B also ;-) >Remem

Re: Efficient grep using Python?

2004-12-17 Thread sf
The point is that when you have 100,000s of records, this grep becomes really slow? Any comments? That's why I looked for Python :) > that would be > > grep -vf B A > > and it is a rare use of grep, indeed. > -- > TZOTZIOY, I speak England very best. > "Be strict when sending and tolerant when

Re: Efficient grep using Python?

2004-12-17 Thread P
Christos TZOTZIOY Georgiou wrote: On Thu, 16 Dec 2004 14:28:21 +, rumours say that [EMAIL PROTECTED] I challenge you to a benchmark :-) Well, the numbers I provided above are almost meaningless with such a small set (and they easily could be reversed, I just kept the convenient-to-me first run

Re: Efficient grep using Python?

2004-12-16 Thread TZOTZIOY
On Thu, 16 Dec 2004 14:28:21 +, rumours say that [EMAIL PROTECTED] might have written: [sf] Essentially, want to do efficient grep, i.e. from A remove those lines which are also present in file B. [EMAIL PROTECTED] >>>You could implement elegantly using the new sets feature >>>For ref

Re: Efficient grep using Python?

2004-12-16 Thread P
Christos TZOTZIOY Georgiou wrote: On Wed, 15 Dec 2004 16:10:08 +, rumours say that [EMAIL PROTECTED] might have written: Essentially, want to do efficient grep, i.e. from A remove those lines which are also present in file B. You could implement elegantly using the new sets feature For referen
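
The "new sets feature" suggestion quoted above might look like the following with today's built-in `set` type. This is a hedged sketch, not the poster's code (the preview is cut off before any code appears), and the function name `difference_unordered` is hypothetical.

```python
def difference_unordered(a_path, b_path):
    """Return the set of lines that occur in a_path but not in b_path."""
    # An open file iterates over its lines, so set() can consume it directly.
    with open(a_path) as a_file, open(b_path) as b_file:
        return set(a_file) - set(b_file)
```

The set difference is compact, but it loads both files into memory and discards the original line order and any duplicate lines in A, so it fits best when the result is treated as a plain collection of unique strings.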

Re: Efficient grep using Python?

2004-12-16 Thread TZOTZIOY
On Wed, 15 Dec 2004 16:10:08 +, rumours say that [EMAIL PROTECTED] might have written: >> Essentially, want to do efficient grep, i.e. from A remove those lines which >> are also present in file B. > >You could implement elegantly using the new sets feature >For reference here is the Unix way

Re: Efficient grep using Python?

2004-12-16 Thread TZOTZIOY
On Wed, 15 Dec 2004 17:07:37 +0100, rumours say that "Fredrik Lundh" <[EMAIL PROTECTED]> might have written: >> Essentially, want to do efficient grep, i.e. from A remove those lines which >> are also present in file B. > >that's an unusual definition of "grep" that would be grep -vf B A and it

Re: Efficient grep using Python?

2004-12-15 Thread Tim Peters
[Jane Austine] > fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be > equivalent. Semantically, yes; pragmatically, no, in the way explained before. > When I pass an iterator instance(or a generator iterator) to the > dict.fromkeys, it is expanded at that moment, I don't know what "e
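
The question of when `dict.fromkeys()` consumes an iterator can be checked directly. The reply above is truncated, so this sketch does not reproduce its explanation; it only illustrates the two points visible in the preview. The generator `numbered_lines` is a hypothetical stand-in for an open file.

```python
def numbered_lines():
    """Hypothetical stand-in for a file: yields three lines lazily."""
    for i in range(3):
        yield f"line {i}\n"


gen = numbered_lines()
keys = dict.fromkeys(gen)

# fromkeys() pulled every item out of the generator during the call,
# so nothing is left to fetch afterwards.
exhausted = next(gen, None) is None

# Building a list first gives an equal dict; the difference is only that
# the list variant holds all items in memory before the dict is built.
same_result = keys == dict.fromkeys(list(numbered_lines()))
```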

Efficient grep using Python?

2004-12-15 Thread Jane Austine
[Fredrik Lundh] >>> bdict = dict.fromkeys(open(bfile).readlines()) >>> >>> for line in open(afile): >>>if line not in bdict: >>>print line, >>> >>> [Tim Peters] >> Note that an open file is an iterable object, yielding the lines in >> the file. The "for" loop exploited that above, bu

Re: Efficient grep using Python?

2004-12-15 Thread Tim Peters
[Fredrik Lundh] >>> bdict = dict.fromkeys(open(bfile).readlines()) >>> >>> for line in open(afile): >>>if line not in bdict: >>>print line, >>> >>> [Tim Peters] >> Note that an open file is an iterable object, yielding the lines in >> the file. The "for" loop exploited that above, bu
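
A minimal sketch of the approach quoted above, translated to Python 3 (the thread predates it, so `print line,` becomes a write call). The function name `lines_not_in` and its parameters are hypothetical; passing the open file straight to `dict.fromkeys()` follows the remark that an open file is itself an iterable of lines.

```python
import sys


def lines_not_in(a_path, b_path, out=sys.stdout):
    """Write every line of a_path that does not occur in b_path, keeping order."""
    # The open file is iterable, so fromkeys() reads it line by line;
    # no intermediate list from readlines() is built.
    with open(b_path) as b_file:
        b_lines = dict.fromkeys(b_file)

    # Stream A: only the lines of B have to fit in memory.
    with open(a_path) as a_file:
        for line in a_file:
            if line not in b_lines:  # hash lookup, not a scan of B
                out.write(line)
```

Lines are compared including their trailing newline, so a final line without one will not match an otherwise identical line that has it.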

Re: Efficient grep using Python?

2004-12-15 Thread Fredrik Lundh
Tim Peters wrote: >> bdict = dict.fromkeys(open(bfile).readlines()) >> >> for line in open(afile): >>if line not in bdict: >>print line, >> >> > > Note that an open file is an iterable object, yielding the lines in > the file. The "for" loop exploited that above, but fromkeys() can >

Re: Efficient grep using Python?

2004-12-15 Thread John Hunter
> "sf" == sf <[EMAIL PROTECTED]> writes: sf> Just started thinking about learning python. Is there any sf> place where I can get some free examples, especially for sf> following kind of problem ( it must be trivial for those using sf> python) sf> I have files A, and B ea

Re: Efficient grep using Python?

2004-12-15 Thread Tim Peters
["sf" <[EMAIL PROTECTED]>] >> I have files A, and B each containing say 100,000 lines (each >> line=one string without any space) >> >> I want to do >> >> " A - (A intersection B) " >> >> Essentially, want to do efficient grep, i.e. from A remove those >> lines which are also present in file B.

Re: Efficient grep using Python?

2004-12-15 Thread P
sf wrote: Just started thinking about learning Python. Is there any place where I can get some free examples, especially for the following kind of problem (it must be trivial for those using Python)? I have files A and B, each containing say 100,000 lines (each line = one string without any space). I want

Re: Efficient grep using Python?

2004-12-15 Thread Fredrik Lundh
"sf" <[EMAIL PROTECTED]> wrote: > I have files A, and B each containing say 100,000 lines (each line=one > string without any space) > > I want to do > > " A - (A intersection B) " > > Essentially, want to do efficient grep, i..e from A remove those lines which > are also present in file B. th

Efficient grep using Python?

2004-12-15 Thread sf
Just started thinking about learning Python. Is there any place where I can get some free examples, especially for the following kind of problem (it must be trivial for those using Python)? I have files A and B, each containing say 100,000 lines (each line = one string without any space). I want to do