RE: looking for faster Ideas...

George Gallen Tue, 27 Jan 2004 12:10:20 -0800

That's why the zip code, and sometimes part of the address, is
used as well: the chances of part of the name, the zip code, and
part of the address all matching and NOT being unique are
extremely low.

Which is also what complicates this.
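
To make the complication concrete: a plain grep -F -f pattern file drops a
line as soon as any one fragment matches, whereas the rule here is that
every fragment of one entry (name, zip, and sometimes an address piece) has
to show up on the same line. Below is a rough sketch of that rule as a
data-driven list instead of a growing CASE block. It is in Python rather
than BASIC, purely for illustration, and the names RULES, should_kick and
split_lines, along with the sample fragments, are made up for the example.

```python
# Hypothetical removal rules: each tuple holds the fragments (last name,
# zip code, optional address piece) that must ALL appear on one CSV line.
RULES = [
    ("SMITH", "08086"),
    ("JONES", "19103", "MARKET ST"),
]


def should_kick(line, rules=RULES):
    """Return True if every fragment of at least one rule occurs in line."""
    return any(all(frag in line for frag in rule) for rule in rules)


def split_lines(lines, rules=RULES):
    """Partition lines into (kept, removed) lists, preserving input order."""
    kept, removed = [], []
    for line in lines:
        (removed if should_kick(line, rules) else kept).append(line)
    return kept, removed


if __name__ == "__main__":
    # Made-up sample rows, not taken from the real file.
    sample = [
        'SMITH,JOHN,"12 ELM RD",THOROFARE,NJ,08086',
        'SMITHERS,ANN,"9 OAK AVE",CAMDEN,NJ,08101',
        'JONES,BOB,"400 MARKET ST",PHILADELPHIA,PA,19103',
        'JONES,AMY,"7 PINE ST",PHILADELPHIA,PA,19103',
    ]
    kept, removed = split_lines(sample)
    print(len(kept), "kept,", len(removed), "removed")
```

The work per line still grows with the number of rules, so this only tidies
up the maintenance side (the rules could be read from a file); it does not
by itself make the scan faster.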

George

>-----Original Message-----
>From: Ian McGowan [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, January 27, 2004 1:56 PM
>To: U2 Users Discussion List
>Subject: Re: looking for faster Ideas...
>
>
>do it outside basic using
>
>$ grep -F -f pattern-file csv-file > remove-file
>
>the pattern file would have the pieces in there. what if you're
>excluding something that's not unique? "smith" would exclude
>"smithers", "smithy", "psmith" (one for the wodehouse fans :-), etc.
>
>i do this with some huge syslog files, and fairly big pattern files and
>it's pretty darn quick.
>
>ian
>
>On Tue, 2004-01-27 at 10:33, George Gallen wrote:
>> I can't set up any indexes to speed this up. Basically I'm scanning a
>> CSV file for names to remove, and setting the flag KICK=1 to remove a
>> line (creating a new CSV file at the same time).
>>
>> Keep in mind the ".." are people's last names, or zip codes, or part
>> of their address; I changed them to ".." to protect the unwanting...
>>
>> Right now, I do a series of CASEs ...
>> Now, it's not a major problem as I'm only checking for 20 or so names,
>> but as more and more people request to be removed (and we don't have
>> access to the creation of the list), this could get quite slow over
>> 50 or 60 thousand lines of checking.
>>
>> LIN is one line of the CSV file; the INDEX is checking for a last name
>> and a zip code, and sometimes part of the address line.
>>
>> Any ideas?
>>
>> Remember, we can't change the source of the file; it will always be a
>> CSV, being read line by line.
>>
>>    KICK=0
>>    BEGIN CASE
>>       CASE -1
>>          KICK=1
>>          BEGIN CASE
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE INDEX(LIN,"..",1)#0 AND INDEX(LIN,"..",1)#0
>>             CASE -1
>>                KICK=0
>>          END CASE
>>    END CASE
>>
>> George Gallen
>> Senior Programmer/Analyst
>> Accounting/Data Division
>> [EMAIL PROTECTED]
>> ph:856.848.1000 Ext 220
>>
>> SLACK Incorporated - An innovative information, education and
>> management company
>> http://www.slackinc.com
>>
>> _______________________________________________
>> u2-users mailing list
>> [EMAIL PROTECTED]
>> http://www.oliver.com/mailman/listinfo/u2-users
>--
>Ian McGowan <[EMAIL PROTECTED]>
>
>

