Been there, done that. Here are a few suggestions:

1. Keep a library of throwaway words, and omit them from any cross reference. Perhaps Industrial should not be one of them.
2. Use soundex on each non-throwaway word, and index them.
3. Index each non-throwaway word, as entered.
4. Index (using b-tree) each name, as typed.
5. Build a "finder" screen that allows the user to search using any of the above methods. This way, the user can typically use the b-tree, which is the fastest, and then fall back on any of the other methods if the particular entity is not found.
6. Allow "and" and "or" combinations in the non-b-tree searches, and perhaps on other fields such as area code, state abbreviation, etc.
7. Down the road, at data entry time, attempt to normalize and verify fields such as area code with zip code and state.
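
As a rough illustration of suggestions 1 through 3 and 6, here is a minimal Python sketch (not U2 code). The throwaway list, the function names and the sample customer names are all made up for the example, and the Soundex routine is the common American variant, so treat it as a starting point under those assumptions rather than a finished design.

```python
import re
from collections import defaultdict

# Hypothetical throwaway-word list; tune it against the real data.
# INDUSTRIAL is deliberately left out, per suggestion 1.
THROWAWAY = {"THE", "AND", "OF", "CO", "COMPANY", "INC", "CORP", "LTD"}

# Letter -> digit table for American Soundex.
_CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
    "4": "L", "5": "MN", "6": "R"}.items() for c in letters}

def soundex(word):
    """Return the first letter plus three digits, or "" if no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue          # H and W do not separate equal codes
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code           # a vowel resets prev to ""
    return (letters[0] + "".join(digits) + "000")[:4]

def words(name):
    """Split a name into uppercase words, minus the throwaway words."""
    cleaned = name.upper().replace("'", "")   # Children's -> CHILDRENS
    return [w for w in re.findall(r"[A-Z0-9]+", cleaned)
            if w not in THROWAWAY]

def build_indexes(customers):
    """customers maps id -> name; returns (word_index, soundex_index)."""
    word_index, sound_index = defaultdict(set), defaultdict(set)
    for cust_id, name in customers.items():
        for w in words(name):
            word_index[w].add(cust_id)        # suggestion 3
            code = soundex(w)
            if code:
                sound_index[code].add(cust_id)  # suggestion 2
    return word_index, sound_index

def search(sound_index, query, mode="and"):
    """Soundex search; mode is "and" or "or" (suggestion 6)."""
    codes = [soundex(w) for w in words(query)]
    hits = [sound_index.get(c, set()) for c in codes if c]
    if not hits:
        return set()
    return set.intersection(*hits) if mode == "and" else set.union(*hits)

# Sample data, invented for the example.
customers = {1: "Childrens Wear", 2: "Children's Ware Inc",
             3: "The Accessories Company", 4: "Accesories Ltd"}
word_index, sound_index = build_indexes(customers)
```

With that sample data, search(sound_index, "childrens wear") finds records 1 and 2, and search(sound_index, "Accessorys") finds 3 and 4. The exact-word index stays separate so the finder screen can try it before falling back on Soundex.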

IHIH.

Matthew H. Stern, CCP/CDP, [EMAIL PROTECTED]
Serving the IT industry since 1976
Comprehensive Computer Services Inc.
www.comprehensive.com
Phone: 631 755-2250, Fax 755-2254
560 Broad Hollow Road, Melville NY 11747



Mark Johnson wrote:

One of my clients has roughly 75,000 records in their customer database. Oddly
enough, until now there has not been much need for a lookup
function. Now they would like a lookup function that may go beyond the usual
elements of regular lookups.

First, there is a lot of misspelling or alternate spelling of similar names.
For example, K-mart is spelled K-Mart, K Mart and KMART. This is not a
situation of simple word lookups. There needs to be some intuition as well.
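
For variants that differ only in case, spacing or punctuation, one possible approach is to index a normalized key alongside the name as typed. The sketch below is in Python purely for illustration, and match_key is a made-up name, not an existing routine.

```python
import re

def match_key(name):
    """Uppercase the name and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())

# All four spellings from the example collapse to the same key.
variants = ["K-mart", "K-Mart", "K Mart", "KMART"]
keys = {match_key(v) for v in variants}
```

This only handles formatting differences; true misspellings still need something like Soundex or another fuzzy method.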

Sometimes there's THE INDUSTRIAL DISTRIBUTION COMPANY (example), where the
customer name is made up entirely of throwaway words. Other times there's
Children's Wear, Childrens Wear and Childrens Ware. Don't get me started on
the misspellings of Accessories: Accesories and, you bet, Accessorys.

Soundex is a nice second plan but even that is dependent upon consistent
spelling of the words. TH versus HT on Clothing yields different Soundex
values.

So my question is if anyone has any insight on where to learn more about
solving this problem of such variable words. I'm open to any ideas.

Thanks in advance
Mark Johnson
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/