Been there, done that. Here are a few potential suggestions:
1. Keep a library of throwaway words, and omit them from any cross-reference. Perhaps "Industrial" should not be one of them.
2. Use soundex on each non-throwaway word, and index them.
3. Index each non-throwaway word, as entered.
4. Index (using b-tree) each name, as typed.
5. Build a "finder" screen that allows the user to search using any of the above methods. This way, the user can typically use the b-tree, which is the fastest, and then fall back on any of the other methods if the particular entity is not found.
6. Allow "and" and "or" functions with the non-b-tree search methods, and perhaps with other fields, such as area code, state abbreviation, etc.
7. Down the road, at data entry time, attempt to normalize and verify fields such as area code with zip code and state.
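A rough sketch of items 1 through 4 and 6, in Python rather than U2 BASIC, in case it helps to see the pieces together. Everything named here is made up for illustration: the THROWAWAY list, the customer ids, and the in-memory dictionaries standing in for real cross-reference files. The soundex routine is the common American Soundex, which may not match your system's built-in function exactly.

```python
import re
from collections import defaultdict

# Hypothetical throwaway list; note that INDUSTRIAL is deliberately left out.
THROWAWAY = {"THE", "COMPANY", "DISTRIBUTION", "INC", "CO"}

# Letter-to-digit table for the common American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code, or '' if the word has no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def keywords(name):
    """Split a name into upper-case words and drop the throwaway ones."""
    words = re.findall(r"[A-Z0-9]+", name.upper().replace("'", ""))
    return [w for w in words if w not in THROWAWAY]


def build_indexes(customers):
    """Build a word index (as entered) and a soundex index, both by customer id."""
    word_index = defaultdict(set)
    sound_index = defaultdict(set)
    for cust_id, name in customers.items():
        for word in keywords(name):
            word_index[word].add(cust_id)
            code = soundex(word)
            if code:
                sound_index[code].add(cust_id)
    return word_index, sound_index


def search(index, keys, mode="and"):
    """Combine the id sets for several keys with 'and' or 'or'."""
    sets = [index.get(k, set()) for k in keys]
    if not sets:
        return set()
    return set.intersection(*sets) if mode == "and" else set.union(*sets)


customers = {
    1: "Childrens Wear",
    2: "Children's Wear",
    3: "Childrens Ware",
    4: "The Industrial Distribution Company",
}
word_index, sound_index = build_indexes(customers)
exact_hits = search(word_index, ["WARE"])
sound_hits = search(sound_index, [soundex("children"), soundex("wear")], "and")
```

With this sample data the as-entered lookup on WARE finds only customer 3, while the soundex lookup on "children" and "wear" finds customers 1, 2 and 3. The finder screen would simply let the user pick which index to search.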
IHIH.
Matthew H. Stern, CCP/CDP, [EMAIL PROTECTED]
Serving the IT industry since 1976
Comprehensive Computer Services Inc.
www.comprehensive.com
Phone: 631 755-2250, Fax 755-2254
560 Broad Hollow Road, Melville NY 11747
Mark Johnson wrote:
One of my clients has roughly 75,000 records in their customer database. Oddly enough, until now there has not been much need for a lookup function. Now they would like a lookup function that may go beyond the usual elements of regular lookups.
First, there is a lot of misspelling or alternate spelling of similar names. For example, K-mart is spelled K-Mart, K Mart and KMART. This is not a matter of simple word lookups. There needs to be some intuition as well.
Sometimes there's THE INDUSTRIAL DISTRIBUTION COMPANY (example), where the customer name is made up completely of throwaway words. Other times there's Children's Wear, Childrens Wear and Childrens Ware. Don't get me started on the misspellings of Accessories: Accesories and, you bet, Accessorys.
Soundex is a nice second plan but even that is dependent upon consistent spelling of the words. TH versus HT on Clothing yields different Soundex values.
So my question is whether anyone has any insight into where to learn more about solving this problem of such variable words. I'm open to any ideas.
Thanks in advance,
Mark Johnson
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/