Okay, I built the array and all seems well. I am filtering out the 's
and the small words, and I have a search action that returns the results
from looking in the keyword tables. I use an if/else to determine whether
the value is quoted and switch to an = (exact match) search if it is. It
seems to work well: it returns all results with the appropriate search
terms from the table. Now I am stuck. How in the heck can I weight the
returns and sort them accordingly? I get the regular types of sorting,
but I was hoping that the more matches per word or phrase, the higher
the relevancy, and so the higher the result sorts in the order.
Has anyone done anything like this? If so, I would be interested in
how.
Thanks!
Back in the old days of Butler (EveryWare's original main product), one of its main shortcomings as a SQL engine was its inability to do contained searches well. One way around that problem was a procedure similar to what Dave Shelley suggested. It looked for distinct words, but instead of creating an array, it turned them into records in a separate table, much like his other suggestion from his CBC work.
The procedure ran on inserts and updates to the main table. That way it was performed once per insert or update rather than every time a select request was made. It was reasonably fast because it wrote to a 2 column table: the first column was the foreign key back to the main table, and the second column was the key word.
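Since I can't find the original Butler procedure, here is a rough sketch of the same idea in Python with SQLite rather than Butler or Witango. The table names, column names, and function names are all made up for illustration. It rebuilds the key word rows whenever a record is saved, and it ranks a search by how many of the search terms each record matches, which is one simple way to get the "more matches, higher in the list" sorting asked about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resorts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keywords (
        resort_id INTEGER REFERENCES resorts(id),  -- foreign key back to the main table
        word      TEXT                             -- one key word per row
    );
    CREATE INDEX keywords_word ON keywords (word);
""")

def index_record(conn, record_id, text):
    """Rebuild the key word rows for one record; call on insert and update."""
    conn.execute("DELETE FROM keywords WHERE resort_id = ?", (record_id,))
    # Naive split on whitespace; distinct words of 3 or more characters only.
    words = {w for w in text.lower().split() if len(w) >= 3}
    conn.executemany(
        "INSERT INTO keywords (resort_id, word) VALUES (?, ?)",
        [(record_id, w) for w in sorted(words)],
    )

def save_resort(conn, record_id, name):
    """Insert or update a main-table row, then re-index it once."""
    conn.execute(
        "INSERT OR REPLACE INTO resorts (id, name) VALUES (?, ?)",
        (record_id, name),
    )
    index_record(conn, record_id, name)

def ranked_search(conn, terms):
    """Return (id, name, matches) rows, most matching key words first."""
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT r.id, r.name, COUNT(*) AS matches
        FROM keywords k JOIN resorts r ON r.id = k.resort_id
        WHERE k.word IN ({placeholders})
        GROUP BY r.id, r.name
        ORDER BY matches DESC, r.id
    """
    return conn.execute(sql, [t.lower() for t in terms]).fetchall()

save_resort(conn, 1, "Jim and Bonnie Resort")
save_resort(conn, 2, "Pine Lake Resort")
save_resort(conn, 3, "Walleye Bait Shop")
results = ranked_search(conn, ["jim", "resort"])
# results == [(1, "Jim and Bonnie Resort", 2), (2, "Pine Lake Resort", 1)]
```

Because each record stores a word only once, the match count here is the number of distinct search terms found, not the number of times a word occurs in the text.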
I liked Dave's suggestion for tokenizing on the space and punctuation characters and for filtering out words shorter than 3 characters. That is similar to what the Butler procedure did, although it used a list of words to ignore (the, and, a, they, there, etc.). You could combine the two approaches. I'd probably make the list of ignored words a table itself so you could add to it as you went along and discovered new words to keep out of the unique key words table.
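A sketch of that combination, again in Python with SQLite purely as an illustration. The ignored_words table and the tokenize function are hypothetical names, and splitting on anything that is not a letter or digit is my own simplification of "space and punctuation characters".

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# The ignore list lives in its own table so new words can be added later.
conn.execute("CREATE TABLE ignored_words (word TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO ignored_words (word) VALUES (?)",
    [("the",), ("and",), ("they",), ("there",)],
)

def load_ignored(conn):
    """Read the current ignore list into a set for fast lookups."""
    return {row[0] for row in conn.execute("SELECT word FROM ignored_words")}

def tokenize(text, ignored, min_length=3):
    """Return the distinct key words in text, in order of first appearance."""
    # Split on any run of characters that is not a letter or digit.
    words = re.split(r"[^A-Za-z0-9]+", text.lower())
    keywords = []
    for word in words:
        if len(word) >= min_length and word not in ignored and word not in keywords:
            keywords.append(word)
    return keywords

ignored = load_ignored(conn)
keywords = tokenize("Jim and Bonnie's Resort, on the lake.", ignored)
# keywords == ["jim", "bonnie", "resort", "lake"]
```

Note that the length filter already drops the stray "s" left over from "Bonnie's", so the ignore table only needs words of 3 or more characters.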
From there, you could add some logic to your search action to look for the single quote (SQ) and double quote (DQ) in any user supplied criteria strings and remove them and any characters after them, as Dave suggested. So if the user enters <Jim's>, you strip out the <'> and the <s> to make the search string <Jim>. I typically do that with a results action immediately before the search action, where I massage the <@ARG> values, put the results into a local/request scope variable, and then use the <@VAR> value in my search action's criteria.
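In Witango this would live in the results action, but the logic itself is easy to show in Python. This is only a sketch: parse_criteria is a made-up name, and treating a string as an exact phrase only when it starts and ends with the same quote character is my assumption about how the quoted case should be detected.

```python
import re

QUOTES = "'\""

def parse_criteria(raw):
    """Return ("exact", phrase) for a quoted string, else ("words", [terms])."""
    text = raw.strip()
    # A fully quoted string is an exact-match request: strip the quotes, keep the rest.
    if len(text) >= 2 and text[0] in QUOTES and text[-1] == text[0]:
        return ("exact", text[1:-1])
    terms = []
    for word in text.split():
        # Drop the first quote character in the word and everything after it.
        word = re.split(r"['\"]", word, maxsplit=1)[0]
        if word:
            terms.append(word)
    return ("words", terms)

print(parse_criteria("Jim's"))             # ('words', ['Jim'])
print(parse_criteria('"Jim and Bonnie"'))  # ('exact', 'Jim and Bonnie')
```

The "exact" result would feed the = search and the "words" result would feed the key word lookup.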
One other suggestion would be to write a new record for each search string that a user enters over a test period (say a couple of weeks). Then examine those records to get an idea of what your users are searching on, and adjust the application to handle anything you might have missed. With searches you typically have no idea what the user is thinking when they search your site, so it's nice to capture that information and see how they are actually using it.
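A small sketch of such a log, in Python with SQLite as before. The search_log table and both function names are hypothetical. Storing the number of results alongside each search string is my own addition; it makes it easy to list the searches that came back empty.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE search_log (
        searched_at  TEXT,     -- UTC timestamp in ISO 8601 form
        criteria     TEXT,     -- the search string exactly as the user typed it
        result_count INTEGER   -- how many rows the search returned
    )
""")

def log_search(conn, criteria, result_count):
    """Write one record per search request."""
    conn.execute(
        "INSERT INTO search_log (searched_at, criteria, result_count) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), criteria, result_count),
    )

def top_misses(conn, limit=10):
    """Most frequent search strings that returned nothing."""
    return conn.execute(
        """
        SELECT criteria, COUNT(*) AS times
        FROM search_log
        WHERE result_count = 0
        GROUP BY criteria
        ORDER BY times DESC, criteria
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

log_search(conn, "jim's", 0)
log_search(conn, "jim's", 0)
log_search(conn, "walleye", 4)
log_search(conn, "musky guide", 0)
misses = top_misses(conn)
# misses == [("jim's", 2), ("musky guide", 1)]
```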
I do have an old Mac running a copy of Butler but I've gone through the procedures on that machine and the one I remember that handled this isn't there. If I do find it, I'll certainly pass it along.
Hope this helps,
Steve Smith
Oakbridge Information Solutions
Office: (519) 624-4388
GTA: (416) 606-3885
Fax: (519) 624-3353
Cell: (416) 606-3885
Email: [EMAIL PROTECTED]
Web: http://www.oakbridge.ca
On Tuesday, April 27, 2004, at 11:26 AM, Dave Shelley wrote:
I agree with John's assessment that a search engine is the way to go.
But if you really want to do it in Witango, here's one possibility:

1) tokenize the input string on space, comma, sq, dq, period, and any
other punctuation characters. This gives you a [1,x] array of the words.
Transpose it into a [x,1] array
2) filter the array to eliminate values < 3 characters long.
3) loop through the array to build a sql statement to do your search.
Something like:

    select id, count(id), max(txt)
    from table
    where txt like '%Jim%' or
    txt like '%Bonnie%' or
    txt like '%Resort%'
    group by id
    order by count(id) desc

this will give you an array of the rows that contain the any of the
words, sorted by the number of times the words appear

To give users the ability to quote strings and get an exact match, check
to see if the first and last characters are sq or dq. If so, strip them
off and skip the tokenize step.

I haven't actually tried this methodology, but it should work. It may be
slow though depending on your database server, amount of data and
indexing.

Another methodology I've worked with many years ago at CBC news was to
take all the distinct words from every article and insert them into a
table. Then build a many<->many relationship with the articles to show
which words appeared in what article. That made for a couple of huge
tables, but it was well indexed and running on a mainframe so it
resulted in some fast searches.

Dave

-----Original Message-----
From: John McGowan [mailto:[EMAIL PROTECTED]
Sent: April 27, 2004 9:53 AM
To: [EMAIL PROTECTED]
Subject: Re: Witango-Talk: Search Engine Format Type

if the content you want to search can be "spidered" just install a
search engine. I and others on the list have had great success
integrating the swish-e search engine into our Witango apps.

/John

[EMAIL PROTECTED] wrote:
I have a hobby site that I work on in my spare time. It has a forum,
chat room, ya da ya da ya da. In it I have created 3 different search
like engines that list resorts, fishing guides and bait and tackle
shops. (you can check it out if you like, just a few months old, but
growing http://MyFishingPals.com )

Anyway, I was wondering if anyone has used Witango to create a search
engine somewhat like google. I have been trying to figure a way to do
this. The big thing is how to search for certain terms. In other
words, a "contains" search does exactly as expected, but how does one
do partial phrase searches and the like. I am not worried about
stemming (related keywords), just trying to figure out how to do this.

example if searching for "jim" returns "Jim and Bonny's Resort" good
so far... But
searching for jim's returns nothing.

And is there a way to use quotes for exact phrase? More like a search
engine?

Anyway, just wondering if anyone has done this or pondered this at
all. Any suggestions would be great.

Thanks

________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf