Title: Re: Witango-Talk: Search Engine Format Type
Okay, I built the array and all seems well. I am filtering out the 's and small words, and I have a search action that returns the results from looking in the keyword tables. I use an if/else to determine whether the value is quoted and go to an = (exact match) search if it is. It seems to work well: it returns all results with the appropriate search terms from the table. Now I am stuck. How in the heck can I weight the returns and sort them accordingly? I get the regular types of sorting, but I was hoping that the more matches there are per word or phrase, the higher the relevancy and the higher the result sorts in the order.

Has anyone done anything like this? And if so, I would be interested in how.

Thanks!

Back in the old days of Butler (EveryWare's original main product), one of its main shortcomings as a SQL engine was its inability to do contained searches well. One way of getting around that problem was a procedure similar to what Dave Shelley suggested. It looked for distinct words, but instead of creating an array, it turned them into records in a separate table, kind of like his other suggestion from his CBC work.

The procedure ran on inserts and updates to the main table. That way it was performed only once per insert or update, rather than every time a select request was made. It was reasonably fast because it was a 2-column table: the first column was the foreign key back to the main table, and the second column was the key word.
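
A minimal sketch of that 2-column keyword table, in Python with SQLite rather than Butler or Witango. The table, column, and function names are hypothetical. It also shows one way the table could be used to weight results: count how many distinct search terms each record matched and sort on that count.

```python
import sqlite3

def build_keyword_db():
    # In-memory database; all table and column names here are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listing (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE keyword (listing_id INTEGER REFERENCES listing(id), word TEXT);
        CREATE INDEX keyword_word ON keyword(word);
    """)
    return conn

def add_listing(conn, listing_id, name):
    # Runs once per insert, so a search never has to re-parse the text.
    conn.execute("INSERT INTO listing VALUES (?, ?)", (listing_id, name))
    words = {w.lower() for w in name.split()}
    conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                     [(listing_id, w) for w in words])

def ranked_search(conn, terms):
    # Score = number of distinct search terms found for each listing.
    terms = [t.lower() for t in terms]
    if not terms:
        return []
    marks = ",".join("?" * len(terms))
    sql = (f"SELECT listing_id, COUNT(DISTINCT word) AS hits FROM keyword "
           f"WHERE word IN ({marks}) GROUP BY listing_id "
           f"ORDER BY hits DESC, listing_id")
    return conn.execute(sql, terms).fetchall()
```

A record that matches two of the search terms sorts above a record that matches only one, which is the kind of relevancy ordering asked about at the top of the thread.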

I liked Dave's suggestion for tokenizing on the space and punctuation characters and for filtering to eliminate words less than 3 characters in length. That is similar to what the Butler procedure did, although it had a list of words to ignore (the, and, a, they, there, etc.). You could do a combination of the two. I'd probably make the list of ignored words a table itself, so you could add to it as you went along and discovered new words not to include in the unique key words table.
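
The combination could be sketched like this in Python. The function name and the in-memory stop list are assumptions; the stop list holds only the sample words named above and would be loaded from the ignored-words table in practice.

```python
import re

# Sample ignore list from the message above; in practice this could be
# read from a table so it can grow over time.
STOP_WORDS = {"the", "and", "a", "they", "there"}

def tokenize(text, min_length=3, stop_words=STOP_WORDS):
    # Split on any run of characters that is not a letter or digit,
    # which covers spaces, commas, quotes, periods, and other punctuation.
    words = re.split(r"[^A-Za-z0-9]+", text.lower())
    # Drop short words (including the empty strings split can produce)
    # and anything on the ignore list.
    return [w for w in words if len(w) >= min_length and w not in stop_words]
```

Because the apostrophe is treated as a separator, a trailing 's becomes a one-letter token and is removed by the length filter.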

From there, you could add some logic to your search action to look for the SQ and DQ in any user-supplied criteria strings and remove them and any characters after them, as Dave suggested. So if the user enters <Jim's> you strip out the <'> and the <s> to make the search string <Jim>. I typically do that with a results action immediately before the search action, where I'll massage the <@ARG> values, put the results into a local/request scope variable, and then use the <@VAR> value in my search action's criteria.
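
As a rough illustration of that clean-up step, here it is in Python rather than Witango meta tags. The function name is hypothetical.

```python
def strip_from_quote(term):
    # Cut the term at the first single or double quote, dropping the
    # quote itself and every character after it.
    for i, ch in enumerate(term):
        if ch in "'\"":
            return term[:i]
    return term
```

This is meant for individual unquoted terms. A phrase that starts with a quote would come back empty, so the check for a fully quoted phrase needs to run first.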

One other suggestion would be to actually write a new record for each search string that a user enters for a test period of time (say a couple of weeks). Then examine those records to get an idea of what your users are searching on, and make adjustments to the application to handle anything that you might have missed. With searches you typically have no idea what the user is thinking when they search your site, so it's sometimes nice to capture that information to see how they are using it.
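
A small sketch of that kind of logging, using Python and SQLite with a hypothetical search_log table. It stores the raw string exactly as the user typed it, so the clean-up logic can be judged against real input afterwards.

```python
import sqlite3
from datetime import datetime, timezone

def open_log():
    # In-memory database for illustration; the table name is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_log (term TEXT, searched_at TEXT)")
    return conn

def log_search(conn, raw_term):
    # Record the string before any stripping or tokenizing is applied.
    stamp = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO search_log VALUES (?, ?)", (raw_term, stamp))

def top_searches(conn, limit=10):
    # Most frequent search strings first, ties broken alphabetically.
    sql = ("SELECT term, COUNT(*) AS n FROM search_log "
           "GROUP BY term ORDER BY n DESC, term LIMIT ?")
    return conn.execute(sql, (limit,)).fetchall()
```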

I do have an old Mac running a copy of Butler but I've gone through the procedures on that machine and the one I remember that handled this isn't there. If I do find it, I'll certainly pass it along.

Hope this helps,

Steve Smith

Oakbridge Information Solutions
Office: (519) 624-4388
GTA:    (416) 606-3885
Fax:    (519) 624-3353
Cell:   (416) 606-3885
Email:  [EMAIL PROTECTED]
Web:    http://www.oakbridge.ca

On Tuesday, April 27, 2004, at 11:26 AM, Dave Shelley wrote:

I agree with John's assessment that a search engine is the way to go.

But if you really want to do it in Witango, here's one possibility:

1) tokenize the input string on space, comma, sq, dq, period, and any
other punctuation characters. This gives you a [1,x] array of the words.
Transpose it into a [x,1] array

2) filter the array to eliminate values < 3 characters long.

3) loop through the array to build a sql statement to do your search.

Something like:
select id, count(id), max(txt)
from table
where txt like '%Jim%' or
txt like '%Bonnie%' or
txt like '%Resort%'
group by id
order by count(id) desc

This will give you an array of the rows that contain any of the
words, sorted by the number of times the words appear.
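
Step 3 could be sketched as follows in Python with SQLite. This is a variant, not Dave's exact statement: count(id) with group by id counts rows per id, so it only ranks results when an id can appear on several rows. The version below instead adds 1 to a score for each search term a row contains. The table name listing, the column txt, and both function names are assumptions.

```python
import sqlite3

def build_like_query(terms, table="listing", column="txt"):
    """Return (sql, params) ranking rows by how many terms they contain."""
    if not terms:
        raise ValueError("at least one search term is required")
    # table and column are interpolated into the SQL, so they must come
    # from trusted code, never from user input. The terms themselves are
    # passed as bound parameters.
    score = " + ".join(
        f"CASE WHEN {column} LIKE ? THEN 1 ELSE 0 END" for _ in terms)
    where = " OR ".join(f"{column} LIKE ?" for _ in terms)
    sql = (f"SELECT id, {score} AS score FROM {table} "
           f"WHERE {where} ORDER BY score DESC, id")
    patterns = [f"%{t}%" for t in terms]
    # The placeholders appear twice: once in the score, once in the WHERE.
    return sql, patterns + patterns

def search(conn, terms):
    sql, params = build_like_query(terms)
    return conn.execute(sql, params).fetchall()
```

Any % or _ typed by the user still acts as a LIKE wildcard here, so those characters may need escaping or stripping beforehand.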


To give users the ability to quote strings and get an exact match, check
to see if the first and last characters are sq or dq. If so, strip them
off and skip the tokenize step.
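
The quote check might look like this in Python. The function name is hypothetical, and the tokenizing branch is a simplified stand-in for steps 1 and 2 above.

```python
import re

def parse_query(raw):
    raw = raw.strip()
    # The same quote character at both ends means the inside is one
    # exact phrase, so the tokenize step is skipped.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return [raw[1:-1]]
    # Otherwise split on non-alphanumeric characters and drop short words.
    return [w for w in re.split(r"[^A-Za-z0-9]+", raw) if len(w) >= 3]
```

A single returned element from the quoted branch can then be sent to an exact-match search, while the token list goes to the contains-style search.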

I haven't actually tried this methodology, but it should work. It may be
slow though depending on your database server, amount of data and
indexing.

Another methodology I worked with many years ago at CBC news was to
take all the distinct words from every article and insert them into a
table, then build a many<->many relationship with the articles to show
which words appeared in which article. That made for a couple of huge
tables, but they were well indexed and running on a mainframe, so it
resulted in some fast searches.
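
A rough sketch of that layout using Python and SQLite. The three table names and both function names are assumptions and are not taken from the CBC system. Each distinct word is stored once, and a link table records which articles contain it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE word (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE article_word (
    article_id INTEGER REFERENCES article(id),
    word_id INTEGER REFERENCES word(id),
    PRIMARY KEY (article_id, word_id)
);
"""

def index_article(conn, article_id, body):
    conn.execute("INSERT INTO article VALUES (?, ?)", (article_id, body))
    for w in set(body.lower().split()):
        # Store each distinct word only once across all articles.
        conn.execute("INSERT OR IGNORE INTO word (text) VALUES (?)", (w,))
        (word_id,) = conn.execute(
            "SELECT id FROM word WHERE text = ?", (w,)).fetchone()
        conn.execute("INSERT INTO article_word VALUES (?, ?)",
                     (article_id, word_id))

def articles_with(conn, w):
    # Look the word up once, then follow the link table to the articles.
    rows = conn.execute(
        "SELECT aw.article_id FROM article_word aw "
        "JOIN word w ON w.id = aw.word_id "
        "WHERE w.text = ? ORDER BY aw.article_id", (w.lower(),))
    return [r[0] for r in rows]
```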

Dave

-----Original Message-----
From: John McGowan [mailto:[EMAIL PROTECTED]
Sent: April 27, 2004 9:53 AM
To: [EMAIL PROTECTED]
Subject: Re: Witango-Talk: Search Engine Format Type

if the content you want to search can be "spidered" just install a
search engine.  I and others on the list have had great success
integrating the swish-e search engine into our Witango apps.

/John

[EMAIL PROTECTED] wrote:

I have a hobby site that I work on in my spare time. It has a forum,
chat room, ya da ya da ya da. In it I have created 3 different
search-engine-like tools that list resorts, fishing guides, and bait and
tackle shops. (You can check it out if you like. It is just a few months
old, but growing: http://MyFishingPals.com )

Anyway, I was wondering if anyone has used Witango to create a search
engine somewhat like google. I have been trying to figure a way to do
this. The big thing is how to search for certain terms. In other
words, a "contains" search does exactly as expected, but how does one
do partial phrase searches and the like. I am not worried about
stemming (related keywords), just trying to figure out how to do this.

For example, searching for "jim" returns "Jim and Bonny's Resort". Good
so far... But
searching for jim's returns nothing.

And is there a way to use quotes for exact phrase? More like a search
engine?

Anyway, just wondering if anyone has done this or pondered this at
all. Any suggestions would be great.

Thanks


________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf

