Re: Witango-Talk: Search Engine Format Type

2004-05-04 Thread Bill Conlon


Bill Conlon

To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306

office: 650.327.2175
fax:    650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web:    http://www.tothept.com



TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf


Re: Witango-Talk: Search Engine Format Type

2004-05-04 Thread webdude

Okay, I built the array and all seems well. I am filtering out the 's
and small words, and I have a search action that returns the results
from looking in the keyword tables. I use an if/else to determine
whether the value is quoted and go to an = search if it is. It seems to
work well: it returns all results with the appropriate search terms
from the table. Now I am stuck. How in the heck can I weight the
results and sort them accordingly? I get the regular types of sorting,
but I was hoping that the more matches per word or phrase, the higher
the relevancy and the higher the result sorts in the order.

Has anyone done anything like this? And if so, I would be
interested in how.

Thanks!
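
One way the weighting asked about here might be approached, sketched in Python with sqlite3 standing in for the real database. It assumes a two-column keyword table; the names keywords, item_id and word are made up for illustration. The idea is to count how many distinct search terms each item matches and sort on that count.

```python
import sqlite3

def rank_matches(conn, terms):
    """Return (item_id, hits) pairs, best match first.

    hits is the number of distinct search terms found for that item.
    """
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = (
        "SELECT item_id, COUNT(DISTINCT word) AS hits "
        "FROM keywords "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY item_id "
        "ORDER BY hits DESC, item_id"
    )
    return conn.execute(sql, terms).fetchall()

def demo():
    # Hypothetical sample data: three listings and their keywords.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keywords (item_id INTEGER, word TEXT)")
    rows = [(1, "jim"), (1, "bonny"), (1, "resort"),
            (2, "pine"), (2, "resort"),
            (3, "bait"), (3, "shop")]
    conn.executemany("INSERT INTO keywords VALUES (?, ?)", rows)
    return rank_matches(conn, ["Jim", "resort"])
```

Here demo() ranks listing 1 (two terms matched) ahead of listing 2 (one term matched) and leaves out listing 3. The same GROUP BY / COUNT query should carry over to MSSQL, though that has not been tried here.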


Re: Witango-Talk: Search Engine Format Type

2004-04-28 Thread webdude

That would be great, Steve.


Re: Witango-Talk: Search Engine Format Type

2004-04-28 Thread Steve Smith
Back in the old days of Butler (EveryWare's original main product), one of its main shortcomings as a SQL engine was its inability to do "contains" searches well. One way of getting around that problem was a procedure that was similar to what Dave Shelley suggested. It looked for distinct words, but instead of creating an array, it turned them into records in a separate table, much like his other suggestion from his CBC work.

The procedure ran on inserts and updates to the main table. That way it was performed once per insert or update rather than every time a select request was made. It was reasonably fast because it was a two-column table: the first column was the foreign key back to the main table, and the second column was the key word.

I liked Dave's suggestion for tokenizing on the space and punctuation characters and for filtering to eliminate words less than 3 characters in length. That is similar to what the Butler procedure did, although it had a list of words to ignore (the, and, a, they, there, etc.). You could do a combination of the two. I'd probably make the list of ignored words a table itself so you could add to it as you went along and discovered new words not to include in the unique key words table.

From there, you could add some logic to your search action to look for the SQ and DQ in any user-supplied criteria strings and remove them and any characters after them, as Dave suggested. So if the user enters something like jim's, you strip out the ' and the s after it to make the search string jim. I typically do that with a results action immediately before the search action, where I'll massage the <@ARG> values, put the results into a local/request scope variable, and then use the <@VAR> value in my search action's criteria.
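
The massaging step might look something like this in Python (not Witango meta tags). normalize_query is a made-up name, and treating a fully quoted string as an exact phrase follows Dave's suggestion elsewhere in the thread.

```python
def normalize_query(raw):
    """Return ("exact", phrase) for a quoted string, else ("words", [terms])."""
    text = raw.strip()
    # A string wrapped in matching quotes is taken as an exact phrase.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return ("exact", text[1:-1])
    terms = []
    for word in text.split():
        # Drop a quote character and anything after it: jim's -> jim
        for quote in "'\"":
            word = word.split(quote, 1)[0]
        if word:
            terms.append(word.lower())
    return ("words", terms)
```

For example, normalize_query("jim's resort") gives ("words", ["jim", "resort"]), while normalize_query('"Jim and Bonny"') gives ("exact", "Jim and Bonny").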

One other suggestion would be to write a new record for each search string that a user enters for a test period (say a couple of weeks). Then examine those records to get an idea of what your users are searching on and adjust the application to handle anything that you might have missed. With searches you typically have no idea what the user is thinking when they search your site, so it's nice to capture that information to see how they are using it.
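
A small sketch of that logging idea, again in Python with sqlite3 as a stand-in. The search_log table and its columns are hypothetical.

```python
import sqlite3

def log_search(conn, query):
    # Store a lightly normalized copy of what the user typed.
    conn.execute("INSERT INTO search_log (query) VALUES (?)",
                 (query.strip().lower(),))

def top_searches(conn, limit=10):
    """Return (query, count) pairs, most frequent first."""
    return conn.execute(
        "SELECT query, COUNT(*) AS n FROM search_log "
        "GROUP BY query ORDER BY n DESC, query LIMIT ?", (limit,)).fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_log (query TEXT, "
                 "searched_at TEXT DEFAULT CURRENT_TIMESTAMP)")
    for q in ["jim's", "Resort", "resort ", "bait shop"]:
        log_search(conn, q)
    return top_searches(conn, 2)
```

After the test period, top_searches() gives a quick picture of which strings come up most often, including ones (like jim's) that the search did not handle.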

I do have an old Mac running a copy of Butler but I've gone through the procedures on that machine and the one I remember that handled this isn't there. If I do find it, I'll certainly pass it along.

Hope this helps,

Steve Smith

Oakbridge Information Solutions
Office: (519) 624-4388
GTA:    (416) 606-3885
Fax:    (519) 624-3353
Cell:   (416) 606-3885
Email:  [EMAIL PROTECTED]
Web:    http://www.oakbridge.ca


Re: Witango-Talk: Search Engine Format Type

2004-04-27 Thread Bill Conlon
I would use swish-e using the -S prog method to spider all the external 
sites.  

Execute a shell script like the following when you want the index rebuilt:

#!/bin/sh

# Rebuild the index; quote the placeholder paths because they contain spaces.
/usr/local/bin/swish-e -S prog \
    -c "/path to configuration/spider.config" \
    -f "/path to configuration/index.swish-e"

The index is written to index.swish-e.

spider.config looks something like:

IndexDir spider.pl
SwishProgParameters /path to configuration/swish.conf
UndefinedMetaTags auto
StoreDescription TXT* 1
StoreDescription HTML*  1

This tells swish-e to use spider.pl, the Perl spider provided with 
swish-e.  Any meta tags it finds will be stored in the index.  It also 
stores 1 byte of data from text and HTML documents, ignoring other 
types.  (Read the docs if you want to index PDF, Word, Excel, etc.)

swish.conf will have an array of hashes called @servers that tells the 
spider which sites to index and what pages to ignore.  One element of 
the array might look like:

@servers = (
    {
        base_url      => 'http://www.website.com',
        email         => '[EMAIL PROTECTED]',
        use_cookies   => 0,
        debug         => DEBUG_URL | DEBUG_SKIPPED | DEBUG_FAILED | DEBUG_HEADERS,
        delay_sec     => 1,
        test_response => sub {
            # Only pass on responses whose content type matches text/html
            my $content_type = $_[2]->content_type;
            return $content_type =~ m!text/html!;
        },
    },
);

base_url is the site to be indexed.  We don't use cookies, we want debug 
info written, we wait one second between requests, and the test_response 
function passes only documents whose content type matches text/html to 
the indexing spider.

You could extract all the URLs from the db and build a text file that 
gets written to /path to configuration/swish.conf.  Note that there are 
many other parameters you can set up, and each site to be spidered can 
have a different set.
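
As a rough illustration of building that file from a list of URLs, here is a Python sketch. render_servers is a made-up helper, the URL list is assumed to have been pulled from the db already, and only the base_url, email and delay_sec keys shown above are written; anything else a real config needs would have to be added.

```python
def render_servers(urls, email, delay_sec=1):
    """Render a Perl @servers list with one entry per URL."""
    def q(value):
        # Escape backslashes and single quotes for a Perl single-quoted string.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    entries = []
    for url in urls:
        entries.append(
            "    {\n"
            f"        base_url  => {q(url)},\n"
            f"        email     => {q(email)},\n"
            f"        delay_sec => {delay_sec},\n"
            "    },\n"
        )
    return "@servers = (\n" + "".join(entries) + ");\n"
```

The returned text can then be written to the swish.conf path before the indexing script runs, for example with open(path, "w").write(render_servers(urls, "spider@example.com")), where the address is a placeholder.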

One issue to consider is that swish-e does not support incremental 
indexing, so rebuilding the index can be costly.  You could also maintain 
separate indices for each site, but I wouldn't recommend it with so many 
sites, because matching search results will be costly.  

Once the index is built, I use a taf file to query the index and 
highlight the search terms in the results array.  But, swish-e will do 
that for you also if you use its search.cgi perl script.


Bill Conlon



RE: Witango-Talk: Search Engine Format Type

2004-04-27 Thread Bill Conlon
As one of the other swish-e fanatics, I agree this is a great way to go.

It's a fast, flexible, free, open-source tool that spiders your content 
AND provides the interface to pull the relevant web pages out of the 
index.

You can use the Perl interfaces directly, or call a shell script from 
Witango.  If you go this way, I'll be glad to help.



Bill Conlon



Re: Witango-Talk: Search Engine Format Type

2004-04-27 Thread webdude
The problem I am having with canned spidering programs is that 
they spider everything on your own site. I need something that will 
read a db (MSSQL) and spider all the links that are contained in it.

In other words, I have a db with about 500 links to other sites. The 
db table holds things like name, city, lake, description, url. These 
canned programs seem to spider my site okay, but I don't want to 
search my site; I want to search the sites that the links in the db 
point to.



RE: Witango-Talk: Search Engine Format Type

2004-04-27 Thread Dave Shelley
I agree with John's assessment that a search engine is the way to go. 

But if you really want to do it in Witango, here's one possibility:

1) Tokenize the input string on space, comma, single quote (sq), double
quote (dq), period, and any other punctuation characters. This gives you
a [1,x] array of the words. Transpose it into a [x,1] array.

2) Filter the array to eliminate values < 3 characters long.

3) Loop through the array to build a SQL statement to do your search.

Something like:
select id, txt,
  (case when txt like '%Jim%' then 1 else 0 end) +
  (case when txt like '%Bonnie%' then 1 else 0 end) +
  (case when txt like '%Resort%' then 1 else 0 end) as hits
from table
where txt like '%Jim%' or 
txt like '%Bonnie%' or
txt like '%Resort%'
order by hits desc

This will give you an array of the rows that contain any of the
words, sorted by how many of the words each row contains.


To give users the ability to quote strings and get an exact match, check
to see if the first and last characters are sq or dq. If so, strip them
off and skip the tokenize step.
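
A minimal sketch of steps 1 to 3, written in Python with sqlite3 rather than Witango and MSSQL. The table name listings is an assumption (id and txt come from the example query), and the scoring uses a sum of CASE expressions so that the ordering reflects how many of the search words each row contains. Values are passed as parameters instead of being pasted into the SQL string.

```python
import re
import sqlite3

def build_search(query, min_len=3):
    """Return (sql, params) ranking rows by how many search words they contain."""
    # Step 1 and 2: split on whitespace/punctuation, drop short words.
    words = [w for w in re.split(r"[\s,.'\"!?;:]+", query) if len(w) >= min_len]
    if not words:
        return None, []
    patterns = [f"%{w}%" for w in words]
    # Step 3: one CASE term and one LIKE test per word.
    score = " + ".join("(CASE WHEN txt LIKE ? THEN 1 ELSE 0 END)" for _ in words)
    where = " OR ".join("txt LIKE ?" for _ in words)
    sql = (f"SELECT id, txt, {score} AS hits FROM listings "
           f"WHERE {where} ORDER BY hits DESC, id")
    # Placeholders appear first in the SELECT list, then in the WHERE clause.
    return sql, patterns + patterns

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, txt TEXT)")
    conn.executemany("INSERT INTO listings VALUES (?, ?)", [
        (1, "Jim and Bonny's Resort"),
        (2, "Pine Lake Resort"),
        (3, "Bob's Bait Shop"),
    ])
    sql, params = build_search("Jim's Resort")
    return [(row[0], row[2]) for row in conn.execute(sql, params)]
```

With this sample data a search for Jim's Resort puts row 1 (both words) ahead of row 2 (one word). As noted below, LIKE with a leading % cannot use an ordinary index, so this may be slow on a large table.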

I haven't actually tried this methodology, but it should work. It may be
slow though depending on your database server, amount of data and
indexing.

Another methodology I worked with many years ago at CBC News was to
take all the distinct words from every article and insert them into a
table, then build a many<->many relationship with the articles to show
which words appeared in which article. That made for a couple of huge
tables, but it was well indexed and running on a mainframe, so it
resulted in some fast searches.

Dave



Re: Witango-Talk: Search Engine Format Type

2004-04-27 Thread John McGowan
If the content you want to search can be "spidered", just install a 
search engine.  Others on the list and I have had great success 
integrating the swish-e search engine into our Witango apps.

/John


Witango-Talk: Search Engine Format Type

2004-04-27 Thread webdude
I have a hobby site that I work on in my spare time. It has a forum, 
chat room, yada yada yada. In it I have created 3 different search-like 
engines that list resorts, fishing guides, and bait and tackle 
shops. (You can check it out if you like; it is just a few months old, 
but growing: http://MyFishingPals.com )

Anyway, I was wondering if anyone has used Witango to create a search 
engine somewhat like Google. I have been trying to figure out a way to 
do this. The big thing is how to search for certain terms. In other 
words, a "contains" search does exactly as expected, but how does one 
do partial phrase searches and the like? I am not worried about 
stemming (related keywords), just trying to figure out how to do this.

For example, searching for "jim" returns "Jim and Bonny's Resort". Good 
so far... But searching for jim's returns nothing.

And is there a way to use quotes for an exact phrase, more like a 
search engine?

Anyway, just wondering if anyone has done this or pondered this at 
all. Any suggestions would be great.

Thanks