Re: UdmSearch: parsing words with "-" in them

Craig Small Mon, 28 Aug 2000 20:51:29 -0700

On Mon, Aug 28, 2000 at 04:23:38PM -0700, Steve Quezadas wrote:
> I am using udmsearch to crawl my website and I noticed that the indexer program 
>separates words with a dash ("-"). In my case, I have many web pages with model 
>numbers for products (MVC-FD88 for example) so udmsearch is indexing "MVC" as one 
>word and "FD88" as another. 
> 
> I looked through the C code to try to find the section of code responsible for 
>separating these words when it inserts them into table [dict]. I don't know that much 
>about C or C++, but I figure the change is probably simple enough for me to make. 
>Does anyone know what C file is responsible for separating the words like that?

I believe the indexer works by making repeated calls to UdmGetWord,
which basically walks along a string, checking whether each character
is or is not in a set of word characters.

Have a look at charset.c, especially the WORDCHAR definition.
My guess is that you need to add - to WORDCHAR. One of the programmers
may have a better idea.
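To make the mechanism above concrete, here is a minimal C sketch of a set-based word scanner, assuming the indexer splits text on any character outside a fixed set of word characters. The names get_word, is_word_char, count_words, DEFAULT_WORDCHARS and DASH_WORDCHARS are hypothetical and do not come from the UdmSearch sources; the real UdmGetWord and WORDCHAR in charset.c may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word-character sets, NOT the real WORDCHAR from charset.c. */
#define ALNUM_CHARS "abcdefghijklmnopqrstuvwxyz" \
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
static const char DEFAULT_WORDCHARS[] = ALNUM_CHARS;     /* '-' splits words */
static const char DASH_WORDCHARS[]    = ALNUM_CHARS "-"; /* '-' kept in words */

/* A character is part of a word if it appears in the set. The '\0' check
 * matters because strchr() also "finds" the string terminator. */
static int is_word_char(char c, const char *wordchars)
{
    return c != '\0' && strchr(wordchars, c) != NULL;
}

/* Starting at *pos, skip separators, then copy the next run of word
 * characters into buf (truncating if it does not fit). Returns 1 if a word
 * was found and advances *pos past it; returns 0 at end of string. */
static int get_word(const char *str, size_t *pos, const char *wordchars,
                    char *buf, size_t bufsize)
{
    size_t i = *pos, n = 0;

    assert(bufsize > 0);                 /* need room for the terminator */
    while (str[i] != '\0' && !is_word_char(str[i], wordchars))
        i++;                             /* skip non-word characters */
    if (str[i] == '\0') {
        *pos = i;
        return 0;
    }
    while (is_word_char(str[i], wordchars)) {
        if (n + 1 < bufsize)
            buf[n++] = str[i];
        i++;
    }
    buf[n] = '\0';
    *pos = i;
    return 1;
}

/* Count the words get_word() finds in str with the given character set. */
static size_t count_words(const char *str, const char *wordchars)
{
    char buf[256];
    size_t pos = 0, count = 0;

    while (get_word(str, &pos, wordchars, buf, sizeof buf))
        count++;
    return count;
}
```

With DEFAULT_WORDCHARS, repeated calls on "MVC-FD88" return "MVC" and then "FD88"; with DASH_WORDCHARS the same call returns the single word "MVC-FD88". One side effect to check in this sketch: once '-' is a word character, a free-standing dash such as the one in "foo - bar" is returned as a one-character word.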

  - Craig
-- 
Craig Small VK2XLZ  GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/        <[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]>                 Debian developer <[EMAIL PROTECTED]>
______________
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]