On Mon, Aug 28, 2000 at 04:23:38PM -0700, Steve Quezadas wrote:
> I am using udmsearch to crawl my website, and I noticed that the indexer
> splits words at a dash ("-"). Many of my web pages contain product model
> numbers (MVC-FD88, for example), so udmsearch indexes "MVC" as one word
> and "FD88" as another.
>
> I looked through the C code to try to find the section responsible for
> splitting these words before it inserts them into table [dict]. I don't know
> much about C or C++, but I figure the change is probably simple enough for me
> to make. Does anyone know which C file is responsible for splitting words like that?
I believe the indexer works by calling UdmGetWord repeatedly. That
function basically runs along a string, checking whether each character
is or is not in a set of word characters.
Have a look at charset.c, especially the WORDCHAR definition.
My guess is that you need to add "-" to WORDCHAR. One of the programmers
may have a better idea.
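
To illustrate the idea, here is a minimal sketch of a tokenizer that walks a
string and tests each character against a set. It is not the udmsearch code:
is_word_char, next_word, count_words and the "extra" parameter are names
made up for this example, and the real UdmGetWord and WORDCHAR may well be
written differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a WORDCHAR-style test: a character is part of
 * a word if it is alphanumeric or is listed in the "extra" set. */
static int is_word_char(unsigned char c, const char *extra)
{
    return isalnum(c) || (c != '\0' && strchr(extra, c) != NULL);
}

/* Copy the next word starting at *pos into buf (truncated to buflen - 1
 * characters) and advance *pos past it. Returns buf, or NULL when no
 * words remain. */
static char *next_word(const char **pos, const char *extra,
                       char *buf, size_t buflen)
{
    const char *p = *pos;
    size_t n = 0;

    assert(buflen > 0);

    /* Skip separators, i.e. anything that is not a word character. */
    while (*p != '\0' && !is_word_char((unsigned char)*p, extra))
        p++;
    if (*p == '\0') {
        *pos = p;
        return NULL;
    }

    /* Collect characters until the next separator or end of string. */
    while (*p != '\0' && is_word_char((unsigned char)*p, extra)) {
        if (n + 1 < buflen)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *pos = p;
    return buf;
}

/* Count the words in text for a given set of extra word characters. */
static int count_words(const char *text, const char *extra)
{
    char buf[64];
    int n = 0;

    while (next_word(&text, extra, buf, sizeof buf) != NULL)
        n++;
    return n;
}
```

With an empty extra set, count_words("Sony MVC-FD88 camera", "") finds four
words, because "MVC" and "FD88" are split at the dash. With "-" in the set,
count_words("Sony MVC-FD88 camera", "-") finds three, and "MVC-FD88" stays
in one piece. Note that in this sketch a dash standing alone between spaces
would then be indexed as a word too, so the same may be worth checking for
after changing WORDCHAR.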
- Craig
--
Craig Small VK2XLZ GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/ <[EMAIL PROTECTED]>
MIEEE <[EMAIL PROTECTED]> Debian developer <[EMAIL PROTECTED]>