Hi,

I took a quick look at what UDMSearch is doing on insert, as
I have around 100,000 documents to index and each takes
about 2 seconds.  For reference, I am using MySQL in
CRC-multi mode.

There are two things I noticed which the UDMSearch
developers might consider.  The first is kind of
questionable and relates to locking.  Indexer gets a write
lock on the url table but doesn't lock the various ndict
tables when inserting.  Locking them as well yields about a
10% performance gain over here, but means searches are
slower while indexing.  Perhaps an option would be
appropriate.
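
To make the option idea concrete, here is a rough sketch in Python
(not UDMSearch code).  The function name, the lock_ndict flag and the
ndict table names are my own assumptions for illustration; only the
LOCK TABLES / UNLOCK TABLES statements are real MySQL syntax.

```python
def wrap_ndict_inserts(insert_statements, ndict_tables, lock_ndict):
    """Optionally bracket a document's ndict inserts with table locks.

    insert_statements: SQL strings for one document (hypothetical input).
    ndict_tables: names of the ndict tables those statements touch.
    lock_ndict: hypothetical option; True trades search latency for
                insert speed, False leaves the statements untouched.
    """
    statements = list(insert_statements)
    if not lock_ndict or not statements:
        return statements
    # MySQL takes all locks in a single LOCK TABLES statement.
    lock_clause = ", ".join(f"{name} WRITE" for name in sorted(set(ndict_tables)))
    return [f"LOCK TABLES {lock_clause}", *statements, "UNLOCK TABLES"]
```

With the flag off the statements pass through unchanged, so searches
are not held up by the locks; with it on, the inserts for a document
run under a single lock/unlock pair.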

The other problem I noticed was the ndict inserting itself.
A document with 1000 indexed words generates 1000 inserts
into the various ndict tables.  You could bundle them and
use MySQL's extended insert capability here, e.g. collect
all the two-letter words into an array of words or CRC
values, do the same for the rest of the tables and issue
one insert per table at the end.  This would be fairly
trivial on non-multi tables but might mean some code for
the multi ones.  You'd just have to watch that the query
didn't get too big, possibly dividing it into smaller parts
if it did.  On a test table, this gave huge performance
increases, but YMMV.
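
Here is a minimal sketch of the bundling, again in Python rather than
the indexer's own code.  The column names (url_id, word_id, weight),
the table names and the 64 KB default limit are assumptions I made up
for the example; in practice the limit would need to stay below the
server's max_allowed_packet setting.

```python
from collections import defaultdict


def build_extended_inserts(rows, max_bytes=64 * 1024):
    """Group rows per table and emit multi-row INSERT statements.

    rows: iterable of (table, url_id, word_id, weight) with integer values.
    max_bytes: rough cap on statement length; a statement is split into
               smaller parts once adding another row would exceed it.
    """
    grouped = defaultdict(list)
    for table, url_id, word_id, weight in rows:
        # int() keeps anything non-numeric out of the generated SQL.
        grouped[table].append(f"({int(url_id)},{int(word_id)},{int(weight)})")

    statements = []
    for table, values in grouped.items():
        prefix = f"INSERT INTO {table} (url_id, word_id, weight) VALUES "
        chunk, size = [], len(prefix)
        for value in values:
            extra = len(value) + 1  # +1 for the separating comma
            if chunk and size + extra > max_bytes:
                statements.append(prefix + ",".join(chunk))
                chunk, size = [], len(prefix)
            chunk.append(value)
            size += extra
        if chunk:
            statements.append(prefix + ",".join(chunk))
    return statements
```

For a document with 1000 indexed words this would send one statement
per ndict table (or a few, if the size cap is hit) instead of 1000
single-row inserts.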

I am by no means a MySQL guru so I'd be interested in
whether you think it would bring any gains.

Cheers,
Shane

-- 
Shane Wegner: [EMAIL PROTECTED]
              http://www.cm.nu/~shane/
PGP:          1024D/FFE3035D
              A0ED DAC4 77EC D674 5487
              5B5C 4F89 9A4E FFE3 035D
