Neil Fincham wrote:
> 
> I have compiled the new version 3.1.5 and have made a shell script that
> indexes 100 files and then splits them.

Consider running the splitter less often. It takes time and resources
to process all the low-level files, so it does not actually matter
whether you distribute 100 documents or 10000 at a time. Distributing
all 4.3 million documents on aspseek.com takes about 1.5 hours with
3 simultaneous splitters.
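The advice above can be sketched as a small wrapper: keep indexing, and only invoke the splitter once a worthwhile amount of log data has queued up. The directory layout, the 64 MB threshold, and the "run splitter" step are assumptions for illustration, not aspseek's actual CLI:

```shell
#!/bin/sh
# Sketch of batched splitting: accumulate XX.log files and only
# trigger the splitter once their total size crosses a threshold.
# LOGDIR and THRESHOLD are made-up placeholders, tune to taste.
LOGDIR=$(mktemp -d)
THRESHOLD=$((64 * 1024 * 1024))      # 64 MB

# Simulate three accumulated XX.log files of 1 MB each.
for i in 00 01 02; do
    dd if=/dev/zero of="$LOGDIR/$i.log" bs=1024 count=1024 2>/dev/null
done

# Total bytes currently queued in the log directory
# (portable; avoids the GNU-only "du -b").
total=$(( $(cat "$LOGDIR"/*.log | wc -c) ))

if [ "$total" -ge "$THRESHOLD" ]; then
    echo "run splitter: $total bytes queued"
else
    echo "keep indexing: $total bytes queued (below threshold)"
fi
```

With the simulated 3 MB of logs this prints the "keep indexing" branch; in a real setup you would replace the `echo` with your splitter invocation.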


>  It seems to be taking a lot of
> disk space in the var/tree/ directory; is this correct?  How much disk space
> does your 4.3 million page database take?

The statistics look like this:

The total volume of indexed documents (SELECT sum(docsize) FROM url)
is 42 GB, which means the average document size is about 10 KB.

The total size of the XX.log files is 20 GB, i.e. less than half of
the original volume. Note that you may delete the logs after running
the splitter, or gzip them for backup purposes.
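The gzip-for-backup option is straightforward; a minimal sketch, with placeholder NN.log names and a temporary directory standing in for the real log directory:

```shell
#!/bin/sh
# Sketch: after the splitter has consumed the NN.log files, gzip
# them for backup instead of deleting them. Paths are placeholders.
LOGDIR=$(mktemp -d)
printf 'dummy log data\n' > "$LOGDIR/00.log"
printf 'dummy log data\n' > "$LOGDIR/01.log"

# gzip replaces each NN.log with NN.log.gz and removes the original,
# so no separate cleanup step is needed.
gzip "$LOGDIR"/*.log

ls "$LOGDIR"
```

Given the roughly 2:1 text-to-original ratio quoted above, the compressed backups should cost only a small fraction of the indexed volume.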

The total size of the tree after distributing the logs is 8.4 GB,
i.e. 8.4/42 = 0.2 of the original volume.


The SQL version is not as efficient. For example, 223 GB of original
documents on http://search.udm.net takes 78 GB of word indexes, so
the ratio is 78/223 ≈ 0.35.
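The two ratios can be checked directly with a quick awk computation (printed to four decimals, so the rounding is visible):

```shell
#!/bin/sh
# Verifying the index-to-original size ratios from the figures above.
awk 'BEGIN {
    printf "tree index: %.4f\n", 8.4 / 42    # tree storage vs originals
    printf "sql index:  %.4f\n", 78 / 223    # SQL word index vs originals
}'
```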


-- 
Alexander Barkov
IZHCOM, Izhevsk
email:    [EMAIL PROTECTED]      | http://www.izhcom.ru
Phone:    +7 (3412) 51-32-11 | Fax: +7 (3412) 51-20-80
ICQ:      7748759
______________
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]
