Given the following URL:
http://www.kanga.nu/archives/MUD-Dev-L/2000Q2/index.php
where indexer is configured to honour meta/robots tags, and the HTML
at that page contains the following meta line:
<META NAME="robots" CONTENT="noindex,follow">
(as I don't want that page indexed, but I do want it spidered), and
where the top three archived messages listed there, and their URLs,
are __NOT__ present in UdmSearch's index, and the following indexer
command line is executed:
indexer -N 10 -a -u %/archives/%`date +%Y`%/index.php -e -n 20000
Why aren't the three new messages indexed?
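To make the expectation concrete, here is a minimal sketch of how I read "noindex,follow": the page itself is not stored, but the links on it are still queued for spidering. It is not UdmSearch's code; the class and function names and the message link `msg00001.php` are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.index = True    # default when no robots meta tag is present
        self.follow = True
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").lower().split(","):
                directive = directive.strip()
                if directive == "noindex":
                    self.index = False
                elif directive == "index":
                    self.index = True
                elif directive == "nofollow":
                    self.follow = False
                elif directive == "follow":
                    self.follow = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def expected_actions(html):
    """Return (store_this_page, links_to_spider) for one page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.index, (parser.links if parser.follow else [])


# Hypothetical page: the meta line is the real one, the link is made up.
page = """<html><head>
<META NAME="robots" CONTENT="noindex,follow">
</head><body>
<a href="msg00001.php">hypothetical message link</a>
</body></html>"""

store_page, to_spider = expected_actions(page)
print(store_page, to_spider)
```

Under that reading the index page is skipped but every message it links to should still be fetched.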
Output from indexer:
Indexer[14442]: indexer from UdmSearch v.3.0.18/MySQL started with '/usr/local/udmsearch/etc/indexer.conf'
Indexer[14444]: [1] http://www.kanga.nu/archives/IRead-L/2000Q2/index.php
Indexer[14445]: [2] http://www.kanga.nu/archives/MUD-Dev-L/2000Q2/index.php
Indexer[14447]: [4] http://www.kanga.nu/archives/MUD-Dev-L/2000Q1/index.php
Indexer[14448]: [5] http://www.kanga.nu/archives/Meta-L/2000Q2/index.php
Indexer[14450]: [7] http://www.kanga.nu/archives/Meta-L/2000Q1/index.php
Indexer[14445]: [2] Done
Indexer[14452]: [9] Done
Indexer[14446]: [3] Done
Indexer[14453]: [10] Done
Indexer[14449]: [6] Done
Indexer[14447]: [4] Done
Indexer[14444]: [1] Done
Indexer[14448]: [5] Done
Indexer[14450]: [7] Done
Indexer[14451]: [8] Done
That is entirely correct, aside from indexer not noticing the three
new URLs and indexing them. The Apache logs bear this out, BTW; they
report only:
bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET /archives/IRead-L/2000Q2/index.php HTTP/1.0" 200 2412 "-" "UdmSearch/3.0.18" 1 www.kanga.nu
bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET /archives/MUD-Dev-L/2000Q1/index.php HTTP/1.0" 200 143686 "-" "UdmSearch/3.0.18" 1 www.kanga.nu
bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET /archives/Meta-L/2000Q2/index.php HTTP/1.0" 200 84618 "-" "UdmSearch/3.0.18" 0 www.kanga.nu
bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET /archives/Meta-L/2000Q1/index.php HTTP/1.0" 200 10740 "-" "UdmSearch/3.0.18" 0 www.kanga.nu
bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET /archives/MUD-Dev-L/2000Q2/index.php HTTP/1.0" 200 158 "-" "UdmSearch/3.0.18" 1 www.kanga.nu
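For anyone wanting to repeat the check against their own logs, a rough sketch follows. It assumes each entry sits on a single line in the raw log file (mail wrapping aside) and that the format is the combined-style one seen here; the regular expression and the function name `fetched_paths` are my own, not anything shipped with Apache or UdmSearch.

```python
import re

# Client, two ignored fields, timestamp, request line, status, size,
# referer, user agent. Trailing custom fields are left unparsed.
LOG_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def fetched_paths(lines, agent_prefix="UdmSearch/"):
    """Return the request paths fetched by the given user agent."""
    paths = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("agent").startswith(agent_prefix):
            paths.append(m.group("path"))
    return paths


# Two of the entries from this run, unwrapped onto single lines.
sample = [
    'bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET '
    '/archives/IRead-L/2000Q2/index.php HTTP/1.0" 200 2412 "-" '
    '"UdmSearch/3.0.18" 1 www.kanga.nu',
    'bush.kanga.nu - - [24/Jun/2000:00:07:54 -0700] "GET '
    '/archives/MUD-Dev-L/2000Q2/index.php HTTP/1.0" 200 158 "-" '
    '"UdmSearch/3.0.18" 1 www.kanga.nu',
]

paths = fetched_paths(sample)
# Anything other than an index.php would be a message page being spidered.
non_index = [p for p in paths if not p.endswith("/index.php")]
print(paths)
print(non_index)
```

An empty `non_index` list matches what is seen here: only the index pages are requested, never the messages they link to.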
Ideas?
--
J C Lawrence Home: [EMAIL PROTECTED]
----------(*) Other: [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--