I reported this back in 3.1.9pre13. I have 'DeleteNoServer no' set, and many
URLs in my SQL db have no associated Server command. I just tried to reindex
in 3.1.9, and I see that my URLs are still being deleted:
# indexer -m -s 200
Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
jobs
Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
Indexer[2397]: [1] No 'Server' command for url... deleted.
^C
Received signal 2 - exit!
(NOTE: I had to Ctrl-C it to stop it from deleting more URLs.)
here is my full indexer.conf:
---cut---
#Include inc1.conf
DBAddr pgsql://***:*****@/work/
DBMode cache
#SyslogFacility local7
LogdAddr localhost:7000
LocalCharset iso-8859-1
Ispellmode db
StopwordTable stopword
#ServerTable server
DeleteNoServer no
#Allow *
#Disallow NoMatch *.state.mn.us/*
Disallow http://www.rootsweb.com/~mn*
Disallow http://www.wxusa.com/*
Disallow http://www.vitalrec.com/*
Disallow http://*yahoo.com/*
Disallow http://*aol.com/*
Disallow http://www.salescircular.com/*
Disallow http://*.wellsfargo.com/*
# Disallow any except known extensions and directory index using "regex" match:
Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
# Exclude cgi-bin and non-parsed-headers using "string" match:
Disallow */cgi-bin/* *.cgi */nph-*
# Exclude anything with '?' sign in URL. Note that '?' sign has a
# special meaning in "string" match, so we have to use "regex" match here:
#Disallow Regex \?
# Exclude some known extensions using fast "String" match:
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
# Exclude Apache directory list in different sort order using "string" match:
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
# More complicated case. RAR .r00-.r99, ARJ a00-a99 files
# and unix shared libraries. We use "Regex" match type here:
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
#CheckOnly *.b *.sh *.md5
#CheckOnly *.arj *.tar *.zip *.tgz *.gz
#CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z
#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
#CheckOnly *.vrml *.wrl *.png
#CheckOnly *.exe *.cab *.dll *.bin *.class
#CheckOnly *.tex *.texi *.xls *.doc *.texinfo
#CheckOnly *.rtf *.pdf *.cdf *.ps
#CheckOnly *.ai *.eps *.ppt *.hqx
#CheckOnly *.cpt *.bms *.oda *.tcl
#CheckOnly *.rpm *.m3u *.qt *.mov
#CheckOnly *.map *.aif *.sit *.sea
#
# or check ANY except known text extensions using "regex" match:
#Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
#HrefOnly */mail*.html */thread*.html
UseRemoteContentType yes
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm *.m
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif *.gif
AddType Regex \.r[0-9][0-9]$
AddType application/unknown *.*
#Mime application/msword "text/plain; charset=cp1251" "catdoc $1"
#Mime application/x-troff-man text/plain "deroff"
#Mime text/x-postscript text/plain "ps2ascii"
Period 6m
#Tag <string>
#Category FFAABBCCDD
MaxHops 56
MaxNetErrors 6
ReadTimeOut 30s
DocTimeOut 1m30s
NetErrorDelayTime 1d
Robots yes
Clones yes
BodyWeight 2
TitleWeight 4
KeywordWeight 8
DescWeight 16
#UrlWeight 16
#UrlHostWeight 8
#UrlPathWeight 8
#UrlFileWeight 0
#IspellCorrectFactor 1
#IspellIncorrectFactor 1
#NumberFactor 1
#AlnumFactor 1
#MinWordLength 1
#MaxWordLength 32
#DeleteBad no
Index yes
Follow path
Server site http://www.state.mn.us/
Server site http://www.exploreminnesota.com/
Server site http://www.tpt.org/
Server page http://www.gorp.com/gorp/location/mn/mn.htm
Server path http://lists.rootsweb.com/index/usa/MN/
#Server site http://www.mallofamerica.com/
#Realm [String|Regex] [Match|NoMatch] <arg> [alias]
Realm http://*.mn.us/*
#Realm http://*
#URL http://localhost/main/index.html
---/cut---
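
To confirm that the URL in the log really has no covering Server or Realm
line, here is a minimal Python sketch, not mnoGoSearch code. It assumes
simplified matching rules: 'site' means same host, 'path' means URL prefix,
'page' means exact URL, and Realm is a shell-style wildcard over the whole URL.
The function name covering_entry is made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Server and Realm entries copied from the indexer.conf above.
SERVERS = [
    ("site", "http://www.state.mn.us/"),
    ("site", "http://www.exploreminnesota.com/"),
    ("site", "http://www.tpt.org/"),
    ("page", "http://www.gorp.com/gorp/location/mn/mn.htm"),
    ("path", "http://lists.rootsweb.com/index/usa/MN/"),
]
REALMS = ["http://*.mn.us/*"]


def covering_entry(url):
    """Return the first Server/Realm entry that covers url, or None.

    The matching rules are assumptions for illustration only; the real
    indexer may match differently.
    """
    host = urlsplit(url).netloc
    for kind, base in SERVERS:
        if kind == "site" and host == urlsplit(base).netloc:
            return f"Server {kind} {base}"
        if kind == "path" and url.startswith(base):
            return f"Server {kind} {base}"
        if kind == "page" and url == base:
            return f"Server {kind} {base}"
    for pattern in REALMS:
        # fnmatchcase treats '*' as "any run of characters", slashes included.
        if fnmatchcase(url, pattern):
            return f"Realm {pattern}"
    return None


# The URL from the log above is not covered by any entry.
print(covering_entry("http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm"))
```

Under those assumptions the check prints None, i.e. this is exactly the kind
of URL that 'DeleteNoServer no' is supposed to protect from deletion.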