# ./indexer -ma -u http://www.mnpage.com/%
AddServer 'http://www.state.mn.us/' 17
AddServer 'http://www.mnworkforcecenter.org/' 17
AddServer 'http://www.exploreminnesota.com/' 17
AddServer 'http://www.tpt.org/' 17
AddServer 'http://www.gorp.com/gorp/location/mn/mn.htm' 17
AddServer 'http://lists.rootsweb.com/index/usa/MN/' 17
AddServer 'http://*.mn.us/*' 18
AddServer '(null)' 17
Indexer[11168]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
Indexer[11168]: [1] http://www.mnpage.com/
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/aboutus.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/chat.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/classads.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/disability.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/edu.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/entertainment.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/events.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/family.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/food.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/fun.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/google.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
Indexer[11168]: [1] http://www.mnpage.com/greeting.html
Indexer[11168]: [1] No 'Server' command for url... deleted.
--- Alexander Barkov <[EMAIL PROTECTED]> wrote:
> This patch will not fix the problem. The problem is not here.
> "DeleteNoServer no" is implemented by adding one virtual empty server
> after loading indexer.conf. It means that if there is no other
> corresponding Server or Realm command for some URL, indexer will find
> that last empty server and will execute something like this:
> 
>    strncmp(url,Server[i].url,strlen(Server[i].url))
> 
> where Server[i].url is an empty string. So any URL will pass this
> condition.
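
A minimal sketch of the prefix test quoted above, to show why an empty Server URL behaves as a catch-all. url_has_prefix is a hypothetical helper written for illustration, not a mnoGoSearch function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of mnoGoSearch:
 * returns 1 when url begins with prefix, 0 otherwise. */
int url_has_prefix(const char *url, const char *prefix)
{
    assert(url != NULL && prefix != NULL);
    /* When prefix is "", strlen(prefix) is 0, strncmp compares
     * zero characters and returns 0, so every URL "matches". */
    return strncmp(url, prefix, strlen(prefix)) == 0;
}
```

Called with "http://www.mnpage.com/aboutus.html" and "" it returns 1; called with the same URL and "http://www.state.mn.us/" it returns 0.
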
> 
> 
> I can't reproduce the same unexpected behaviour on my box.
> To debug it, please check two things:
> 
> 1. Function UdmAddServer in the file server.c.
> 
>    Add as the first statement:
> 
>       printf("AddServer '%s' %d\n",srv->url,match_type);
> 
>    and check that an empty string appears in the output after all
>    "Server" arguments given in indexer.conf.
> 
> 
> Then, if the above works fine:
> 
> 2. Add debugging output to the UdmFindServer function. I think it is
> clear enough how it works.
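
A rough sketch of the kind of lookup and debug output meant in step 2. This is not the real UdmFindServer: struct server and find_server are hypothetical stand-ins, and match types are ignored. The AddServer output at the top of this message prints '(null)' for the last entry rather than '', which may mean the virtual server's url is a NULL pointer and not an empty string on this box. That is only a guess, so the sketch reports a NULL url separately instead of passing it to strlen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a server record; the real mnoGoSearch
 * structure has more fields, including a match type. */
struct server {
    const char *url;   /* prefix to match; "" acts as a catch-all */
};

/* Returns the index of the first entry whose url is a prefix of url,
 * or -1 when nothing matches. Prints one debug line per entry. */
int find_server(const struct server *list, size_t n, const char *url)
{
    size_t i;
    int hit;

    assert(list != NULL && url != NULL);
    for (i = 0; i < n; i++) {
        const char *prefix = list[i].url;

        if (prefix == NULL) {
            /* NULL is not the same as "": strlen(NULL) is undefined
             * behaviour, so report the entry and skip it. */
            fprintf(stderr, "FindServer: [%lu] url is NULL, skipped\n",
                    (unsigned long)i);
            continue;
        }
        hit = strncmp(url, prefix, strlen(prefix)) == 0;
        fprintf(stderr, "FindServer: [%lu] '%s' vs '%s' -> %s\n",
                (unsigned long)i, url, prefix, hit ? "match" : "no match");
        if (hit)
            return (int)i;
    }
    return -1;
}
```

With a list ending in an entry whose url is "", any URL falls through to that last entry. With a list ending in a NULL url, the same URL gets -1, which would correspond to the "No 'Server' command" branch.
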
> 
> 
> 
> Caffeinate The World wrote:
> > 
> > alex or serge, could you look over this patch? i believe this patch
> > should fix this problem described below:
> > 
> > ---cut---
> > # diff -ru indexer.c.orig indexer.c
> > --- indexer.c.orig      Tue Jan 30 10:45:03 2001
> > +++ indexer.c   Tue Jan 30 10:47:29 2001
> > @@ -368,7 +368,7 @@
> >         }
> > 
> >         /* Find correspondent Server record from indexer.conf */
> > -       if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
> > +       if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr)) && (!CurSrv->delete_no_server))){
> >                 UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
> >                 if(!strcmp(CurURL.filename,"robots.txt")){
> >                         if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
> > ---/cut---
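
One note on the patched condition itself: the right-hand side of && is evaluated only when UdmFindServer returned NULL, so CurSrv->delete_no_server would dereference a NULL pointer. A sketch of the same test with the flag read from something that still exists when no server matched. struct conf, struct server and must_delete are hypothetical names, and where the real flag is stored is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the mnoGoSearch structures. */
struct server { const char *url; };
struct conf   { int delete_no_server; };  /* 1 = "DeleteNoServer yes" */

/* Returns 1 when the URL should be deleted: no server matched
 * and the configuration asks for deletion in that case. */
int must_delete(const struct conf *conf, const struct server *cur_srv)
{
    assert(conf != NULL);
    /* cur_srv is never dereferenced here, so a NULL result is safe. */
    return cur_srv == NULL && conf->delete_no_server;
}
```
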
> > 
> > --- Caffeinate The World <[EMAIL PROTECTED]> wrote:
> > > i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set
> > > with many URL's in my sql db not having associated Server commands.
> > > here i just tried to reindex and i see that my URL is being deleted:
> > >
> > > # indexer -m -s 200
> > > Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
> > > jobs
> > > Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
> > > Indexer[2397]: [1] No 'Server' command for url... deleted.
> > > ^C
> > > Received signal 2 - exit!
> > > (NOTE: i had to Ctrl-C it to stop it from deleting more URL's.)
> > >
> > > here is my full indexer.conf:
> > >
> > > ---cut---
> > > #Include inc1.conf
> > >
> > > DBAddr          pgsql://***:*****@/work/
> > > DBMode cache
> > > #SyslogFacility local7
> > > LogdAddr localhost:7000
> > > LocalCharset iso-8859-1
> > > Ispellmode db
> > > StopwordTable stopword
> > >
> > > #ServerTable server
> > >
> > > DeleteNoServer no
> > >
> > > #Allow *
> > >
> > > #Disallow NoMatch *.state.mn.us/*
> > > Disallow http://www.rootsweb.com/~mn*
> > > Disallow http://www.wxusa.com/*
> > > Disallow http://www.vitalrec.com/*
> > > Disallow http://*yahoo.com/*
> > > Disallow http://*aol.com/*
> > > Disallow http://www.salescircular.com/*
> > > Disallow http://*.wellsfargo.com/*
> > > # Disallow any except known extensions and directory index using "regex" match:
> > > Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
> > > # Exclude cgi-bin and non-parsed-headers using "string" match:
> > > Disallow */cgi-bin/* *.cgi */nph-*
> > > # Exclude anything with '?' sign in URL. Note that '?' sign has a
> > > # special meaning in "string" match, so we have to use "regex" match here:
> > > #Disallow Regex  \?
> > >
> > > # Exclude some known extensions using fast "String" match:
> > > Disallow *.b    *.sh   *.md5  *.rpm
> > > Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z     *.bz2
> > > Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
> > > Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm  *.pcx
> > > Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
> > > Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
> > > Disallow *.vrml *.wrl  *.png
> > > Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
> > > Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
> > > Disallow *.rtf  *.pdf  *.cdf  *.ps
> > > Disallow *.ai   *.eps  *.ppt  *.hqx
> > > Disallow *.cpt  *.bms  *.oda  *.tcl
> > > Disallow *.o    *.a    *.la   *.so
> > > Disallow *.pat  *.pm   *.m4   *.am   *.css
> > > Disallow *.map  *.aif  *.sit  *.sea
> > > Disallow *.m3u  *.qt   *.mov
> > >
> > > # Exclude Apache directory list in different sort order using "string" match:
> > > Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
> > >
> > > # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
> > > # and unix shared libraries. We use "Regex" match type here:
> > > Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
> > >
> > > #CheckOnly *.b    *.sh   *.md5
> > > #CheckOnly *.arj  *.tar  *.zip  *.tgz  *.gz
> > > #CheckOnly *.lha  *.lzh  *.rar  *.zoo  *.tar*.Z
> > > #CheckOnly *.gif  *.jpg  *.jpeg *.bmp  *.tiff
> > > #CheckOnly *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie
> > > #CheckOnly *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff
> > > #CheckOnly *.vrml *.wrl  *.png
> > > #CheckOnly *.exe  *.cab  *.dll  *.bin  *.class
> > > #CheckOnly *.tex  *.texi *.xls  *.doc  *.texinfo
> > > #CheckOnly *.rtf  *.pdf  *.cdf  *.ps
> > > #CheckOnly *.ai   *.eps  *.ppt  *.hqx
> > > #CheckOnly *.cpt  *.bms  *.oda  *.tcl
> > > #CheckOnly *.rpm  *.m3u  *.qt   *.mov
> > > #CheckOnly *.map  *.aif  *.sit  *.sea
> > > #
> > > # or check ANY except known text extensions using "regex" match:
> > > #Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
> > >
> > > #HrefOnly */mail*.html */thread*.html
> > >
> > > UseRemoteContentType yes
> > >
> > > AddType text/plain      *.txt  *.pl *.js *.h *.c *.pm *.e
> > > AddType text/html       *.html *.htm *.m
> > > AddType image/x-xpixmap *.xpm
> > > AddType image/x-xbitmap *.xbm
> > > AddType image/gif       *.gif
> > > AddType Regex \.r[0-9][0-9]$
> > > AddType application/unknown *.*
> > >
> > > #Mime application/msword       "text/plain; charset=cp1251"  "catdoc $1"
> > > #Mime application/x-troff-man  text/plain                    "deroff"
> > > #Mime text/x-postscript        text/plain                    "ps2ascii"
> > >
> > > Period 6m
> > > #Tag <string>
> > > #Category FFAABBCCDD
> > > MaxHops 56
> > > MaxNetErrors 6
> > > ReadTimeOut 30s
> 
=== message truncated ===

