Or, unless I'm missing something, there's an -f option to the indexer that
you can use to tell the indexer to read a specific list of URLs from a
file. So if you need to reindex www.foo.com/apple.html and
www.bar.com/orange.html every time the indexer starts up, you could just:
echo "http://www.foo.com/apple.html" > startup_urls.txt
echo "http://www.bar.com/orange.html" >> startup_urls.txt
/path/to/indexer -a -f startup_urls.txt
... at least, that's what I seem to remember. Unfortunately, I haven't
upgraded udmsearch since I contributed a patch to add the -f option way
back around version 2.2.1, so maybe it's not there anymore ;-)
(For my purposes, I *know* that http://www.whatever.com/newfiles.html will
contain a list of newly added files to the sites on my server(s) ... so I
simply tell the indexer to start at http://www.whatever.com/newfiles.html,
reindex whatever documents it finds there, and then give up.)
Cheers,
Charlie
On Thu, 22 Jun 2000, Willem Brown wrote:
> Hi,
>
> I expire those pages manualy before starting the indexer. The
> command is like this.
>
> indexer -a -u %/~lists/%/index.html -u %/~lists/%/mail%.html -u %/~lists/%/thr%.html
>
> The idea is to select certain pages (URLs) with the -u patterns and
> mark them as expired, which is what the -a option does.
>
> Regards
> Willem Brown
>
> On Thu, Jun 22, 2000 at 12:11:12PM -0700, J C Lawrence wrote:
> > On Thu, 22 Jun 2000 15:46:20 -0300 (ADT)
> > The Hermit Hacker <[EMAIL PROTECTED]> wrote:
> >
> > > Try setting your Period higher and using the -n option to restrict
> > > the number of pages it does in an invocation ...
> >
> > > for instance, set Period to 1week, and -n option to 20k ...
> >
> > > this way it only processes 20k expired pages, and they will only
> > > expire again in a week ...
> >
> > This creates a different problem, and is why I have the Period set
> > low:
> >
> > There are certain pages that must be spidered on every indexing run.
> >
> > Specifically, most of my site consists of mailing list archives.
> > The various index pages for the messages in those archives need to
> > be spidered every time to pick up the new messages.
> >
> > Summary:
> >
> > I need a long Period to prevent everything being reindexed every
> > time, but I also need certain pages to be spidered for new URLs
> > every single time the indexer runs. How can I do both?
> >
> > --
> > J C Lawrence Home: [EMAIL PROTECTED]
> > ----------(*) Other: [EMAIL PROTECTED]
> > --=| A man is as sane as he is dangerous to his environment |=--
> > ______________
> > If you want to unsubscribe send "unsubscribe udmsearch"
> > to [EMAIL PROTECTED]
> >
>
> --
> /* =============================================================== */
> /* Linux, FreeBSD, NetBSD, OpenBSD. The choice is yours. */
> /* =============================================================== */
>
> "The release of emotion is what keeps us healthy. Emotionally healthy."
>
> "That may be, Doctor. However, I have noted that the healthy release
> of emotion is frequently unhealthy for those closest to you."
> -- McCoy and Spock, "Plato's Stepchildren", stardate 5784.3
>