"Paul A. Rombouts" <[EMAIL PROTECTED]> writes:

> > There is no one special file that contains all of the magic to enable
> > the program to work.  Even better there is no single file that will
> > cause the program to fail if it gets lost or corrupted.
> 
> Should my url database file get completely lost or corrupted, this will only
> affect the ability to generate index pages, not the ability to read pages
> offline or write them online. When I run "wwwoffle -purge" a copy of the old
> url database is kept as a backup, so the most likely scenario in case of a
> failure is that I lose information about some of the most recent URLs visited
> (the names, not the content), but certainly not everything.

What is it that you are trying to optimise, time (to create indexes)
or space (of all the U* files on disk)?

If it is the time to create the index pages then any speedup from
your scheme would make no difference to me.  I don't know how typical
my use of WWWOFFLE is compared to other people's though.  I looked
through my log files for the last whole week and found the following
information:

Total URLs requested:     39714 (about 12000 of these are from htdig runs)
WWWOFFLE internal pages:   3189 (about 2500 of these are from htdig runs)
WWWOFFLE index pages:        40

With the way that I use WWWOFFLE this means that I only requested 40
index pages out of about 27000 pages in total (excluding the htdig
runs), so any increase in speed from the database of U* files would
save me no time.

If it is space on the disk that you are saving then there is no
significant saving compared to combining the U* and D* files into a
single file.  Statistically there will be some saving with all of the
U* data in one file, but this is probably only 10% better than
putting the U* data into the D* files.


I also did some optimisations recently (a few months ago actually)
which you will all be able to enjoy in version 2.9.  By profiling the
source code (counting how many times each line of code is executed) I
found where the problems were and tried eliminating them.  This
involved quite a lot of re-writing and re-organising of the code.  The
speedup that I achieved was a few percent (5%?) for every page.

There were extra optimisations that sped up the handling of every
page that is modified (the enable-modify-html and disable-animated-gif
options) and of every internally generated page.  This was achieved with
an extra layer of buffering between the WWWOFFLE internals and the
writes to the browser, and resulted in about a 10% speedup for these
types of pages.

In version 2.7 the results of the HTML or GIF modifications or of
internal pages were written to a temporary file and then sent to the
browser all in one go.  This caused a large delay because the user
would see nothing until the whole page had been modified.  In version
2.8 the HTML modifications were performed inline so that this delay was
removed, but this seemed to cause the web browser to use a lot more CPU
and slowed down the page display.  I think that this was because
WWWOFFLE was writing lots of very small network packets to the browser,
which is not efficient.  With the version 2.9 code there is slightly
more delay (but not noticeable) and a lot fewer packets.

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
