Re: [htdig] Inherent limitations of htdig

2000-10-03 Thread Geoff Hutchison
On Tue, 3 Oct 2000, Eric Bliss wrote: > Does anybody know of any inherent limitations between htdig, an Intel > Pentium 3 class architecture, a Red Hat Linux kernel, and roughly > 50,000 files containing roughly 650 megs of data being indexed? I'm None. I'd only see a limitation if you said "oh

[htdig] Inherent limitations of htdig

2000-10-03 Thread Eric Bliss
Does anybody know of any inherent limitations between htdig, an Intel Pentium 3 class architecture, a Red Hat Linux kernel, and roughly 50,000 files containing roughly 650 megs of data being indexed? I'm having no end of troubles with this, and htdig isn't the first search engine we've had prob

Re: [htdig] Last modified date revisited - Apache

2000-10-03 Thread Peaveway
It looks like a never-ending story for you... In an email dated 03.10.00 20:18:27 (CET) - Central European Summer Time, [EMAIL PROTECTED] writes: > From what I can glean, there are 2 ways to get this. > > Either by putting an echo command into the html files (SSI), Which echo command? The s

Re: [htdig] from IIS searching to HtDig

2000-10-03 Thread Geoff Hutchison
On Tue, 3 Oct 2000, Frances Santiago wrote: > # CiScope is the directory (virtual or real) under which results are > # returned. If a file matches the query but is not in a directory beneath > # CiScope, it is not returned in the result set. > # A scope of / means all hits matching the query are

Re: [htdig] Last modified date revisited - Apache

2000-10-03 Thread Gilles Detillieux
According to Roger Weiss: > From what I can glean, there are 2 ways to get this. > > Either by putting an echo command into the html files (SSI), > or by setting xbithack=full and setting the executable bits on for group and > user. No, the XBitHack turns .html files with execute permission into

Re: [htdig] ... but not changed

2000-10-03 Thread Gilles Detillieux
According to David Adams: > When, during an update run, htdig says of a page: "retrieved but not > changed", how does htdig decide that the page is the same as the last time? > > An author is maintaining that she added a link to a page and that an update > run of htdig failed to follow the new li

[htdig] from IIS searching to HtDig

2000-10-03 Thread Frances Santiago
I am switching a site running on MS IIS to Unix Apache. In order to make the transition as seamless as possible I will need htdig to mirror the way the IIS search engine works. In the .idq file the following is listed: # CiScope is the directory (virtual or real) under which results are # returne

Re: [htdig] ... but not changed

2000-10-03 Thread Geoff Hutchison
On Tue, 3 Oct 2000, David Adams wrote: > When, during an update run, htdig says of a page: "retrieved but not > changed", how does htdig decide that the page is the same as the last time? It checks the date it received from the server (if present) against the date in the database. If they're the
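The date comparison described in this reply can be sketched roughly as follows. This is only an illustration in Python, not htdig's own code: the function name `is_unchanged`, the use of the HTTP `Last-Modified` header as "the date received from the server", and the choice to treat a missing or unparseable date as "changed" are all assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_unchanged(last_modified_header: Optional[str],
                 stored: Optional[datetime]) -> bool:
    """Return True when the server's date equals the date stored earlier.

    Hypothetical helper: a missing or unparseable date is treated as
    "changed", which is an assumption of this sketch.
    """
    if last_modified_header is None or stored is None:
        return False
    try:
        server_time = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False
    if server_time.tzinfo is None:
        # Assume UTC when the header carries no usable zone information.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return server_time == stored


# Example: the stored date is what a previous run recorded for the page.
stored_date = datetime(2000, 10, 3, 18, 18, 27, tzinfo=timezone.utc)
same = is_unchanged("Tue, 03 Oct 2000 18:18:27 GMT", stored_date)
```

One consequence of comparing dates only: if a server reports the same date after a page was edited, a check like this cannot notice the edit.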

Re: [htdig] Last modified date revisited - Apache

2000-10-03 Thread Geoff Hutchison
On Tue, 3 Oct 2000, Roger Weiss wrote: > Neither approach is good for us. We have many static html pages, and many > more being created every day. > It's not feasible for us to put the extra code into each html file, or to > change the x bit on each file as well. > I'm going to be completely h

[htdig] Last modified date revisited - Apache

2000-10-03 Thread Roger Weiss
From what I can glean, there are 2 ways to get this. Either by putting an echo command into the html files (SSI), or by setting xbithack=full and setting the executable bits on for group and user. Neither approach is good for us. We have many static html pages, and many more being created ever

[htdig] ... but not changed

2000-10-03 Thread David Adams
A simple query (I hope). When, during an update run, htdig says of a page: "retrieved but not changed", how does htdig decide that the page is the same as the last time? An author is maintaining that she added a link to a page and that an update run of htdig failed to follow the new link(s) she

Re: [htdig] Problems with PDF files

2000-10-03 Thread David Adams
> > Hello, > > > > > At 10:06 AM +0200 10/3/00, Martin Mielke wrote: > > >Error (0): PDF file is damaged - attempting to reconstruct > > xref table... > > >Error: Couldn't find trailer dictionary > > >Error: Couldn't read xref table > > >Error (0): PDF file is damaged - attempting to reconstru

RE: [htdig] Problems with PDF files

2000-10-03 Thread Martin Mielke
Hello, > > At 10:06 AM +0200 10/3/00, Martin Mielke wrote: > >Error (0): PDF file is damaged - attempting to reconstruct > xref table... > >Error: Couldn't find trailer dictionary > >Error: Couldn't read xref table > >Error (0): PDF file is damaged - attempting to reconstruct > xref table... >

Re: [htdig] Analyzer script for access_log

2000-10-03 Thread Bill Carlson
On Tue, 3 Oct 2000, Charles Nepote wrote: > And may be a better choice than Webalizer as it can give stats on search > words (which Webalizer cannot). Webalizer does support search words, as of 1.3.0 at least. Haven't looked at the latest versions of analog (stuck on an ancient version), but Web

Re: [htdig] Question about Htdig's database

2000-10-03 Thread Geoff Hutchison
At 10:40 AM +0100 10/3/00, Adam Rice wrote: >Geoff Hutchison wrote: > > > Second question -- can I use such a database from my own Perl script? > > > > Can you use a Berkeley DB? Sure, use the DBI interface--it should be part > > of your Perl 5 installation. > >No. Well, maybe there's a DBI dr

Re: [htdig] Problems with PDF files

2000-10-03 Thread Geoff Hutchison
At 10:06 AM +0200 10/3/00, Martin Mielke wrote: >Error (0): PDF file is damaged - attempting to reconstruct xref table... >Error: Couldn't find trailer dictionary >Error: Couldn't read xref table >Error (0): PDF file is damaged - attempting to reconstruct xref table... >Error: Couldn't find traile

Re: [htdig] Analyzer script for access_log

2000-10-03 Thread Kapil Biyani
Has anybody tried Summary? It is a little costly, but as it is shareware you can use it for a month to test it, and believe me, you won't feel like using anything else after Summary. It gives you around 118+ reports, with more included in the new version. Check it out: http://www.summary.net The author, Jason T. Linh

[htdig] Analyzer script for access_log

2000-10-03 Thread Charles Nepote
Todd Wallace ([EMAIL PROTECTED]) asked: > Does anyone have a nice analyzer script for the access_log that apache > produces? Preferably a Perl script. > > Thanks, > Todd Wallace I think Analog is a good choice (it is not a Perl script). http://www.analog.cx And it may be a better choice than

[htdig] Problems with PDF files

2000-10-03 Thread Martin Mielke
Dear all, indexing the database from a crontab generates (short) emails like this: --8<--8<--8<-- Error (0): PDF file is damaged - attempting to reconstruct xref table... Error: Couldn't find trailer dictionary Error: Couldn't read xref table Error (0): PDF file is damaged - attempting to rec