Re: [htdig] Indexing a given list of file

2001-01-18 Thread Geoff Hutchison
On Thu, 18 Jan 2001, Gilles Detillieux wrote: > There was talk of adding to the 3.2 code a feature whereby you can tell > htdig not to recheck all the indexed documents, but only check a given > list of URLs. I don't remember if this feature is already in the current > development snapshots. Ye

Re: [htdig] Indexing a given list of file

2001-01-18 Thread Gilles Detillieux
According to Loys Masquelier: > I want to check that it is not possible to index a list of changed files > without reindexing all the data. > In fact the situation is that I know that that list of files needs to be > reindexed and I want to do that as fast as possible. You may be out of luck with

Re: [htdig] indexing htdig.org

2001-01-10 Thread Gilles Detillieux
According to Tracey Guzouskas: > In the CONFIG file in the original directory where ${prefix}, ${DEST} and > CONFIG_DIR are defined the correct path is entered to my htdig.conf file, > however if I do not use the -c /my/path/here command I receive the error > "htdig: Unable to find configuration f

Re: [htdig] indexing htdig.org

2001-01-10 Thread Tracey Guzouskas
CTED]> Sent: Wednesday, January 10, 2001 4:50 PM Subject: Re: [htdig] indexing htdig.org > According to Tracey Guzouskas: > > thanks for the quick response. I have 'limit_urls_to:' set to ${start_url}, > > which is pointing to my start.url file and there is no referen

Re: [htdig] indexing htdig.org

2001-01-10 Thread Gilles Detillieux
racey Guzouskas <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]> > Sent: Wednesday, January 10, 2001 3:34 PM > Subject: Re: [htdig] indexing htdig.org > > > > Maybe you have a link to htdig.org somewhere in your web > > content? Check your "limit_urls

Re: [htdig] indexing htdig.org

2001-01-10 Thread Tracey Guzouskas
ay, January 10, 2001 3:34 PM Subject: Re: [htdig] indexing htdig.org > Maybe you have a link to htdig.org somewhere in your web > content? Check your "limit_urls_to" attribute to see if you are > restricting the digging to the domains you want to index. > > Tracey Guzou

Re: [htdig] indexing htdig.org

2001-01-10 Thread Peterman, Timothy P
Maybe you have a link to htdig.org somewhere in your web content? Check your "limit_urls_to" attribute to see if you are restricting the digging to the domains you want to index. Tracey Guzouskas wrote: > > Hello, > > I am using version htdig-3.1.5. I have already run rundig once on my > site

Re: [htdig] Indexing german pages

2001-01-03 Thread Gilles Detillieux
According to Radoy Pavlov: > I have some questions regarding german language. > Following the example in FAQ I've made my htdig.conf, > extracted GermanWords.zip in $COMMON_DIR/german and edited htdig.conf. > I've done this: > rerun of rundig > rerun of htfuzzy endings > Still htdig can't find any

Re: [htdig] Indexing a database

2001-01-03 Thread SMantscheff
Thanks a lot for your fast and informative reply - we do not find this kind of support in products sold for impressive prices.. Best regards s.m. To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to co

Re: [htdig] Indexing problems

2001-01-03 Thread peter karlsson
Geoff Hutchison: > Does your server send a Last-Modified: header? (Or put another > way, are many of these documents server-parsed or other dynamic > content?) It is a more-or-less standard-configured Apache. I do, however, use content negotiation on many of the pages (including the front

Re: [htdig] Indexing problems

2001-01-02 Thread Geoff Hutchison
At 6:03 AM +0100 1/3/01, peter karlsson wrote: >I seem to have some problem with htdig indexing my site; it does index >and update the pages it initially indexed, but it never finds any new >pages, even if they are linked from the previously indexed pages. Does your server send a Last-Modified-Si

Re: [htdig] Indexing a database

2001-01-02 Thread Geoff Hutchison
On Mon, 1 Jan 2001, SMantscheff wrote: > 1) How does htDig recognize that a document has changed? By a meta > tag? By its content? By a header? What information do I have to send > so that htDig knows the document's time? By the Last-Modified: header sent by the server. If a date exists in the d
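As a rough illustration of the comparison described above, the Python sketch below decides whether a document needs re-indexing from the server's Last-Modified header and the date recorded at the previous dig. It is a simplified model, not ht://Dig's own code; the function name `needs_reindex` and the rule "no usable date means re-index" are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_reindex(last_modified: Optional[str], stored: Optional[datetime]) -> bool:
    """Return True if a document should be fetched and indexed again.

    last_modified: raw Last-Modified header value from the server, or None.
    stored: timezone-aware time recorded at the previous dig, or None.
    (Hypothetical helper; a simplified model only.)
    """
    if stored is None or last_modified is None:
        # Never indexed, or no date to compare (common for dynamic pages):
        # this model re-indexes the document.
        return True
    try:
        server_time = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Unparseable date: treat it like a missing header.
        return True
    if server_time.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assume UTC so it compares.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return server_time > stored
```

In this model, pages generated on the fly that send no Last-Modified header are re-indexed on every run.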

Re: [htdig] indexing mySQL table

2000-12-22 Thread Steve Knoblock
Yeah, I was very surprised when I made a Delete function in a web application a GET link and ran htdig over my site. It deleted everything. I inserted a POST form delete confirmation in between because I wanted to present this function as a link. But generally, I make any function that is not as t

Re: [htdig] Indexing Restricted Pages

2000-12-21 Thread Douglas Kline
Thanks for your information. We are looking into the things you have mentioned. Doug Kline > You can allow access without a password from a given IP address or > host name(s), while requiring a password from everywhere else. > > This is what "satisfy any" (previously mentioned by someone e

Re: [htdig] Indexing Restricted Pages

2000-12-21 Thread Albert Lunde
> > >Thanks for your suggestion. Is it possible to use an .htaccess file to > > >restrict access by username? > So is it then not possible to use the .htaccess file to permit access to the > Web pages without username and password by just the htdig process or just one > username's processes while

Re: [htdig] Indexing Restricted Pages

2000-12-21 Thread Douglas Kline
Thanks to all for your responses. > At 7:39 PM -0500 12/20/00, Douglas Kline wrote: > >Thanks for your suggestion. Is it possible to use an .htaccess file to > >restrict access by username? So is it then not possible to use the .htaccess file to permit access to the Web pages without username

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Geoff Hutchison
At 7:39 PM -0500 12/20/00, Douglas Kline wrote: >Thanks for your suggestion. Is it possible to use an .htaccess file to >restrict access by username? Well, this is the point of authentication methods. You could certainly make a username/password pair for htdig alone. Or, as Dave Salisbury ment

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Douglas Kline
> Hi Dave and Douglas: > > I'm a htdig newbie too. I haven't configured it yet in our production > server. But I think I have a simple solution that may work. > > Why don't you create a username for the htdig software and allow access to > this user to the protected directories to create the dat

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Ing. Noel Vargas Baltodano
Hi Dave and Douglas: I'm a htdig newbie too. I haven't configured it yet in our production server. But I think I have a simple solution that may work. Why don't you create a username for the htdig software and allow access to this user to the protected directories to create the database? -- N

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Dave Salisbury
> > > > > > > > require valid-user > > order deny,allow > > deny from all > > allow from .your.domain.com > > > > > > or some such permutation. > > > also add a "satisfy any" in the Directory block. > > > Thanks for the suggestion, Dave. If we do that, if I understand this > correctly,

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Douglas Kline
> > I'd imagine you might be able to do something along these lines:
> >
> > require valid-user
> > order deny,allow
> > deny from all
> > allow from .your.domain.com
> >
> > or some such permutation.
> also add a "satisfy any" in the Directory block.
Thanks for the suggestion, Dave. If we do t

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Dave Salisbury
Oh yeah... also add a "satisfy any" in the Directory block. ta DS - Original Message - From: Douglas Kline <[EMAIL PROTECTED]> To: Geoff Hutchison <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Wednesday, December 20, 2000 4:39 PM Subject: Re: [htdi

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Dave Salisbury
PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Wednesday, December 20, 2000 4:39 PM Subject: Re: [htdig] Indexing Restricted Pages > > On Wed, 20 Dec 2000, Douglas Kline wrote: > > > > > The htdig search engine doesn't reach Web pages which require a Web login a >

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Douglas Kline
> On Wed, 20 Dec 2000, Douglas Kline wrote: > > > The htdig search engine doesn't reach Web pages which require a Web login and > > password for user access. Is there a way to get it to do so, either by > > See the htdig program page: > or the authorization

Re: [htdig] Indexing Restricted Pages

2000-12-20 Thread Geoff Hutchison
On Wed, 20 Dec 2000, Douglas Kline wrote: > The htdig search engine doesn't reach Web pages which require a Web login and > password for user access. Is there a way to get it to do so, either by See the htdig program page: or the authorization attribute

Re: [htdig] indexing mySQL table

2000-12-16 Thread Geoff Hutchison
At 7:23 PM +0800 12/14/00, Zon Hisham Bin Zainal Abidin wrote: >1)htdig will index: http://domain/result.php?cat=AUT&subcat=AUT-CAR > http://domain/result.php?cat=AUT&subcat=AUT-BIKE > etc etc > >2)result.php looks inside my content table AND then refer to (

Re: [htdig] indexing mySQL table

2000-12-15 Thread Zon Hisham Bin Zainal Abidin
Geoff Hutchison wrote: (..snip..) > So if you have, for example, a web frontend to your mySQL > database (e.g. a PHP-generated set of pages or whatnot), then you > just set ht://Dig to index the frontend. > YES. It will definitely work...But .. NO. That is NOT what MOST ppl want :( My fronten

Re: [htdig] indexing mySQL table

2000-12-12 Thread Joshua Gerth
> At 4:35 PM +0800 12/8/00, Zon Hisham Bin Zainal Abidin wrote: > >Can htdig index mySQL tables? > > I don't remember if someone else already replied to you. In answer to > your question, yes, but... The caveat is that ht://Dig indexes > content. So if you have, for example, a web frontend to

Re: [htdig] indexing mySQL table

2000-12-12 Thread Geoff Hutchison
At 4:35 PM +0800 12/8/00, Zon Hisham Bin Zainal Abidin wrote: >Can htdig index mySQL tables? I don't remember if someone else already replied to you. In answer to your question, yes, but... The caveat is that ht://Dig indexes content. So if you have, for example, a web frontend to your mySQL d
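Since the advice is to index the web frontend rather than the tables, one way to make sure every frontend page is reached, including pages nothing links to, is to generate the URLs directly from the category data. This is a minimal sketch: `frontend_urls` is a hypothetical helper, and the rows stand in for the result of a query on the category table (no database driver is used here). The URL shape follows the result.php?cat=...&subcat=... example quoted earlier in the thread.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode


def frontend_urls(base: str, rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one frontend URL per (category, subcategory) pair.

    base: URL of the results script, e.g. http://domain/result.php
    rows: (cat, subcat) pairs; in practice the result of a SELECT on the
          category table (not shown here).
    """
    # urlencode handles escaping, so values with spaces or '&' stay valid.
    return [f"{base}?{urlencode({'cat': cat, 'subcat': subcat})}" for cat, subcat in rows]
```

Written out one URL per line, the result can serve as a list of starting URLs, so htdig does not depend on the frontend's own navigation to find every page.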

Re: [htdig] indexing dem cyrillic letters along w/ latin ones

2000-12-11 Thread Gilles Detillieux
According to Max Pyziur: > Sometime around the end of 1999 there was a Ukrainian dictionary which appeared > on a server in Ukraine. It is in the KOI8 encoding. You can find it here: > ftp://cad.ntu-kpi.kiev.ua/soft/lingvist/UkrIspell/ > or here: > http://www.physics.mcgill.ca/WWW/oleh/emacs/i

Re: [htdig] indexing dem cyrillic letters along w/ latin ones

2000-12-09 Thread Max Pyziur
over a year and a 1/2 ago the dialog went thusly: > > According to Max Pyziur: > >Greetings All, > > > >I'm still a newbie to ht://dig. I've installed it both on my home Linux > >box (RPMs on RedHat 5.2) and on our server (running Solaris 2.6; yes, had > >to find the necessary libstdc++ library

Re: [htdig] Indexing never ends ...

2000-12-05 Thread Geoff Hutchison
On Wed, 6 Dec 2000, Zon Hisham Bin Zainal Abidin wrote: > I ran the indexing at 11pm last night and it's still not finished at 8am > this morning. There are only 20 categories in the category table, 120 > subcategories in the subcategory table, 15 states in the state table and > 152 towns in the tow

Re: [htdig] Indexing PDF Files

2000-11-01 Thread Gilles Detillieux
If that still doesn't solve the problem, try running conv_doc.pl (or even pdftotext) directly on some of your problem PDF files. I suspect that these files contain no indexable text, but only images, which is a common problem with some PDFs. You also didn't mention how you installed htdig on you

Re: [htdig] Indexing PDF Files

2000-11-01 Thread Geoff Hutchison
On Wed, 1 Nov 2000, Roy Stephane wrote: > When I perform rundig in verbose mode, I find that htdig recognises all my > PDF files and shows their size. After that, when htmerge finds a PDF, it says > that there is no excerpt, so the file (temporary file) is deleted. You haven't told us how verbose

Re: [htdig] Indexing PDF Files

2000-11-01 Thread creep
Use conv_doc.pl instead of parse_doc. Get it from http://www.htdig.org/files/contrib/parsers/conv_doc.pl.gz, gunzip it and move it to /usr/local/bin. Get xpdf from ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.91.tgz. Get ps2ascii from your freetype or ghostscript installation. Put this in your conf/htdig

Re: [htdig] Indexing past the question mark

2000-10-25 Thread Geoff Hutchison
At 8:22 PM -0700 10/25/00, Ed Greenberg wrote: > http://www.greenberg.org/publish?lpcalifornia+index > >My initial attempts indicate that the spider did not follow these >links. I can't find anything in htdig.conf to indicate that it should. There were some problems with the URL handling in

Re: [htdig] Indexing Flash (was Including Pull-Down Menu Pages)

2000-10-20 Thread Geoff Hutchison
At 5:33 PM -0700 10/20/00, [EMAIL PROTECTED] wrote: >One search service (atomz.com) will deal with raw Flash files. I do >keep mentioning this issue to my colleagues who write search >engines, but without luck so far. We actually have a contributed Flash parser, though it obviously needs test

Re: [htdig] Indexing a whole site by parts

2000-10-10 Thread Vladimir Sánchez O.
> On Tue, 10 Oct 2000, Vladimir Sánchez O. wrote: > > > I've tried using the LINK tag in the header of the page, but it doesn't > > work. > > What version of ht://Dig are you using? I'm using 3.1.5. > > My question: Is it possible to index subdirectory by subdirectory (by > > hand) and then to ru

Re: [htdig] Indexing a whole site by parts

2000-10-10 Thread Geoff Hutchison
On Tue, 10 Oct 2000, Vladimir Sánchez O. wrote: > I've tried using the LINK tag in the header of the page, but it doesn't > work. What version of ht://Dig are you using? > My question: Is it possible to index subdirectory by subdirectory (by > hand) and then to run htdig or another application t

Re: [htdig] Indexing a site with multiple usernames and passwords

2000-09-28 Thread Gilles Detillieux
According to Brad Nicholson: > I have a web site I want to index, and there are several different areas > protected by .htaccess files. The problem is that there is not a consistent > username and password across them all. Is there a way to supply multiple > usernames and passwords to htdig, or d

Re: [htdig] Indexing only one archive

2000-09-28 Thread Kapil Biyani
The easiest way I find is to add all the other extensions to the bad_extensions attribute and set the start_url and limit_urls_to attributes to the same directory. See if it works. Take care. A proud/happy user of htdig. ;-) Signing off Kaps

Re: [htdig] Indexing only one archive

2000-09-28 Thread Frank Altpeter
Hello ! Jose Antonio Gómez wrote on 28.09.2000 17:40:58 +0200: > hello? > > Is it possible to index only one unix text archive, not one URL? How can > I do it? > Thanks. hmmm ? What do you mean by "unix text archive"? Some single file? If so, just define http://www.webserver.com/path/to/file.t

Re: [htdig] Indexing a site with multiple usernames and passwords

2000-09-28 Thread Frank Altpeter
Hello ! Brad Nicholson wrote on 28.09.2000 09:24:53 -0600: > Hi, > > I have a web site I want to index, and there are several different areas > protected by .htaccess files. The problem is that there is not a consistent > username and password across them all. Is there a way to supply multiple

Re: [htdig] Indexing a site with multiple usernames and passwords

2000-09-28 Thread Torsten Neuer
Brad Nicholson wrote: > > Hi, > > I have a web site I want to index, and there are several different areas > protected by .htaccess files. The problem is that there is not a consistent > username and password across them all. Is there a way to supply multiple > usernames and passwords to htdig,
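For background: HTTP Basic authentication, which .htaccess-protected areas typically use, sends `username:password` base64-encoded in an Authorization header. Supporting several protected areas therefore comes down to choosing the right pair for each URL. The sketch below shows only that lookup; the prefix-to-credential mapping, the longest-prefix rule, and both function names are assumptions for illustration, not a description of how htdig's authorization attribute works.

```python
import base64
from typing import Dict, Optional, Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Value of an HTTP Basic Authorization header for one user."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def header_for(url: str, credentials: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Pick credentials for url from a hypothetical prefix -> (user, password) map.

    The longest matching prefix wins, so a specific protected area can
    override a site-wide login. Returns None when no prefix matches.
    """
    matches = [prefix for prefix in credentials if url.startswith(prefix)]
    if not matches:
        return None
    user, password = credentials[max(matches, key=len)]
    return basic_auth_header(user, password)
```

Where a crawler accepts only one credential per run, a common workaround is one run per protected area, each with its own configuration.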

Re: [htdig] Indexing URLs

2000-09-27 Thread Gilles Detillieux
According to Vincent Queru: > But I still have one more question : I had included a META NAME="robots" > VALUE=noindex" tag in the page containing the links but they still got indexed, is > that normal ? > > Furthermore, it is not the link description that got indexed but the link itself > (ie.

Re: [htdig] Indexing URLs

2000-09-27 Thread Vincent Queru
Torsten Neuer wrote: > Vincent Queru wrote: > > > > But I still have one more question : I had included a META NAME="robots" > > VALUE=noindex" tag in the page containing the links but they still got indexed, is > > that normal ? > > Yes. You will achieve the intended behaviour with > In fact

Re: [htdig] Indexing URLs

2000-09-27 Thread Vincent Queru
Gilles Detillieux wrote: > According to Vincent Queru: > > Some time ago, I read that someone wanted to index not only the HTML > > source but also the URLs that the robot comes across when indexing a > > site. > > > > I DO NOT want to index the URLs but unfortunately, they get indexed : is > > t

Re: [htdig] Indexing URLs

2000-09-26 Thread Gilles Detillieux
According to Vincent Queru: > Some time ago, I read that someone wanted to index not only the HTML > source but also the URLs that the robot comes across when indexing a > site. > > I DO NOT want to index the URLs but unfortunately, they get indexed : is > there something I missed here ? htdig d

Re: [htdig] Indexing a whole site

2000-09-21 Thread Torsten Neuer
"Vladimir Sánchez O." wrote: > > Hi, I'm using htdig and I have a problem: how to index my whole site. I > have a homepage with a Macromedia Flash menu and no links to other pages > in the site, so when I run htdig I index nothing but the homepage. > Is there something I can set in htdig.conf to

Re: [htdig] Indexing new pages *only*

2000-09-20 Thread Geoff Hutchison
On Wed, 20 Sep 2000, Martin Mielke wrote: > So, the question: is it possible to index new documents only? I know for > sure that the old docs are still there and reindexing everything from > scratch with rundig takes more minutes every time so it's difficult to put it > on a cron... If you do not

RE: [htdig] Indexing stops after 100-200 documents

2000-09-19 Thread Geoff Hutchison
At 11:29 AM +0200 9/19/00, Martin Mielke wrote: >playing around with the conf file, removing the trailing slash at the end of start_url >solved the issue: > > 1. start_url: http://intranet/ --> fails to index documents > 2. start_url: http://intranet --> indexes everything! :-) It sounds like t

RE: [htdig] Indexing stops after 100-200 documents

2000-09-19 Thread Martin Mielke
And hello one more time, playing around with the conf file, removing the trailing slash at the end of start_url solved the issue: 1. start_url: http://intranet/ --> fails to index documents 2. start_url: http://intranet --> indexes everything! :-) Is this an "intermittent" problem in 3.1.5

Re: [htdig] Indexing

2000-09-15 Thread Geoff Hutchison
At 10:01 PM +0200 9/14/00, Huba Zsolt wrote: >Geoff Hutchison wrote: > > > > At 10:32 PM +0200 9/5/00, Huba Zsolt wrote: > > >I would like to use the htdig provided with the SuSE distribution. I've read the > > >documentation, but I can't solve my problem. I would like to index the > > >root directory of

Re: [htdig] Indexing files added to a directory

2000-09-15 Thread Geoff Hutchison
At 2:45 PM +0300 9/15/00, Peter Peltonen wrote: > > http://www.apache.org/docs/mod/mod_autoindex.html > >Could you send an example configuration of Apache that uses the >mod_autoindex? No offense, but that's IMHO off-topic for this list. The Apache documentation is online as indicated and at le

Re: [htdig] Indexing files added to a directory

2000-09-15 Thread Peter Peltonen
Geoff Hutchison wrote: > Well there are lots of ways, including whipping up a small script to > generate a list of URLs from the directory listing. Or, if you use > Apache, you can use mod_autoindex. (Most webservers offer something > similar.) > > http://www.apache.org/docs/mod/mod_autoindex.ht

Re: [htdig] Indexing files added to a directory

2000-09-14 Thread Geoff Hutchison
At 4:23 PM -0400 9/14/00, Prasad Subramanian wrote: >rundig to index these files. That is, it seems like only those files are >indexed that have a hyperlink to it somewhere. Is this true? This is correct. To quote from the documentation "In this process, t
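Because htdig only indexes what it can reach by following links, a page that links to every file serves much the same purpose as the directory listings Apache's mod_autoindex generates. A minimal Python sketch that builds such a page, assuming the directory tree maps directly onto `base_url`; the function name `link_page` and the default suffix list are hypothetical.

```python
import html
import os
from typing import Tuple
from urllib.parse import quote


def link_page(root: str, base_url: str,
              suffixes: Tuple[str, ...] = (".html", ".htm", ".pdf")) -> str:
    """Return an HTML page linking every file under root whose name ends in suffixes."""
    lines = ["<html><body>"]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue
            # Path relative to root, with '/' separators for use in a URL.
            rel = os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + quote(rel)
            lines.append(f'<a href="{html.escape(url)}">{html.escape(rel)}</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)
```

Saved somewhere under the web root and reachable from the start URL, such a page lets htdig discover newly added files on the next run, provided the page is regenerated first.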

Re: [htdig] Indexing

2000-09-09 Thread Geoff Hutchison
At 10:32 PM +0200 9/5/00, Huba Zsolt wrote: >I would like to use the htdig provided with the SuSE distribution. I've read the >documentation, but I can't solve my problem. I would like to index the >root directory of my webserver and its subdirectories. I can index only >my root directory. What should I write

Re: [htdig] indexing files in directories

2000-09-04 Thread Geoff Hutchison
At 5:34 PM +0200 9/4/00, [EMAIL PROTECTED] wrote: >Is it possible to index plain files in directories (HTML, JSPs, PDFs) using >ht://Dig? All information I found, was related to ht://Dig as a spider / >robot resolving links. You can index files in directories if you have a server that generates

Re: [htdig] Indexing a list of sites -- catching failures

2000-08-30 Thread Geoff Hutchison
On Tue, 29 Aug 2000 [EMAIL PROTECTED] wrote: > >> Some sites seem to cause htdig to fail. When this happens, htdig doesn't > >> continue on with the rest of the list -- it simply skips to the next step > >> in rundig. This means that I have to do some careful adding and subtracting > >> to the

Re: [htdig] Indexing a list of sites -- catching failures

2000-08-29 Thread christopher . murtagh
On Tue, 29 Aug 2000 [EMAIL PROTECTED] wrote: >>Todd Wallace wrote: >> >> Some sites seem to cause htdig to fail. When this happens, htdig doesn't >> continue on with the rest of the list -- it simply skips to the next step >> in rundig. This means that I have to do some careful adding and subs

Re: [htdig] Indexing a list of sites -- catching failures

2000-08-29 Thread D . J . Adams
> > I have set up htdig so that every night, it indexes a long list of small > web sites. In general, this works very well, but I've found that I have to > be very careful when adding new sites to the end of the list. > > Some sites seem to cause htdig to fail. When this happens, htdig doesn't

Re: [htdig] Indexing a list of sites -- catching failures

2000-08-25 Thread Geoff Hutchison
On Fri, 25 Aug 2000 [EMAIL PROTECTED] wrote: > Some sites seem to cause htdig to fail. When this happens, htdig doesn't I'm not sure what you mean by "fail." In some cases, htdig may not index a site, e.g. the site is unreachable, the robots.txt forbids it, the webserver returns no data, etc. Bu
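Whatever the cause of an individual failure, one way to keep a single bad site from affecting the rest of a long nightly list is to process each site separately and carry on after an error. A minimal sketch of that control flow; `index_sites` is hypothetical, and `index_one` is a placeholder for whatever actually indexes one site. Whether htdig signals a given failure through its exit status is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def index_sites(sites: Iterable[str],
                index_one: Callable[[str], None]) -> Tuple[List[str], Dict[str, str]]:
    """Call index_one(site) for every site, recording failures instead of stopping.

    index_one is a placeholder: it might, for example, run
    subprocess.run(["htdig", "-c", per_site_conf], check=True), which raises
    CalledProcessError on a non-zero exit status.
    """
    done: List[str] = []
    failed: Dict[str, str] = {}
    for site in sites:
        try:
            index_one(site)
        except Exception as exc:  # keep going; report the failure at the end
            failed[site] = str(exc)
        else:
            done.append(site)
    return done, failed
```

The returned failure map makes it easy to report which sites need attention instead of hunting for them by adding and removing entries from the list.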

Re: [htdig] Indexing php3 files

2000-08-17 Thread Gilles Detillieux
According to Vishal Shah: > I am having problems indexing .php3 files. The error I am getting is : > > Rejected: URL not in the limits! > > Any suggestions ? I don't have .php3 in the list of exclude URLs in my > conf file. That error message indicates the URL is outside of the bounds specified
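To see why ".php3" files can be rejected even when nothing excludes them, it helps to separate the checks. The sketch below is a simplified Python model, assuming plain substring matching for limit_urls_to and exclude_urls and an exact match on the path's extension for bad_extensions; the check order and the function name `check_url` are assumptions too. "Not in the limits" means the URL contains none of the limit_urls_to strings, regardless of the exclude list.

```python
import posixpath
from typing import Iterable
from urllib.parse import urlsplit


def check_url(url: str, limit_urls_to: Iterable[str],
              exclude_urls: Iterable[str] = (),
              bad_extensions: Iterable[str] = ()) -> str:
    """Return "ok" or the reason this simplified model rejects url."""
    # limit_urls_to: the URL must contain at least one of these strings.
    if not any(pattern in url for pattern in limit_urls_to):
        return "not in the limits"
    # exclude_urls: the URL must contain none of these strings.
    if any(pattern in url for pattern in exclude_urls):
        return "excluded"
    # bad_extensions: compare the extension of the path (query string ignored).
    extension = posixpath.splitext(urlsplit(url).path)[1]
    if extension in set(bad_extensions):
        return "bad extension"
    return "ok"
```

Under substring matching, a URL that differs only in host name or port (for example "www.example.com:8080" against a limit of "http://www.example.com/") already falls outside the limits.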

Re: [htdig] Indexing stops at 1060 documents

2000-08-09 Thread Gilles Detillieux
According to Brian Paulson: > When I run HTDIG it indexes fine until it gets to 1060 documents each time. > > Anyone have any ideas? Well, there's certainly nothing magic about the number 1060 that makes htdig stop there, unless you've set server_max_docs to that number. Could you perhaps be a

Re: [htdig] Indexing intranet and internet site.

2000-08-03 Thread Douglas S. Davis
Yup, that is definitely the problem with having them on the same box. Answer: Use a really stupid and arcane name for the config file of the Intranet site. This presupposes that those who have access to search abilities on the internal network won't dig through your source code and give this i

RE: [htdig] Indexing MANY directories

2000-07-31 Thread Geoff Hutchison
At 1:26 PM -0400 7/31/00, Roger Weiss wrote: >We're using Java Web Server, and I'm not aware of any files with built-in >dir listings. >However, wouldn't I need more than a dir listing? Wouldn't they have to be >in url (hyperlink) format so htdig could follow it? Webservers like Apache (in partic

RE: [htdig] Indexing MANY directories

2000-07-31 Thread Roger Weiss
c: '[EMAIL PROTECTED]' Subject: Re: [htdig] Indexing MANY directories On Mon, 31 Jul 2000, Roger Weiss wrote: > Under the main url, there are 20+ (category) directories, and under each > of these directories are the user directories (many thousands). > Within the user directories a

Re: [htdig] Indexing MANY directories

2000-07-31 Thread Geoff Hutchison
On Mon, 31 Jul 2000, Roger Weiss wrote: > Under the main url, there are 20+ (category) directories, and under each > of these directories are the user directories (many thousands). > Within the user directories are the html files I want to index. The first thing to remember is that htdig follows

Re: [htdig] Indexing/searching for version numbers and dir paths

2000-07-13 Thread Gilles Detillieux
According to Tim Leggett: > Thank you--I was thinking about extra_word_characters, but didn't complete > the thought. > > So, I add a slash and period to extra_word_characters (I already have the > underscore there). Should I remove these chars from valid_punctuation (as > I've done with the peri

RE: [htdig] Indexing/searching for version numbers and dir paths

2000-07-13 Thread Tim Leggett
Thank you--I was thinking about extra_word_characters, but didn't complete the thought. So, I add a slash and period to extra_word_characters (I already have the underscore there). Should I remove these chars from valid_punctuation (as I've done with the period)? So, my params would be: extra_wo

Re: [htdig] Indexing/searching for version numbers and dir paths

2000-07-13 Thread Gilles Detillieux
According to Tim Leggett: > We recently began using htdig to index our document archive, which is > composed of several hundred pdf docs. One problem--popular searches will be > for version numbers, such as 5.0, 4.0.0.29, and so on. From searching the > htdig list archive, I see that setting allow
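To see why version numbers and paths get mangled, it helps to model the two attributes. In the simplified Python model below, characters listed in valid_punctuation are dropped from inside a word, characters listed in extra_word_characters are kept as part of it, and anything else ends the word. The function `split_words` and its precedence rule (extra_word_characters wins when a character appears in both) are assumptions, not htdig's actual parser.

```python
from typing import List


def split_words(text: str, extra_word_characters: str, valid_punctuation: str) -> List[str]:
    """Simplified model of splitting text into lower-cased index words."""
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalnum() or ch in extra_word_characters:
            current.append(ch.lower())        # kept as part of the word
        elif ch in valid_punctuation:
            continue                          # dropped, so "5.0" becomes "50"
        elif current:
            words.append("".join(current))    # any other character ends the word
            current = []
    if current:
        words.append("".join(current))
    return words
```

With "." only in valid_punctuation, "4.0.0.29" collapses to "40029"; moving "." into extra_word_characters keeps it intact, at the cost of a sentence-final period sticking to the last word.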

Re: [htdig] indexing problems

2000-06-24 Thread Geoff Hutchison
At 1:37 PM -0400 6/24/00, Steven Lax wrote: >When I tested with www.htdig.org, it indexed fine, as it did with my main >host. (The one defined in the body of the httpd.conf rather than within a >VirtualHost directive) This is obviously a problem with your server config. Try connecting using a re

Re: [htdig] Indexing problems - maybe virtual host issue.

2000-06-19 Thread Geoff Hutchison
At 10:15 AM -0400 6/19/00, <[EMAIL PROTECTED]> wrote: > 1:0:http://test.mydomain.com/ >New server: test.mydomain.com, 80 >Unable to build connection with test.mydomain.com:80 > pushed >pick: test.dundasjafine.com, # servers = 1 As it said, it can't build a connection, so it's a problem o

Re: [htdig] Indexing files behind a proxy server ???

2000-06-16 Thread Geoff Hutchison
At 11:30 AM +0100 6/16/00, Abdul Jabbar wrote: >Rejected: URL not in the limits! >url rejected: (level 1)http://www.strath.ac.uk/ [snip] >I think it is looking at the proxy server and is ignoring the site >that i want it to index. Any ideas on how to get around this. P

Re: [htdig] indexing over and over again

2000-06-08 Thread Geoff Hutchison
At 11:37 PM -0400 6/8/00, Clint Gilders wrote: >verbose mode without the -i switch and point it to a new URL I notice >that it still retrieves the previously indexed pages from other URLs >but doesn't index them. Right, without the -i flag, it's in "update mode." So it checks the dates on all t

Re: [htdig] Indexing large amount of non-related files

2000-05-26 Thread Marcel Hicking
Ah, well, sure, too easy to hide it ;-) find /www/htdocs/ -name '*.htm*' -type f | sed 's/\/www\/htdocs/http:\/\/www\.yourdomain\.com/' > /where/ever/you/need/it/allfiles.list Limits the filetype to any *.htm* files (and ignores directories named "foo.html") so you don't end up with tons of image
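The same find-and-sed idea as a Python sketch, for anyone who prefers a script to a shell pipeline. It collects files matching `*.htm*` (skipping directories, as `-type f` does) and maps each local path onto the site URL. The function name `url_list` is hypothetical; like the sed version, it does not percent-encode unusual characters in file names.

```python
from pathlib import Path
from typing import List


def url_list(doc_root: str, site_url: str, pattern: str = "*.htm*") -> List[str]:
    """Roughly: find DOC_ROOT -name PATTERN -type f, with DOC_ROOT replaced by SITE_URL."""
    root = Path(doc_root)
    urls = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():  # like -type f: skip directories named e.g. foo.html
            urls.append(site_url.rstrip("/") + "/" + path.relative_to(root).as_posix())
    return urls
```

Writing the result one URL per line produces the same kind of allfiles.list as the pipeline above.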

Re: [htdig] Indexing question.

2000-05-25 Thread Geoff Hutchison
On Thu, 25 May 2000, Wayne Fool wrote: > number-sample number) The IES files index correctly, but the pdf files are > rejected when the database is built. > > [snip] > parameter to true. The htdig -vvv indicates that they are parsed, but they > do not show up. I read the FAQ on this and tho

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Marcel Hicking
On 24 May 00, at 8:06, Geoff Hutchison wrote: [...] > The 3.2 code has a file:// access > method, which would be ideal for this purpose. One missing feature--it > doesn't auto-generate directory listings on-the-fly. Of course if > someone would be willing to write this, then indexing collections

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Geoff Hutchison
At 3:35 PM +0200 5/24/00, Stephane Bortzmeyer wrote: >Regular expressions. If they exist in ht://Dig and if I missed them, this may >be because the query language does not seem documented in >http://www.htdig.org/. (The only relevant documentation seems to be >http://www.htdig.org/hts_method.html

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Stephane Bortzmeyer
On Wednesday 24 May 2000, at 8 h 12, the keyboard of Geoff Hutchison <[EMAIL PROTECTED]> wrote: > >(ht://Dig is good for IBM and Quantum) and its advanced request > >language but it was simply not up to the task. > > What features of the glimpse request language do you think are > missing in

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Geoff Hutchison
At 11:59 AM +0200 5/24/00, Stephane Bortzmeyer wrote: >glimpse crashed when the Web server became too large. And it's non >free and is no longer maintained. I miss its very small indexes >(ht://Dig is good for IBM and Quantum) and its advanced request >language but it was simply not up to the t

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Geoff Hutchison
At 12:54 PM +0200 5/24/00, Marcel Hicking wrote: >I have been doing this for a much smaller site: >I have set up a little shell script to generate >a list with all available files and send it through >sed to convert local paths to http://...-URLs. >ht://dig is set up with start_url=allfiles.list

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Marcel Hicking
Since I don't have a document referring to all files to be indexed, I'm thinking of generating a start_url file "on the fly". I have been doing this for a much smaller site: I have set up a little shell script to generate a list with all available files and send it through sed to convert local pat

Re: [htdig] Indexing large amount of non-related files

2000-05-24 Thread Stephane Bortzmeyer
On Tuesday 23 May 2000, at 18 h 51, the keyboard of "Marcel Hicking" <[EMAIL PROTECTED]> wrote: > Anyone has experience with indexing > a large amount of files? My largest one is 150 000 files and growing. > We have been using glimpse so far, glimpse crashed when the Web server became too l

Re: [htdig] Indexing large amount of non-related files

2000-05-23 Thread Geoff Hutchison
At 6:51 PM +0200 5/23/00, Marcel Hicking wrote: >I have about 200,000 plain text files >spread over a few 100, maybe 1000, directories. >File size is between a few bytes and, sometimes, >above 1mb. All in all this ends up in 1.2gb >of data, growing daily. The files do not >contain HTML code and

RE: [htdig] Indexing binary files by filename

2000-05-22 Thread Geoff Hutchison
At 10:28 AM +0100 5/22/00, Darrell Berry wrote: > > Now, with external converter > > support, the job is even easier. Here's an example: > >in what version of htdig did this functionality appear? From (under 3.1.4) The external_parsers attribute is now ext

RE: [htdig] Indexing binary files by filename

2000-05-22 Thread Darrell Berry
> Now, with external converter > support, the job is even easier. Here's an example: in what version of htdig did this functionality appear? thnx

Re: [htdig] indexing pages the begin with numbers

2000-05-21 Thread Torsten Neuer
Evan Cooch wrote: > > I'm using HyperNews to manage several online discussion forums, and would > like to configure htDig to search through the forum pages. However, > Hypernews stores a lot of things as .HTML files beginning with a number (in > fact, just as numbers, files like 1.html, 2.html, 3

Re: [htdig] Indexing binary files by filename

2000-05-20 Thread Gilles Detillieux
According to Geoff Hutchison: > At 4:56 PM +0100 5/19/00, Darrell Berry wrote: > >"Indexing binary files by filename (simply need to write a minimal parser > >for this)" > > > >its on the todo list---can i cast my vote for it happening soon? we have a > >site which is about 50% text documents and

Re: [htdig] Indexing binary files by filename

2000-05-19 Thread Geoff Hutchison
At 4:56 PM +0100 5/19/00, Darrell Berry wrote: >"Indexing binary files by filename (simply need to write a minimal parser >for this)" > >its on the todo list---can i cast my vote for it happening soon? we have a >site which is about 50% text documents and 50% quicktime movies, soundfiles >etc, and

Re: [htdig] Indexing news articles ?

2000-05-15 Thread Gilles Detillieux
According to Vincent Royer: > As you can see above, there's an index.html file containing > relative links to news articles. The index.html page is > correctly indexed but none of the articles. Moreover, Apache > uses the MIME type message/news when news articles > are browsed. Any idea? Yes, if

Re: [htdig] Indexing news articles ?

2000-05-15 Thread Vincent Royer
As you can see above, there's an index.html file containing relative links to news articles. The index.html page is correctly indexed but none of the articles. Moreover, Apache uses the MIME type message/news when news articles are browsed. Any idea? Thanks. althes64:/var/spool/news/vuln/cli/i

Re: [htdig] Indexing news articles ?

2000-05-15 Thread Geoff Hutchison
At 3:42 PM +0300 5/15/00, Vincent Royer wrote: >Hi, >in files named by a number without any extension (1, 2, etc ...). >An index.html page contains links to these articles. >Although you can set valid and bad extensions in the configuration file, > >is there a way to index files without any exten

Re: [htdig] indexing a bi-lingual site

2000-05-11 Thread Gilles Detillieux
According to Gerard GACHELIN: > I'd like to index a bilingual site (french and english) with htdig 3.1.5. > > english and french data are mixed. > > What is the best way to do this ? Indexing the site should be easy, as long as your system supports locales correctly. Set your locale in htdig.c

Re: [htdig] indexing pdf files

2000-05-11 Thread D . J . Adams
> > I've spent the whole day trying... > This is what I have in my htdig.conf file > > external_parsers: application/pdf /export/home/htdig/bin/parse_doc.pl > > This is what I have in my parse_doc.pl > > $CATPDF = /export/home/xpdf/bin/pdftotext"; ^ | Double

Re: [htdig] indexing pdf files

2000-05-10 Thread Geoff Hutchison
At 4:11 PM -0700 5/10/00, Sid Wilroy wrote: >I've spent the whole day trying... >This is what I have in my htdig.conf file > >external_parsers: application/pdf /export/home/htdig/bin/parse_doc.pl > >This is what I have in my parse_doc.pl > >$CATPDF = /export/home/xpdf/bin/pdftotext"; ^

Re: [htdig] indexing pdf files

2000-05-10 Thread David Robley
On 10 May, Sid Wilroy wrote: > I've spent the whole day trying... > This is what I have in my htdig.conf file > > external_parsers: application/pdf /export/home/htdig/bin/parse_doc.pl > > This is what I have in my parse_doc.pl > > $CATPDF = /export/home/xpdf/bin/pdftotext"; > $PDFINFO = "/expor
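The parse_doc.pl line quoted in these messages is missing the opening double quote before /export/home/xpdf/bin/pdftotext, which is what the replies point at. For comparison, here is a minimal Python sketch of a wrapper that runs pdftotext on an input file and writes the extracted text to stdout. Arguments are passed as a list, so no string or shell quoting is involved. It assumes htdig hands the input file to the script as its first argument and that pdftotext accepts `-` as the output file to mean stdout. The script name (`pdf2text.py`) is hypothetical. How you register it in external_parsers, for example with the converter support discussed in the 2000-05-22 thread, should be checked against your version's documentation.

```python
import subprocess
import sys

# Path taken from the thread; adjust for your installation.
PDFTOTEXT = "/export/home/xpdf/bin/pdftotext"


def build_command(infile, pdftotext=PDFTOTEXT):
    # A list of arguments needs no quoting, even if paths contain spaces.
    # "-" asks pdftotext to write the text to stdout.
    return [pdftotext, infile, "-"]


def main(argv):
    # Assumption: the input file is the first argument; any further
    # arguments (content type, URL, config file) are ignored here.
    if len(argv) < 2:
        sys.stderr.write("usage: pdf2text.py infile [content-type url config]\n")
        return 2
    result = subprocess.run(build_command(argv[1]), stdout=subprocess.PIPE)
    sys.stdout.buffer.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```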

Re: [htdig] indexing multiple web sites

2000-05-03 Thread Geoff Hutchison
On Wed, 3 May 2000, atta dubson wrote: > now *that* sounds more like it. you mean -m on htdig, right? will this > override start_url and set limit_urls_to to -m as well? what would be the This sets start_url. It also sets max_hop_count to 0. It does not set limit_urls_to. > proper way for m

Re: [htdig] indexing multiple web sites

2000-05-03 Thread atta dubson
On Wed, 3 May 2000, Geoff Hutchison wrote: > As they would say in New England "yah can't get there from here." > > In 3.1.5 if you want one database, you'll have to accept that updates > will occur on all URLs. doesn't sound like fun when indexing 400+ sites, most of which are very static, whi

Re: [htdig] indexing multiple web sites

2000-05-03 Thread Geoff Hutchison
At 6:09 AM -0500 5/3/00, atta dubson wrote: >i want to index several web sites and have them stored in one db.* set of >files, yet i would like to at times reindex only certain sites and update >only that part of the database (or add an additional site to an existing >database) without having to r
