Hi Andrew,
* Andrew M. Bishop <[EMAIL PROTECTED]> [10. Jan. 2004]:
> Gregor Zattler <[EMAIL PROTECTED]> writes:
>
> > I want to have a new full index of wwwoffle's cache:
> > /var/cache/wwwoffle/search/htdig/{tmp,db,db-lasttime} are empty.
> > /var/cache/wwwoffle/search/htdig/scripts/wwwoffle-htdig-full,
> > precisely "htdig -i -c search/htdig/conf/htdig-full.conf", then
> > hangs endlessly when indexing of *.pdf, *.doc etc. is enabled, but
> > proceeds well if not.
OK, I disabled indexing of PDFs because I wanted at least an index of
the HTML files. But htdig hangs in either case...
> > I assume there is a problem with a single specific *.pdf, *.doc
> > file or such. I would like to remove it to have a full index.
> > How can I find out which files were opened last?
>
> When you are using the wwwoffle-htdig-full script to do the indexing
> of the WWWOFFLE cache with htdig there is a log file created. The
> file is called search/htdig/wwwoffle-htdig.log in the WWWOFFLE cache
> directory. This will tell you what the last file was that htdig
> looked at and also it should have all htdig error messages.
Thank you for this hint, but it didn't help.
I'm running Debian/Sid, so wwwoffle-htdig.log is in /var/log.
wwwoffle-htdig.log is empty:
# ls -Al /var/log/wwwoffle-htdig.log
-rw-r--r-- 1 root root 0 2004-01-18 09:15 /var/log/wwwoffle-htdig.log
My wwwoffle cache isn't that big:
# du -sm /var/cache/wwwoffle/http/
1215 /var/cache/wwwoffle/http
In recent months a wwwoffle-htdig-full run completed in about 4
hours. Now it hangs forever:
# ps fax|grep tty6
561 tty6 S 0:02 -bash
740 tty6 S 0:00 \_ /bin/sh search/htdig/scripts/wwwoffle-htdig-full
741 tty6 R 1457:55 \_ htdig -i -c search/htdig/conf/htdig-full.conf
But the files in /var/cache/wwwoffle/search/htdig/db were last updated
more than 24 hours ago:
# ls -Al /var/cache/wwwoffle/search/htdig/db
total 378612
-rw-r--r-- 1 root root 179151872 2004-01-17 16:11 db.docdb
-rw-r--r-- 1 root root 208151269 2004-01-17 16:11 db.wordlist
This time I enabled indexing of *.pdf, *.doc and the like. A few
days ago I ran wwwoffle-htdig-full with indexing of *.pdf and *.doc
disabled. Same result: htdig hangs and the db files are no longer
updated...
Do you have any idea how to debug this?
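One possible starting point, only a rough sketch and assuming a Linux
/proc filesystem is mounted: list the files the running htdig process
currently holds open, which may show the document it is stuck on. The
function name list_open_files is made up for illustration; 741 is the
htdig PID from the ps output above.

```shell
# List the files a running process currently holds open, via Linux /proc.
# Hypothetical helper; pass the PID as the only argument.
list_open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the open file; skip any that vanished.
        target=$(readlink "$fd" 2>/dev/null) || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}
```

Run as root, "list_open_files 741" would print one line per open file
descriptor. Since top shows almost pure user CPU time, "strace -p 741"
might also be worth a look to check whether htdig still makes any
system calls at all.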
Thanx,
Gregor
P.S.:
It's not a problem of resources:
top - 18:09:13 up 2 days, 6:10, 9 users, load average: 2.00, 2.00, 2.22
Tasks: 83 total, 8 running, 75 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.9% user, 0.1% system, 0.0% nice, 0.0% idle
Mem: 515940k total, 505904k used, 10036k free, 80872k buffers
Swap: 875500k total, 181708k used, 693792k free, 84192k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
741 root 20 0 23116 732 676 R 49.9 0.1 1457:41 htdig
1791 root 20 0 7560 7288 608 R 49.9 1.4 32:09.04 dpkg
15292 grfz 10 0 1048 1048 828 R 0.2 0.2 0:02.83 top
1 root 8 0 476 444 420 S 0.0 0.1 0:05.02 init
2 root 9 0 0 0 0 S 0.0 0.0 0:02.56 keventd