Hi,
I have a problem indexing PDF documents with UdmSearch on HP-UX.
The version I use is 3.0.18. (To compile it I had to add -ldce to the libs; without that, the symbol cma_sleep is missing.)
OK, now the problem. It seems that the indexer does not load the complete document. When I feed the document to pdftotext, I get the following errors:
http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:59:19 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Top-level pages object is wrong type (null)
Error: Couldn't read page catalog

OK, so I made a little script to save the document in /tmp:
#!/bin/sh
# Save everything arriving on stdin to /tmp/<script name>.<PID>
TMPFILE=/tmp/`basename $0`.$$
exec cat > "$TMPFILE"
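Since pdftotext says it has to reconstruct the xref table, a quick way to tell a truncated transfer from a genuinely broken PDF is to look at the saved file's start and end. The sketch below is a heuristic, not a validator: it assumes the usual PDF layout (a file that begins with "%PDF-" and ends with a "%%EOF" marker after the trailer) and only looks for that marker in the last 1024 bytes. The function name pdf_looks_complete is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (heuristic only): does a saved file look like a
# complete PDF? A PDF starts with "%PDF-" and ends with "%%EOF" after the
# trailer; a file cut off mid-transfer normally lacks that final marker.
pdf_looks_complete() {
    file=$1
    # The first five bytes must be the PDF header.
    magic=`dd if="$file" bs=5 count=1 2>/dev/null`
    if [ "$magic" != "%PDF-" ]; then
        echo "NOT PDF: $file"
        return 1
    fi
    # Look for the end-of-file marker near the end of the file.
    if tail -c 1024 "$file" | grep '%%EOF' >/dev/null; then
        echo "COMPLETE: $file"
        return 0
    fi
    echo "TRUNCATED: $file"
    return 1
}
```

Run against the file written by the save script, a result of TRUNCATED would point at the transfer rather than at pdftotext; a missing marker is a strong hint, not proof.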
The output from the indexer for one document is:
http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:45:31 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135

So I expect to get 42135 bytes, but the saved file is actually only 1057 bytes, and this happens for every PDF document.
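To make the expected-versus-actual comparison repeatable for each PDF, a small check of the saved file's byte count against the server's Content-Length could look like the sketch below. The function name check_size is hypothetical, and the expected size has to be copied by hand from the indexer output.

```shell
#!/bin/sh
# Hypothetical helper: compare a saved file's size with the Content-Length
# the server reported for it.
check_size() {
    file=$1
    expected=$2
    # wc -c counts bytes; tr strips the padding some wc implementations add.
    actual=`wc -c < "$file" | tr -d ' '`
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual bytes"
        return 0
    fi
    echo "MISMATCH: got $actual of $expected bytes"
    return 1
}
```

For the document above, calling it with the saved file and 42135 would report a mismatch of 1057 against 42135 bytes.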
So, any ideas or tips?
Thanks
Thomas
+----------------------------------------------------------+
Thomas Hepper  Tel. : +(49)-511-645-3464  Mail : [EMAIL PROTECTED]
+----------------------------------------------------------+