UdmSearch: Problems indexing pdf docs

Thomas Hepper Tue, 27 Jun 2000 23:00:37 -0700

Hi,

I have a problem indexing PDF documents with UdmSearch on HP-UX.

The version I use is 3.0.18. (To compile it I had to add -ldce to the libs; without that, cma_sleep is missing.)

OK, now the problem. It seems that indexer does not load the complete document. When I try to feed the document to pdftotext I get the following error:

http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:59:19 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Top-level pages object is wrong type (null)
Error: Couldn't read page catalog
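
Errors like these are what pdftotext tends to report for a cut-off file. A complete PDF normally begins with "%PDF-" and carries an "%%EOF" marker near its end, so a truncated download would typically lack the marker. The following is only a rough sketch of such a check; the function name looks_like_complete_pdf is hypothetical and not part of UdmSearch or pdftotext.

```shell
#!/bin/sh
# Hypothetical helper (not part of UdmSearch or xpdf): a rough sanity
# check that a file has the usual PDF header and end-of-file marker.

# looks_like_complete_pdf FILE
# Returns 0 if FILE starts with "%PDF-" and contains "%%EOF" in its
# last 1024 bytes, 1 otherwise.
looks_like_complete_pdf() {
    # Read the first 5 bytes; dd is used because head -c is not available everywhere.
    dd if="$1" bs=1 count=5 2>/dev/null | LC_ALL=C grep -q '^%PDF-' || return 1
    # The end-of-file marker normally sits within the last few lines.
    tail -c 1024 "$1" | LC_ALL=C grep -q '%%EOF'
}
```

Calling it on the saved file and testing the exit status is enough to tell a truncated document from a complete one; it does not validate the PDF structure itself.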

OK, so I made a little script to save the document in /tmp:

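#!/bin/sh

# Name the output file after this script plus the process ID
TMPFILE=/tmp/`basename "$0"`.$$

# Copy everything arriving on stdin into the file
exec cat > "$TMPFILE"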

The output from the indexer for one document is:

http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:45:31 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135

So I expect to get 42135 bytes, but the file is actually 1057 bytes, and this happens for every PDF document.
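
The comparison between the Content-Length header and the saved file can be scripted. This is a minimal sketch, assuming the expected length is passed in by hand; the function names actual_size and check_size are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of UdmSearch): compare the size of a
# saved file with the Content-Length reported by the server.

# actual_size FILE: print the number of bytes in FILE
actual_size() {
    # Reading from stdin makes wc print only the count;
    # tr removes the padding some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# check_size FILE EXPECTED: print a verdict, return 0 if the sizes match
check_size() {
    actual=`actual_size "$1"`
    if [ "$actual" -eq "$2" ]; then
        echo "ok: $1 is $actual bytes"
        return 0
    fi
    echo "truncated: $1 is $actual bytes, expected $2"
    return 1
}
```

For the document above this would be called as check_size followed by the saved file name and 42135.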

So, any ideas or tips?

Thanks

Thomas

+----------------------------------------------------------+
Thomas Hepper
Tel. : +(49)-511-645-3464
Mail : [EMAIL PROTECTED]
+----------------------------------------------------------+
