Hi,
I have a problem indexing PDF documents with udmsearch on HP-UX.
The version I use is 3.0.18. (To compile it I had to add -ldce to the libs;
without that, cma_sleep is missing.)
 
OK, now the problem. It seems that the indexer does not load the complete
document. When I try to feed the document to pdftotext, I get the following errors:
 
http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:59:19 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Top-level pages object is wrong type (null)
Error: Couldn't read page catalog
OK, so I made a little script to save the document in a temp file:
#!/bin/sh
# Save whatever arrives on stdin to a file named <scriptname>.<pid>
TMPFILE=`basename "$0"`.$$
exec cat > "$TMPFILE"
 
The output from the indexer for one document is:
http://149.213.15.15/projekte/int_prae1.pdf
'Allowhttp://149\.213\.15\.15/'
HTTP/1.1 200 OK
Date: Wed, 28 Jun 2000 05:45:31 GMT
Server: Apache/1.3.9 (Unix) mod_perl/1.21 PHP/3.0.12 FrontPage/4.0.4.3 mod_ssl/2.4.5 OpenSSL/0.9.3a
Last-Modified: Tue, 16 May 2000 11:48:02 GMT
ETag: "22e4-a497-39213572"
Accept-Ranges: bytes
Content-Length: 42135
Connection: close
Content-Type: application/pdf
HTTP/1.1 200 OK application/pdf 42135
So I expect to get 42135 bytes, but the saved file is actually only
1057 bytes.
This happens for every PDF document.
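
To make the size mismatch easier to see for each document, the save script could also report the byte count itself. This is only a rough sketch: the function name check_size and the output path are made up for illustration, and the expected size has to be passed in by hand (for example the Content-Length value from the headers above).

```shell
#!/bin/sh
# Hypothetical helper (not part of udmsearch): save stdin to a file and
# compare its size with an expected byte count.
check_size() {
    out=$1        # file to write the incoming data to
    expected=$2   # expected size in bytes, e.g. the Content-Length value
    cat > "$out"
    # wc -c prints the byte count; tr removes the padding some wc versions add
    actual=`wc -c < "$out" | tr -d ' '`
    if [ "$actual" -eq "$expected" ]; then
        echo "complete: $actual bytes"
    else
        echo "truncated: got $actual of $expected bytes"
        return 1
    fi
}
```

Called as "check_size /tmp/doc.$$ 42135" with the document on stdin, it prints one line saying whether the saved file is complete or truncated.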
 
So, any ideas or tips?
 
Thanks
    Thomas
+----------------------------------------------------------+
  Thomas Hepper                                          
  Tel. : +(49)-511-645-3464                              
  Mail : [EMAIL PROTECTED]                                   
+----------------------------------------------------------+     
