Re: UdmSearch: Indexing of .pdf files

Thomas Yengst Sat, 25 Nov 2000 11:09:57 -0800
Adam Kaufman wrote:
> 
> Hello,
> 
> I am currently working on indexing a large number of online .pdf files
> using udmsearch's indexer on a Linux platform.  I managed to get the
> indexer to interface fine with the pdf files by modifying the
> indexer.conf file to have it use an opensource pdftotext program.
> Unfortunately, after I left it running last night to finish the indexing,
> something strange happened.  After completing exactly 999 files, it began
> to display the contents of the pdf files it was indexing to the shell
> from which I was running the indexer rather than inserting the
> info into the udmsearch database.  This occurred without any warning,
> error message, or obvious reason.  Has anyone ever encountered this
> problem before, and is there some way to avoid it?  Thanks for any help you
> can give,
> 

The indexer has similar difficulties whenever it has to work on a file
for a long time, which may be a timeout issue with the HTTP request. I've
noticed the same thing when indexing large Word or PDF files.
Personally, I've found ps2ascii is more efficient and produces better
results than pdftotext, and it works just as well on PostScript files.
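
If a slow conversion really is the trigger, one way to test that idea is
to run the converter by hand with an explicit time limit and capture its
output instead of letting it reach the shell. The following is only a
rough Python sketch, not part of UdmSearch: the function name
convert_to_text and the 60-second default are made up for illustration,
and it assumes the converter writes plain text to stdout (as
"pdftotext file.pdf -" and "ps2ascii file.pdf" do).

```python
import subprocess
import sys


def convert_to_text(command, timeout_seconds=60):
    """Run an external converter and return its stdout as text.

    `command` is an argument list such as ["ps2ascii", "report.pdf"].
    Returns None if the converter times out, exits non-zero, or cannot
    be started. (Hypothetical helper, not a UdmSearch API.)
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,   # capture text instead of printing it
            stderr=subprocess.PIPE,   # keep converter warnings off the shell
            timeout=timeout_seconds,  # give up on files that take too long
            check=True,               # treat a non-zero exit as a failure
        )
    except subprocess.TimeoutExpired:
        print("converter timed out: %r" % (command,), file=sys.stderr)
        return None
    except (subprocess.CalledProcessError, OSError) as exc:
        print("converter failed: %s" % exc, file=sys.stderr)
        return None
    # Converters may emit odd bytes; replace them rather than crash.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling convert_to_text(["pdftotext", "report.pdf", "-"]) and
convert_to_text(["ps2ascii", "report.pdf"]) on the same large file
(report.pdf is a placeholder name) would show which converter finishes
within the limit and how long each one takes.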

The 999 files thing is probably just coincidence. It could have just as
easily been 666.

thomas

-- 
My PGP public key is at
http://wwwkeys.pgp.net:11371/pks/lookup?op=index&search=yengst
Lookup anyone's PGP key at http://www.openpgp.net/pgpsrv.html

Thomas R. Yengst            Photon Research Associates, Inc.
(858) 455-9741              5720 Oberlin Drive
(858) 455-0658 fax          San Diego, CA 92121-1723

