TextIndexNG 3.1.1, Zope 2.8.0, Python 2.3.5

What attribute should be specified when indexing PDFs? I've been using "data". Word docs are indexed properly, but the PDFs aren't. The PDFs are still found with the rest of the files, but the indexed content is not what I expected.
To try to narrow things down, I set up a separate test Catalog with only two PDFs. The number of distinct values for indexing these PDFs is around 6600 (which seems a little high for two PDFs with a combined total of 3 pages). In the Catalog tab of my test ZCatalog, the PDFs are listed as type "Unknown". The content type of these PDFs is set to "application/pdf'". (In my other ZCatalog, the PDFs and Word docs are listed as type "File".)

This is an excerpt from the vocabulary for "f" in my test Catalog's index:

-------------------------
f f+æq f0 f2ök f5ô f6 f7ëfü fa false fb8aad1ed82a2cc33e9feb68a3f323 fbt fc fd fdo fe fea feâà ff fg fgiëü fh fib filter filters firstchar fió fl flags flatedecode fm fmx fnaèh font fontbbox fontdescriptor fontfamily fontfile2 fontname fontstretch fontweight footlight format
-------------------------

It looks as though the converter isn't doing its job, or the index isn't recognizing the files as PDFs. I have manually run pdftotext at the command line with each of the PDFs to see if pdftotext is having trouble, and it appears to output the textual content properly. The TextIndexNG Converters tab does recognize it.

Do I have a misconfiguration somewhere?

Thanks!
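
Terms such as "flatedecode" and "fontdescriptor" are PDF syntax rather than document text, which suggests the raw file bytes were indexed without conversion. A rough way to check a vocabulary for this is sketched below. It assumes the terms have been copied out of the vocabulary view into a plain list. The function name and the keyword set are illustrative only and are not part of TextIndexNG or Zope.

```python
# Illustrative (not exhaustive) set of PDF dictionary keys, lowercased
# the way they show up in the index vocabulary above.
PDF_SYNTAX_TERMS = frozenset([
    "flatedecode", "fontbbox", "fontdescriptor", "fontfile2",
    "fontname", "fontstretch", "fontweight", "firstchar",
])


def raw_pdf_terms(vocabulary):
    """Return the vocabulary terms that look like PDF syntax, sorted."""
    lowered = set(term.lower() for term in vocabulary)
    return sorted(lowered & PDF_SYNTAX_TERMS)


if __name__ == "__main__":
    # A few terms taken from the excerpt above.
    sample = ["false", "fib", "filter", "flatedecode", "fontbbox", "format"]
    hits = raw_pdf_terms(sample)
    if hits:
        print("Index appears to contain raw PDF syntax: %s" % ", ".join(hits))
    else:
        print("No PDF syntax terms found.")
```

A non-empty result would point at the conversion step being skipped rather than at pdftotext itself.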