Hi Erik,

I've shared the file on Dropbox. You can access it via this link: https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0
This is what I get from the Tika app after dropping the file in:

Content-Length: 75092
Content-Type: application/pdf
Type: COSName{Info}
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:digest:MD5: de67120e29ec7ffa24aec7e17104b6bf
X-TIKA:digest:SHA256: d0f04580d87290c1bc8068f3d5b34d797a0d8ccce2b18f626a37958c439733e7
access_permission:assemble_document: true
access_permission:can_modify: true
access_permission:can_print: true
access_permission:can_print_degraded: true
access_permission:extract_content: true
access_permission:extract_for_accessibility: true
access_permission:fill_in_form: true
access_permission:modify_annotations: true
dc:format: application/pdf; version=1.3
pdf:PDFVersion: 1.3
pdf:encrypted: false
producer: null
resourceName: Desmophen+670+BAe.pdf
xmpTPg:NPages: 3

Regards,
Edwin

On 17 December 2015 at 00:15, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> Edwin - Can you share one of those PDF files?
>
> Also, drop the file into the Tika app and see what it sees directly - get
> the tika-app JAR and run that desktop application.
>
> Could be an encoding issue?
>
> Erik
>
> --
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com <http://www.lucidworks.com/>
>
> On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm using Solr 5.3.0.
> >
> > I'm indexing some PDF documents. However, certain PDF files contain
> > Chinese text, but after indexing, what is indexed in the content is
> > either a series of "??????" or nothing at all.
> >
> > I'm using the post.jar that comes with Solr.
> >
> > What could be causing this?
> >
> > Regards,
> > Edwin
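Erik's encoding hypothesis can be illustrated with a small, self-contained Java sketch. It does not use Tika, Solr, or post.jar. The class `EncodingLossDemo` and its methods `roundTrip` and `containsHan` are hypothetical names for illustration only. The sketch shows that when a Java String holding Chinese characters is encoded with a charset that cannot represent them (ISO-8859-1 here), `String.getBytes(Charset)` silently substitutes `?` for each character instead of throwing, so the text comes back as a run of question marks. Whether this is what actually happens in the post.jar/Tika pipeline is an assumption to verify, not a diagnosis. It also does not explain the empty-content case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical demo (not part of Solr, Tika, or post.jar): shows how Chinese
 * text can degrade into "????" when it passes through a charset that cannot
 * represent it.
 */
public class EncodingLossDemo {

    /** Encodes text with the given charset, then decodes the bytes back. */
    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's default replacement ('?' for ISO-8859-1 and US-ASCII)
        // rather than throwing an exception.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    /** Returns true if the text contains at least one Han (CJK ideograph) code point. */
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Four Han characters (U+4E2D U+6587 U+5185 U+5BB9), written as escapes
        // so the source file compiles regardless of the platform's default encoding.
        String original = "\u4e2d\u6587\u5185\u5bb9";

        String viaUtf8 = roundTrip(original, StandardCharsets.UTF_8);
        String viaLatin1 = roundTrip(original, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip:      " + viaUtf8
                + " (contains Han: " + containsHan(viaUtf8) + ")");
        System.out.println("ISO-8859-1 round trip: " + viaLatin1
                + " (contains Han: " + containsHan(viaLatin1) + ")");
    }
}
```

Run it with `javac EncodingLossDemo.java && java EncodingLossDemo`. The UTF-8 round trip preserves the characters, while the ISO-8859-1 round trip turns each one into `?`. Note that printing the UTF-8 line to a console whose encoding is not UTF-8 can itself produce question marks. This is the same failure mode occurring at a different step, which is why the sketch also checks for Han characters programmatically rather than relying on what the terminal shows.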