Check whether they are stored in BSON format using GridFS. If so, you can simply use the mongofiles command to retrieve each PDF into a local file and then index it in Solr using either Solr Cell or Tika.
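A minimal Python sketch of that workflow follows. It assumes `mongofiles` and `curl` are on the PATH and that Solr runs at `localhost:8983` with the Solr Cell handler (ExtractingRequestHandler) mounted at `/update/extract`. The database, core, document ID, and file names, and the helper names `mongofiles_get_cmd`, `solr_extract_url`, and `fetch_and_index`, are all hypothetical placeholders.

```python
import subprocess
import urllib.parse
from typing import List

# Assumed Solr base URL (hypothetical); adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"


def mongofiles_get_cmd(db: str, gridfs_name: str, local_path: str) -> List[str]:
    """Build a mongofiles 'get' command that copies one GridFS file to local disk."""
    return ["mongofiles", "--db", db, "--local", local_path, "get", gridfs_name]


def solr_extract_url(core: str, doc_id: str, commit: bool = True) -> str:
    """Build a Solr Cell (/update/extract) URL that sets the document's id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{SOLR_BASE}/{core}/update/extract?{urllib.parse.urlencode(params)}"


def fetch_and_index(db: str, gridfs_name: str, core: str, doc_id: str, local_path: str) -> None:
    """Hypothetical helper: export a PDF from GridFS, then post it to Solr Cell."""
    # Step 1: copy the GridFS file to a local path; raises if mongofiles fails.
    subprocess.run(mongofiles_get_cmd(db, gridfs_name, local_path), check=True)
    # Step 2: upload the local file as multipart form data; --fail makes curl
    # exit non-zero on HTTP errors so check=True surfaces Solr failures.
    subprocess.run(
        ["curl", "--fail", solr_extract_url(core, doc_id), "-F", f"myfile=@{local_path}"],
        check=True,
    )

# Example (requires running MongoDB and Solr instances):
# fetch_and_index("mydb", "report.pdf", "mycore", "report-1", "/tmp/report.pdf")
```

Solr Cell runs Apache Tika inside Solr, so the client only ships the raw PDF. The other route (running Tika on the client and sending only the extracted text to Solr) is not shown here.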
See:
http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb
https://docs.mongodb.org/manual/reference/program/mongofiles/

-- Jack Krupansky

On Fri, Feb 5, 2016 at 3:13 PM, Arnett, Gabriel <gabe.arn...@moodys.com> wrote:
> Anyone have any experience indexing pdfs stored in binary form in mongodb?
>
> Gabe Arnett
> Senior Director
> Moody's Analytics