Check whether the PDFs are stored in GridFS (as chunked BSON documents).
If so, you can simply use the mongofiles command to retrieve each PDF into
a local file and index that file in Solr using either Solr Cell or Tika.

See:
http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb
https://docs.mongodb.org/manual/reference/program/mongofiles/
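
As a rough illustration of the two steps above (pull the file out of GridFS
with mongofiles, then hand it to Solr Cell), here is a minimal Python sketch.
The database name, core name, host addresses, and the helper function names
are all assumptions made up for the example, and it uses the GridFS filename
as the Solr document id purely for convenience. Adjust to your own setup.

```python
import subprocess
import urllib.parse
import urllib.request


def mongofiles_get_cmd(db, gridfs_name, local_path, host="localhost:27017"):
    """Build the mongofiles command that copies one GridFS file to disk.

    The host default is an assumption; --local names the output file.
    """
    return ["mongofiles", "--host", host, "--db", db,
            "--local", local_path, "get", gridfs_name]


def solr_extract_url(solr_base, core, doc_id):
    """Build the Solr Cell (/update/extract) URL for a single document.

    literal.id sets the document's id field; commit=true commits right away,
    which is fine for a test but probably too eager for bulk loads.
    """
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{solr_base}/{core}/update/extract?{params}"


def index_pdf(db, gridfs_name, local_path, solr_base, core):
    """Fetch one PDF from GridFS and post it to Solr Cell (hypothetical helper)."""
    # Step 1: retrieve the PDF into a local file.
    subprocess.run(mongofiles_get_cmd(db, gridfs_name, local_path), check=True)

    # Step 2: send the raw PDF bytes to the extracting request handler,
    # which runs Tika on the server side.
    with open(local_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        solr_extract_url(solr_base, core, gridfs_name),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A call might look like
index_pdf("mydb", "report.pdf", "/tmp/report.pdf", "http://localhost:8983/solr", "docs"),
assuming mongofiles is on the PATH and the core has the extract handler
configured.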


-- Jack Krupansky

On Fri, Feb 5, 2016 at 3:13 PM, Arnett, Gabriel <gabe.arn...@moodys.com>
wrote:

> Does anyone have any experience indexing PDFs stored in binary form in MongoDB?
>
> .................................................
> Gabe Arnett
> Senior Director
> Moody's Analytics
>