I tried this solution from Tim Allison, and it works. http://stackoverflow.com/questions/32354209/apache-tika-extract-scanned-pdf-files
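
For anyone who would rather do the OCR outside Solr, as Arian suggests further down the thread, here is a rough sketch of that route. It is not the Tika solution linked above. It assumes the `tesseract` command-line tool is installed and on the PATH, that the scanned pages have already been exported as image files, and that the Solr collection has `id` and `content` fields. The collection name `scans`, the file name, and the function names are made up for illustration.

```python
import json
import subprocess
import urllib.request


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract CLI call; 'stdout' as the output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognised text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout


def solr_add_request(solr_url, collection, doc):
    """Build a POST to Solr's JSON update handler for a single document."""
    url = f"{solr_url}/{collection}/update?commit=true"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical file name, Solr URL and collection; adjust to your setup.
    text = ocr_image("scan-page-1.png")
    request = solr_add_request(
        "http://localhost:8983/solr",
        "scans",
        {"id": "scan-page-1", "content": text},
    )
    with urllib.request.urlopen(request) as response:
        print(response.status)
```

The trade-off is that you have to split PDFs and Office files into images yourself, which is the part Tika handles for you when OCR runs inside the extraction step.
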

Regards,
Edwin

On 27 March 2017 at 20:07, Allison, Timothy B. <talli...@mitre.org> wrote:

> Please also see:
>
> https://wiki.apache.org/tika/TikaOCR
>
> and
>
> https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29#OCR
>
> If you have any other questions about Apache Tika and OCR, please feel
> free to ask on our users list as well: u...@tika.apache.org
>
> Cheers,
>
> Tim
>
> -----Original Message-----
> From: Arian Pasquali [mailto:arianpasqu...@gmail.com]
> Sent: Sunday, March 26, 2017 11:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index scanned documents
>
> Hi Waleed,
>
> I've never done that with Solr, but you would probably need to run some
> OCR preprocessing before indexing.
> The most popular library I know for the job is tesseract-ocr
> <https://github.com/tesseract-ocr>.
>
> If you want to do that inside Solr, I've found that Tika has some support
> for that too.
> Take a look at Vijay Mhaskar's post on how to do this using TikaOCR:
>
> http://blog.thedigitalgroup.com/vijaym/using-solr-and-tikaocr-to-search-text-inside-an-image/
>
> I hope that guides you.
>
> On Sun, 26 Mar 2017 at 16:09, Waleed Raza
> <waleed.raza.parhi...@gmail.com> wrote:
>
> > Hello,
> > How can we extract text in Solr from images that are inside PDF and
> > MS Office documents?
> > I searched many websites but did not find an answer. Please guide me.
> >
> > On Sun, Mar 26, 2017 at 2:57 PM, Waleed Raza
> > <waleed.raza.parhi...@gmail.com> wrote:
> >
> > > Hello,
> > > How can we extract text in Solr from images that are inside PDF and
> > > MS Office documents?
> > > I searched many websites but did not find an answer. Please guide me.
>
> --
> *Arian Rodrigo Pasquali*
> Laboratório de Inteligência Artificial e Apoio à Decisão
> Laboratory of Artificial Intelligence and Decision Support
>
> *INESC TEC*
> Campus da FEUP
> Rua Dr Roberto Frias
> 4200-465 Porto
> Portugal
>
> T +351 22 040 2963
> F +351 22 209 4050
> arian.r.pasqu...@inesctec.pt
> www.inesctec.pt