Hi Waleed, I've never done that with Solr, but you would probably need to run some OCR preprocessing before indexing. The most popular library I know for the job is Tesseract (tesseract-ocr) <https://github.com/tesseract-ocr>.
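As a rough illustration of the preprocessing route, here is a minimal Python sketch. It assumes the images embedded in the PDF or Office file have already been extracted to image files (for PDFs, a separate tool such as poppler's `pdfimages` can do this), that the `tesseract` binary is installed and on PATH, and that your Solr schema has the hypothetical fields `content` and `ocr_text`. It calls the Tesseract CLI, attaches the recognized text to a document, and posts it to the core's JSON `/update` handler.

```python
import json
import subprocess
import urllib.request


def ocr_image(path: str, lang: str = "eng") -> str:
    """Run the tesseract CLI on one image file and return the recognized text.

    Passing "stdout" as the output base makes tesseract print the text
    instead of writing a .txt file. Assumes tesseract is on PATH.
    """
    result = subprocess.run(
        ["tesseract", path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_solr_doc(doc_id: str, body_text: str, ocr_texts: list[str]) -> dict:
    """Combine a document's normal text with the OCR output of its images.

    "id" is Solr's usual unique key; "content" and "ocr_text" are
    hypothetical field names that must exist in your schema.
    Empty OCR results are dropped.
    """
    return {
        "id": doc_id,
        "content": body_text,
        "ocr_text": "\n".join(t.strip() for t in ocr_texts if t.strip()),
    }


def index_docs(solr_core_url: str, docs: list[dict]) -> int:
    """POST a JSON array of documents to <core>/update and commit."""
    req = urllib.request.Request(
        solr_core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A call might look like `index_docs("http://localhost:8983/solr/mycore", [build_solr_doc("report.pdf", pdf_text, [ocr_image("page-000.png")])])`, where `mycore`, `pdf_text`, and the image filename are placeholders. Keeping the OCR text in its own field lets you weight or filter it separately from the regular document text, since OCR output is usually noisier.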
If you want to do that inside Solr, I've found that Tika has some support for it too. Take a look at Vijay Mhaskar's post on how to do this using TikaOCR: http://blog.thedigitalgroup.com/vijaym/using-solr-and-tikaocr-to-search-text-inside-an-image/

I hope that guides you.

On Sun, Mar 26, 2017 at 16:09, Waleed Raza < waleed.raza.parhi...@gmail.com> wrote:
> Hello
> I want to ask how we can extract text in Solr from images that
> are inside PDF and MS Office documents.
> I found many websites but did not find an answer; please guide me.

-- 
Arian Rodrigo Pasquali
Laboratory of Artificial Intelligence and Decision Support
INESC TEC
Campus da FEUP
Rua Dr Roberto Frias
4200-465 Porto
Portugal
T +351 22 040 2963
F +351 22 209 4050
arian.r.pasqu...@inesctec.pt
www.inesctec.pt