PDF files can be directly imported into Solr using Solr Cell (AKA ExtractingRequestHandler).

See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

Internally, Solr Cell uses Tika, which in turn uses PDFBox.
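
As a rough sketch (not tested against your setup), posting a PDF to Solr Cell from Python could look like the code below. It assumes the ExtractingRequestHandler is registered at /update/extract in solrconfig.xml, that Solr is reachable at http://localhost:8983/solr, and that your unique key field is "id". The helper names build_extract_url and index_pdf are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the Solr Cell URL for one document.

    literal.id sets the document's unique key (assumed to be "id");
    commit=true makes the document searchable right after the request.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    # Assumption: the handler is mapped to /update/extract in solrconfig.xml.
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_pdf(solr_base, pdf_path, doc_id):
    """POST the raw PDF bytes to Solr Cell; Tika extracts the text server-side."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urlopen(request) as response:
        return response.status  # HTTP status code returned by Solr


# Example call (needs a running Solr and a local file named sample.pdf):
# index_pdf("http://localhost:8983/solr", "sample.pdf", "doc1")
```

The extracted text ends up in whichever field your handler configuration maps it to, so check the handler defaults in solrconfig.xml before relying on a particular field name.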

-- Jack Krupansky

-----Original Message-----
From: Alexei Martchenko
Sent: Monday, February 3, 2014 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Apache Solr.

That's right, Solr doesn't import PDFs the way it imports XML. You'll need to
use Tika to import binary or other specific file types.

http://tika.apache.org/1.4/formats.html


alexei martchenko


2014-02-03 Siegfried Goeschl <sgoes...@gmx.at>:

Hi Vignesh,

A few keywords for further investigation:

* Solr Data Import Handler
* Apache Tika
* Apache PDFBox

Cheers,

Siegfried Goeschl


On 03.02.14 09:15, vignesh wrote:

Hi Team,



I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files.
I am now trying to index PDF files but have not been able to. Can you
give me the steps to carry out PDF indexing? It would be very useful. Kindly
guide me through this process.





Thanks & Regards.

Vignesh.V




Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

www.ninestars.in







