PDF files can be directly imported into Solr using Solr Cell (AKA
ExtractingRequestHandler).
See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
Internally, Solr Cell uses Tika, which in turn uses PDFBox.
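As a rough illustration of what posting a PDF to Solr Cell can look like, here is a minimal Python sketch. It assumes the ExtractingRequestHandler is mapped to /update/extract in solrconfig.xml (as in the example configuration) and that Solr is reachable at a base URL such as http://localhost:8983/solr. The function names, the base URL, and the document id are placeholders of mine, not part of Solr.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the Solr Cell URL for one document.

    literal.id supplies the unique key for the extracted document;
    commit=true asks Solr to commit right after indexing.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return solr_base.rstrip("/") + "/update/extract?" + query


def post_pdf(solr_base, doc_id, pdf_path):
    """Send the raw PDF bytes to Solr Cell and return the HTTP status.

    Assumes /update/extract is configured; Tika does the text
    extraction on the server side.
    """
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_pdf("http://localhost:8983/solr", "doc1", "manual.pdf") would index the file under the id doc1, provided the handler and its extraction libraries are enabled on your install.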
-- Jack Krupansky
-----Original Message-----
From: Alexei Martchenko
Sent: Monday, February 3, 2014 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Apache Solr.
That's right, Solr doesn't import PDFs the way it imports XML files. You'll
need to use Tika to import binary or other specific file types.
http://tika.apache.org/1.4/formats.html
alexei martchenko
2014-02-03 Siegfried Goeschl <sgoes...@gmx.at>:
Hi Vignesh,
Here are a few keywords for further investigation:
* Solr Data Import Handler
* Apache Tika
* Apache PDFBox
Cheers,
Siegfried Goeschl
On 03.02.14 09:15, vignesh wrote:
Hi Team,
I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files.
I am now trying to index a PDF file but have not been able to. Can you
give me the steps to carry out PDF indexing? It would be very useful.
Kindly guide me through this process.
Thanks & Regards.
Vignesh.V