Re: Apache Solr.

Jack Krupansky Mon, 03 Feb 2014 05:24:09 -0800

PDF files can be directly imported into Solr using Solr Cell (AKA ExtractingRequestHandler).


See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika


Internally, Solr Cell uses Tika, which in turn uses PDFBox.
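
As a rough illustration of what a Solr Cell upload might look like from a script, here is a minimal Python sketch. It assumes the ExtractingRequestHandler is registered at /update/extract in solrconfig.xml and that Solr is reachable at a base URL such as http://localhost:8983/solr (both are assumptions, so adjust them to your setup). The helper names build_extract_url and index_pdf are hypothetical, not part of any Solr client library.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the Solr Cell URL for one document.

    literal.id sets the unique key of the indexed document;
    commit=true asks Solr to commit right after the upload.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_pdf(solr_base, doc_id, pdf_path):
    """POST the raw PDF bytes to Solr Cell and return the HTTP status.

    Assumes the handler path /update/extract; Tika does the text
    extraction on the Solr side.
    """
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URL; call index_pdf(...) against a running Solr.
    print(build_extract_url("http://localhost:8983/solr", "doc1"))
```

Calling index_pdf("http://localhost:8983/solr", "doc1", "sample.pdf") would send the file to a running Solr instance; the file name and document id here are placeholders.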

-- Jack Krupansky

-----Original Message-----
From: Alexei Martchenko

Sent: Monday, February 3, 2014 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Apache Solr.

That's right, Solr doesn't import PDFs the way it imports XML. You'll need to
use Tika to import binary or format-specific file types.

http://tika.apache.org/1.4/formats.html

alexei martchenko

2014-02-03 Siegfried Goeschl <sgoes...@gmx.at>:

Hi Vignesh,

A few keywords for further investigation:

* Solr Data Import Handler
* Apache Tika
* Apache PDFBox

Cheers,

Siegfried Goeschl


On 03.02.14 09:15, vignesh wrote:

Hi Team,



I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files.
I am now trying to index a PDF file but have not been able to. Can you
give me the steps to carry out PDF indexing? It would be very useful.
Kindly guide me through this process.





Thanks & Regards.

Vignesh.V




Ninestars Information Technologies Limited.,

72, Greams Road, Thousand Lights, Chennai - 600 006. India.

Landline : +91 44 2829 4226 / 36 / 56   X: 144

www.ninestars.in



