Try:

curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=<Full_Path_of_File>/pub2009001.pdf&literal.id=777045&commit=true"

stream.file - specify the full path of the file
literal.<extra params> - specify any extra params if needed

Regards,
Jayendra

On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <xiao...@mail.nlm.nih.gov> wrote:

> Thanks so much for your help! I tried to index a pdf file and got the
> following. The command I used is
>
> curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F "fi...@pub2009001.pdf"
>
> Did I do something wrong? Do I need to modify anything in schema.xml or
> another configuration file?
>
> ********************************************
> [xiao...@lhcinternal lhc]$ curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F "fi...@pub2009001.pdf"
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 404 </title>
> </head>
> <body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
> <p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
>
> </body>
> </html>
> *******************************************
>
> -----Original Message-----
> From: Sharp, Jonathan [mailto:jsh...@coh.org]
> Sent: Tuesday, August 10, 2010 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: PDF file
>
> Xiaohui,
>
> You need to add the following jars to the lib subdirectory of the solr
> config directory on your server.
>
> (path inside the solr 1.4.1 download)
>
> /dist/apache-solr-cell-1.4.1.jar
> plus all the jars in
> /contrib/extraction/lib
>
> HTH
>
> -Jon
> ________________________________________
> From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiao...@mail.nlm.nih.gov]
> Sent: Tuesday, August 10, 2010 11:57 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: PDF file
>
> Does anyone have any experience with PDF files? I really appreciate your
> help!
> Thanks so much in advance.
>
> -----Original Message-----
> From: Ma, Xiaohui (NIH/NLM/LHC) [C]
> Sent: Tuesday, August 10, 2010 10:37 AM
> To: 'solr-user@lucene.apache.org'
> Subject: PDF file
>
> I have a lot of pdf files. I am trying to import pdf files to solr and
> index them. I added ExtractingRequestHandler to solrconfig.xml.
>
> Please tell me if I need to download some jar files.
>
> The Solr 1.4 Enterprise Search Server book uses the following command to
> import mccm.pdf.
>
> curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F "fi...@mccm.pdf"
>
> Please tell me if there is a way to import pdf files from a directory.
>
> Thanks so much for your help!
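
On the question of importing pdf files from a directory: one possible approach is to loop over the files and issue one stream.file request per file, as in the single-file command at the top of this thread. The following is only a minimal sketch under stated assumptions, not a tested recipe for this server. The function names extract_url and index_dir, the DRY_RUN switch, and the choice of the file name without ".pdf" as literal.id are all made up for illustration. It also assumes that paths and ids contain no characters that need URL-encoding, that the directory is given as a full path readable by the Solr server itself, and that remote streaming is enabled in solrconfig.xml (stream.file is read on the server side).

```shell
#!/bin/sh
# Sketch only. SOLR_URL defaults to the extract handler URL from this thread;
# override it for another host or core.
SOLR_URL="${SOLR_URL:-http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract}"

# Build the extract URL for one PDF.
# Assumption: the id is the file name without ".pdf" (hypothetical scheme),
# and neither the path nor the id needs URL-encoding.
extract_url() {
    pdf_path=$1
    doc_id=$(basename "$pdf_path" .pdf)
    printf '%s?stream.file=%s&literal.id=%s&commit=true\n' \
        "$SOLR_URL" "$pdf_path" "$doc_id"
}

# Send one request per *.pdf file in the given directory.
# With DRY_RUN=1 the URLs are printed instead of being requested.
index_dir() {
    dir=$1
    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        url=$(extract_url "$pdf")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$url"
        else
            curl -sS "$url"
        fi
    done
}
```

Running `DRY_RUN=1 index_dir <Full_Path_of_Directory>` first shows the URLs that would be requested; dropping DRY_RUN sends them. Committing on every request mirrors the single-file command but is likely slow for many files, so committing once at the end may be preferable.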